Columbia professor Vishal Misra presents experimental evidence that transformers perform Bayesian updating, meaning they revise probability distributions in a mathematically precise, predictable way as they process each new token. This is not a metaphor. His wind-tunnel tests show the mechanism holds under controlled conditions, which reframes in-context learning from a mysterious emergent behavior into a measurable statistical process.
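To make the claim concrete, here is a minimal sketch of what Bayesian updating looks like, token by token. This is my own toy illustration, not Misra's experimental setup: two hypothetical "source" models assign probabilities to the next token, and the posterior over them shifts with each observation.

```python
# Illustrative sketch only (not Misra's experiment): Bayesian updating
# over candidate token sources as a sequence arrives.

def bayes_update(prior, likelihoods):
    """One Bayes step: posterior is proportional to prior times likelihood."""
    unnorm = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Two toy next-token models: one favors "a", one favors "b".
models = {
    "mostly_a": {"a": 0.9, "b": 0.1},
    "mostly_b": {"a": 0.2, "b": 0.8},
}

posterior = {"mostly_a": 0.5, "mostly_b": 0.5}  # uniform prior
for token in ["a", "a", "b", "a"]:
    posterior = bayes_update(posterior, {h: models[h][token] for h in models})

# After a sequence that is mostly "a", the posterior concentrates on
# "mostly_a" -- revised in a precise, predictable way at every step.
```

The point of the sketch is that each update is fully determined by the prior and the token likelihoods; Misra's claim is that transformer in-context behavior is measurable in exactly this way.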

The harder argument is what this finding rules out. Bayesian updating over patterns is not causal reasoning. Misra draws a direct line between this limitation and the AGI problem: scaling a pattern-matcher does not produce a system that models cause and effect, and without causal modeling, you cannot get continuous post-training learning. More compute does not close that gap. The title is the thesis.
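The gap between pattern statistics and causal modeling can be shown with a standard toy example (mine, not from the talk): when a hidden confounder drives both X and Y, the conditional distribution a pattern-matcher learns from observation differs from what an intervention reveals.

```python
# Toy structural causal model (my illustration, not Misra's):
# confounder Z causes both X and Y; X has no causal effect on Y.
# A pattern-matcher learns P(Y | X=1); only intervention gives P(Y | do(X=1)).
import random

random.seed(1)

def sample(do_x=None):
    z = random.random() < 0.5                      # hidden confounder
    x = do_x if do_x is not None else (z if random.random() < 0.9 else not z)
    y = z if random.random() < 0.9 else not z      # Y depends on Z, not on X
    return x, y

# Observational: condition on X=1 (implicitly selects mostly Z=1 worlds).
obs = [y for x, y in (sample() for _ in range(100_000)) if x]
# Interventional: force X=1 (Z stays balanced, association vanishes).
inter = [sample(do_x=True)[1] for _ in range(100_000)]

p_obs = sum(obs) / len(obs)      # near 0.82: strong spurious association
p_int = sum(inter) / len(inter)  # near 0.50: no causal effect of X on Y
```

No amount of additional observational data closes the gap between `p_obs` and `p_int`; that is the sense in which scaling a pattern-matcher does not yield a causal model.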

The section on manifolds and new representations, around the 36-minute mark, is where the talk gets technically dense and worth slowing down for. Misra describes what internal geometry would have to change for a model to move from interpolation to genuine simulation, framing it in terms of the minimum program length needed to compress a causal world model. That argument alone justifies reading the full transcript.
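The compression intuition behind that framing can be illustrated loosely (this is a generic minimum-description-length sketch, not Misra's formalism): data generated by a simple rule admits a short description, while patternless data does not.

```python
# Loose MDL illustration (my sketch, not Misra's argument): a sequence
# with a simple generating rule compresses to almost nothing, while a
# patternless sequence of the same length barely compresses at all.
import random
import zlib

random.seed(0)
ruled = ("ab" * 500).encode()                          # rule: repeat "ab"
patternless = bytes(random.randrange(256) for _ in range(1000))

len_ruled = len(zlib.compress(ruled, 9))
len_random = len(zlib.compress(patternless, 9))
# The ruled data is captured by a tiny "program"; the random data is not.
```

The compressed length is a crude stand-in for program length, but it conveys the idea: a system that has internalized the generating rule can describe the world far more compactly than one that only memorizes surface patterns.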

[WATCH ON YOUTUBE →]