Sometimes fractal and sometimes quantum: Universal geometric representations of beliefs about the far future in deep neural networks pretrained on next-token prediction

Paul Riechers
Beyond Institute for Theoretical Science

What do modern AI models learn when we train them as usual on next-token prediction? A model of the world? Indeed, we confirm that performing well at next-token prediction implies the ability to predict the entire future as well as possible. This imparts not only an implicit world model, but also a way to effectively perform Bayesian inference over that model of the world---updating beliefs about the hidden state of the world in context. Moreover, we anticipate and find universal (sometimes fractal) geometric representations of these beliefs linearly embedded in the activations of deep neural networks pretrained as usual to minimize next-token prediction loss. The same representations are found in both recurrent neural networks (RNNs) and transformers. Finally, in a twist fit for sci-fi, we show that neural networks intrinsically discover and represent beliefs over 'quantum' and 'post-quantum' low-dimensional models of the classical stochastic processes they are trained on.
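The two ingredients described above, Bayesian belief updating over the hidden state of a world model, and a check that those belief states are linearly embedded in network activations, can be illustrated with a short sketch. This is not the speaker's code; the hidden Markov model `T`, the initial belief, and the activations matrix are hypothetical stand-ins, and the probe here is an ordinary least-squares linear map.

```python
# Minimal sketch (assumptions labeled): belief-state trajectories for a toy hidden
# Markov model, plus a linear probe from activations to beliefs.
import numpy as np

def belief_update(belief, symbol, T):
    """One step of Bayesian filtering after observing `symbol`.
    T[symbol][i, j] = Pr(emit `symbol`, move to hidden state j | hidden state i)."""
    unnormalized = belief @ T[symbol]
    total = unnormalized.sum()
    if total == 0:
        raise ValueError("Symbol has zero probability under the current belief.")
    return unnormalized / total

def belief_trajectory(sequence, T, initial_belief):
    """Belief state after each prefix of an observed token sequence."""
    beliefs = [initial_belief]
    b = initial_belief
    for x in sequence:
        b = belief_update(b, x, T)
        beliefs.append(b)
    return np.array(beliefs)

def fit_linear_probe(activations, beliefs):
    """Affine least-squares map from activations to belief states (bias via a ones column)."""
    X = np.hstack([activations, np.ones((len(activations), 1))])
    W, *_ = np.linalg.lstsq(X, beliefs, rcond=None)
    return W, X @ W

# Hypothetical 2-state, 2-symbol HMM (rows over symbols and next states sum to 1 per state).
T = np.array([
    [[0.4, 0.1],   # emit symbol 0
     [0.2, 0.3]],
    [[0.3, 0.2],   # emit symbol 1
     [0.1, 0.4]],
])
beliefs = belief_trajectory([0, 1, 1, 0], T, initial_belief=np.array([0.5, 0.5]))
print(beliefs)
```

In practice the probe would be fit from a trained network's residual-stream or hidden-state activations at each context position; the fractal geometries mentioned in the abstract arise when the set of reachable belief states of the generating process is itself fractal.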


View on YouTube
