Information theory, minimum description length, and human cognition

Nick Chater
University College London

Since at least Shannon, there has been a clear relationship between coding and probability. Shannon's information theory indicates how an optimal code for a probabilistic source should assign code lengths inversely related to probability (i.e., probable items get especially short codes). There have followed more than fifty years of attempts to apply information theory to cognition; I shall review some of this work briefly. One disadvantage of this approach, though, is that, both when considering the structure of the 'natural' environment, and even, often, in experimental conditions, it can be very difficult to provide a probabilistic model of the source (e.g., where that source might be moving images; or what, from a participant's perspective, seems a puzzling series of experimental stimuli). Yet in some contexts, at least, cognitive science has hypotheses about the representations (i.e., the codes) in which information is captured (this is particularly true for language). In such cases, it is possible to invert the mapping from probability to coding, and to use hypotheses about codes as a way of building a (subjective) probabilistic model of the input. I briefly introduce and apply some technical ideas from a particular branch of information theory, Kolmogorov complexity theory, which allows some cognitively interesting results to be derived concerning induction, and, in particular, language acquisition.
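The two directions of the mapping described above can be made concrete in a few lines. The sketch below (an illustration, not from the talk itself) computes Shannon's idealized code lengths L(x) = -log2 p(x) from a probability distribution, and then inverts the mapping: given only code lengths, it recovers the implicit (subjective) probabilities p(x) = 2^(-L(x)), normalized in case the code is incomplete.

```python
import math

def shannon_code_lengths(probs):
    """Idealized optimal code length for each item: L(x) = -log2 p(x).
    Probable items receive short codes; improbable items, long ones."""
    return {x: -math.log2(p) for x, p in probs.items()}

def implied_probabilities(lengths):
    """Invert the mapping: a code assigning length L(x) to item x
    implicitly treats x as having probability 2^(-L(x)); normalizing
    yields a proper (subjective) probability distribution."""
    raw = {x: 2.0 ** -length for x, length in lengths.items()}
    total = sum(raw.values())
    return {x: w / total for x, w in raw.items()}

# A toy source: 'a' is twice as probable as 'b' or 'c'.
probs = {"a": 0.5, "b": 0.25, "c": 0.25}
lengths = shannon_code_lengths(probs)      # a: 1 bit, b: 2 bits, c: 2 bits
recovered = implied_probabilities(lengths) # recovers the original distribution
```

The second function is the move the abstract describes: when no probabilistic model of the source is available but a hypothesized code (e.g., a grammar) is, the code lengths themselves define a probabilistic model of the input.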

Further reading:

Chater, N. & Vitányi, P. (in press). ‘Ideal learning’ of natural language: Positive results about learning from positive evidence. Journal of Mathematical Psychology.

Chater, N., & Vitányi, P. (2003). Simplicity: A unifying principle in cognitive science? Trends in Cognitive Sciences, 7, 19-22.

Li, M., & Vitányi, P. (1997). An introduction to Kolmogorov complexity and its applications (2nd ed.). New York: Springer.
