Human culture generates unfathomable amounts of high-dimensional data. While classical interpretive methods can find patterns and structure in such traces, they are limited to very small subsets of the data. Using ideas about word embeddings from the theory of machine learning, my collaborators and I have recently developed data-efficient methods for analyzing much larger corpora. Our methods combine word embedding techniques with sparse dictionary learning to transform texts into sequences of interpretable topics. After introducing our discourse atom topic modeling (DATM) approach, I use it to analyze official reports of violent death. Combining DATM with network analysis allows us to discover the detailed topical structure of narratives. I use this combined method to study a large corpus of dream reports, showing that the affective content of dreams is coupled to their semantic content and topical structure. These methods point the way toward a new science of phenomenal experience, in which hundreds of thousands of subjective reports can be studied systematically and at scale, without the privacy concerns, data contamination, computational cost, or other challenges introduced by LLMs.
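To make the pipeline concrete, the following is a minimal sketch of a DATM-style analysis. It uses scikit-learn's DictionaryLearning as the sparse coder and random vectors in place of real word or sentence embeddings; the array shapes, atom count, and sparsity level are illustrative assumptions, not the authors' published implementation.

```python
# Minimal DATM-style sketch: learn a sparse dictionary of "discourse atoms"
# over embedding vectors, map each sentence to its dominant atom to obtain a
# sequence of interpretable topics, then count topic transitions for network
# analysis. All names and parameter values here are illustrative assumptions.

import numpy as np
from sklearn.decomposition import DictionaryLearning

# Stand-in for the embedding step: in practice these vectors would come from
# trained word embeddings (e.g., averaged word vectors per sentence).
rng = np.random.default_rng(0)
sentence_vectors = rng.normal(size=(500, 100))  # 500 sentences, 100-dim embeddings

# Sparse dictionary learning: factor the embedding matrix into a small set of
# atoms plus sparse codes, so each sentence loads on only a few atoms.
n_atoms = 25
dl = DictionaryLearning(
    n_components=n_atoms,
    transform_algorithm="omp",      # orthogonal matching pursuit for sparse codes
    transform_n_nonzero_coefs=3,    # each sentence uses at most 3 atoms
    random_state=0,
)
codes = dl.fit_transform(sentence_vectors)  # shape: (500, n_atoms)

# Assign each sentence to its dominant atom, turning the document into a
# sequence of topics.
topic_sequence = np.abs(codes).argmax(axis=1)

# Count topic-to-topic transitions as a weighted adjacency matrix; this is the
# object a network analysis of narrative structure would operate on.
transitions = np.zeros((n_atoms, n_atoms))
for a, b in zip(topic_sequence[:-1], topic_sequence[1:]):
    transitions[a, b] += 1
```

Under these assumptions, the rows of the learned dictionary play the role of discourse atoms, and the transition matrix defines a directed, weighted topic network over which narrative structure can be studied.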