The Semantics of Literary History: Topic Modeling the German Novel 1731-1864

Matt Erlin
Washington University in St. Louis
Germanic Languages and Literatures

My project employs the techniques of probabilistic topic modeling to test a set of longstanding assumptions about the periodization of German literary history. Scholars have applied a fairly consistent set of period designations to categorize German literature written during the roughly one hundred years between 1750 and 1850: “Enlightenment and Sensibility,” “Storm and Stress,” “Weimar Classicism,” “Romanticism,” “Biedermeier,” “Young Germany,” and “Realism.” Applying the MALLET topic modeling toolkit to a data set of 154 novels written between 1731 and 1864, I have been evaluating whether these novels do in fact cluster together in ways that support the scholarly consensus, or whether there might be hidden thematic structures in these works that point to new ways of thinking about their “proximity” to one another. This analysis has the potential to shed light on a range of research questions in literary history, especially regarding the features of texts that lead us to classify them together. Is similarity, for example, best grasped in terms of shared themes (as a topic modeling approach would suggest), or should distinctions of style and structure play an equally significant role? To what extent do variables such as genre (the novel) or gender (male versus female authorship) generate patterns of similarity that challenge traditional ways of thinking about classification? A final component of the project involves an attempt to find compelling ways to represent visually the idea of proximity among texts as measured across multiple variables. Network diagrams would seem to offer a particularly promising model for such visualizations.
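To make the notion of thematic “proximity” concrete, the following is a minimal sketch of one way it could be measured and turned into the edge list for a network diagram. It is an illustration under stated assumptions, not the project's actual procedure: the choice of Jensen-Shannon divergence, the cutoff value, the function names, and the three placeholder novels with their topic proportions are all hypothetical. The only premise taken from the abstract is that each novel can be described by a distribution over topics.

```python
import math
from itertools import combinations


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two topic distributions.

    Returns 0 for identical distributions and 1 for distributions that
    share no topics at all.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


def proximity_edges(doc_topics, max_divergence):
    """Return (novel, novel, divergence) triples for pairs that are close.

    Each triple can serve as a weighted edge in a network diagram.
    """
    edges = []
    for (name_a, p), (name_b, q) in combinations(sorted(doc_topics.items()), 2):
        d = js_divergence(p, q)
        if d <= max_divergence:
            edges.append((name_a, name_b, d))
    return edges


# Hypothetical topic proportions for three placeholder novels (invented
# for illustration; each row sums to 1 across three topics).
doc_topics = {
    "novel_a": [0.70, 0.20, 0.10],
    "novel_b": [0.65, 0.25, 0.10],
    "novel_c": [0.10, 0.15, 0.75],
}

# The cutoff of 0.05 is an arbitrary assumption chosen for this toy data.
edges = proximity_edges(doc_topics, max_divergence=0.05)
for name_a, name_b, d in edges:
    print(f"{name_a} -- {name_b}: divergence {d:.4f}")
```

In this toy data only the first two novels are linked, because their topic proportions nearly coincide. Whether such links line up with period labels such as “Romanticism” or “Realism” is exactly the kind of question the abstract raises; a measure built on style or structure rather than themes could yield a different network.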

Networks and Network Analysis for the Humanities: Reunion Conference