The enormous digitized archives of books and newspapers produced during the past two decades present historians with new opportunities--and new challenges as well. The possibility of analyzing increasingly large portions of the historical record is incredibly exciting, but it can't be done with conventional methods that involve close reading or even not-so-close skimming. These huge new text archives challenge us to apply new methods.
This presentation will explore one such method--a text-mining method called topic modeling--and illustrate its potential through a comparative analysis of two Civil War newspapers: the New York Times and the Richmond Daily Dispatch. These are both relatively large corpora, each consisting of more than 100,000 articles and advertisements. Topic modeling enables us to identify the major topics in such large corpora and to quantify and chart their relative frequency over time, allowing us to analyze some broad historical patterns.
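To make "relative frequency over time" concrete, here is a minimal sketch in Python. It assumes that a topic model has already assigned each article a list of topic proportions; the function name `topic_share_by_month` and the toy articles are hypothetical illustrations, not the data or tooling used in this research. The sketch averages each topic's share across the articles published in a given month, which yields the kind of series one would chart.

```python
from collections import defaultdict
from datetime import date


def topic_share_by_month(docs):
    """Average each topic's proportion over the articles in each month.

    docs: iterable of (publication_date, [proportion per topic]) pairs,
          where the proportions are assumed to come from a topic model.
    Returns {(year, month): [mean proportion per topic]} in date order.
    """
    sums = {}
    counts = defaultdict(int)
    for day, proportions in docs:
        key = (day.year, day.month)
        if key not in sums:
            sums[key] = [0.0] * len(proportions)
        for i, p in enumerate(proportions):
            sums[key][i] += p
        counts[key] += 1
    return {
        key: [total / counts[key] for total in totals]
        for key, totals in sorted(sums.items())
    }


# Hypothetical toy input: three articles, two topics (made-up values).
toy_docs = [
    (date(1861, 4, 13), [0.75, 0.25]),
    (date(1861, 4, 20), [0.25, 0.75]),
    (date(1861, 5, 2), [1.0, 0.0]),
]
monthly = topic_share_by_month(toy_docs)
for (year, month), shares in monthly.items():
    print(year, month, shares)
```

Plotting one topic's monthly values for each newspaper on a shared time axis would then allow the two papers to be compared directly.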
Many of these patterns are surprising, prompting new questions and suggesting new insights. To illustrate the potential of topic modeling, this presentation will offer some initial conclusions from this research. I will analyze the topic models to explore the relationship between florid, patriotic paeans about God, honor, and country and splenetic editorials in each paper condemning the immorality of the other section's society. Graphs of these two topics are remarkably similar; poesy and vitriol were always two sides of the same coin. Together these graphs provide a cardiogram registering the deployment of patriotism and sectionalism in two major newspapers during the course of the war.