Networks and Large-Scale Text Analysis
University of Washington
This presentation explores the intersection of textual analysis and network analysis. How should we visualize relationships and interconnections between longer works of written culture? My data come from an experimental survey of hundreds of thousands of 19th- and 20th-century books written in Northern European languages. Unlike in simpler data sets, where nodes may have a relatively restricted set of attributes, novels and other literary works can be classified along multiple, overlapping dimensions. Productive techniques for managing this complexity include generative approaches such as topic modeling. Yet challenges remain in graphing the resulting relationships clearly and convincingly.
Recent research has considered methods of integrating topic modeling with visualizations of relational data in network graphs (McCallum 2005; Diesner 2010). I present some of the implications of these approaches, in particular strategies of presentation and user interaction that expose multidimensionality while retaining visual clarity. As we move away from small, curated collections and towards large-scale online corpora, my presentation considers the continuing utility of network analysis in the context of computational analysis of millions of volumes.