Before Beginning the Beguine: Expressing Complex Data as Edge Lists for Network Analysis

Glen Worthey
Stanford University
University Libraries

One of the real pleasures of our UCLA IATDH Summer 2010 Institute on “Networks and Network Analysis for the Humanities” was being able to experiment very early with network graphs, in a variety of software packages, based on data sets that we had brought with us to the Institute. The principal enabling factor for this “hit the ground running” approach, at least for me and my particular data set, was the presence and patience of expert help in creating flat files (in the form of edge lists) from a relational database. (This data preparation was modestly characterized by our tutors as “80% of the work” of network analysis!)

But grateful as I was for this assistance, I was left with some doubts: Had I really understood in some meaningful way what had been lost (or gained) in the initial data preparation? Could I really be confident in conclusions drawn from the resulting network analyses of my hastily flattened data? If I were to replicate the initial data-flattening on my own, based on my understanding of the original database, would the result be consistent with what I had seen before? In other words, how do different approaches and decisions during the data preparation that precedes network analysis affect the analysis itself?

My presentation at the Reunion Conference will rely on the same data I used in 2010 – that is, a descriptive database of about 8,000 pages of manuscript letters written to Athanasius Kircher. For this round, though, I’ll go all the way back to the beginning, creating my own flat files from the original relational database, seeking to express its complexities in as many different ways as are supported by the logic of the data, and comparing the network-analyzed results.
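
To make the idea of alternative flattenings concrete, here is a minimal sketch in Python. It assumes a hypothetical two-table schema (person and letter) with invented placeholder rows; none of the table names, column names, or values come from the actual Kircher database. The same relational data is expressed as two different edge lists: one linking senders to recipients, the other linking senders to the places they wrote from.

```python
import sqlite3
from collections import Counter

# Hypothetical schema and placeholder rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE letter (id INTEGER PRIMARY KEY, sender_id INTEGER,
                     recipient_id INTEGER, place TEXT);
INSERT INTO person VALUES
  (1, 'Recipient A'), (2, 'Correspondent B'), (3, 'Correspondent C');
INSERT INTO letter VALUES
  (10, 2, 1, 'Place X'),
  (11, 2, 1, 'Place Y'),
  (12, 3, 1, 'Place X');
""")

def person_edges(conn):
    """Flattening 1: one weighted edge per (sender, recipient) pair."""
    rows = conn.execute("""
        SELECT s.name, r.name
        FROM letter l
        JOIN person s ON s.id = l.sender_id
        JOIN person r ON r.id = l.recipient_id
    """)
    # Each row is a (source, target) tuple; the count is the edge weight.
    return Counter(rows)

def place_edges(conn):
    """Flattening 2: a bipartite edge list linking senders to places."""
    rows = conn.execute("""
        SELECT s.name, l.place
        FROM letter l
        JOIN person s ON s.id = l.sender_id
    """)
    return Counter(rows)

people = person_edges(conn)
places = place_edges(conn)

# Write each edge list as tab-separated source, target, weight.
for edges in (people, places):
    for (source, target), weight in sorted(edges.items()):
        print(source, target, weight, sep="\t")
    print()
```

Even in this toy case the two edge lists yield different networks: the first collapses repeated letters into a single weighted tie between two people, while the second keeps places as nodes and drops the recipient entirely. Each choice preserves some of the database's structure and discards the rest.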
