The connections between social actors and cultural artifacts leave traces in texts. I will discuss the use of language modeling to detect textual sharing and virality in nineteenth-century mass media, in the twenty-first-century legislative process, and in other fields. The networks inferred from this sharing are useful discovery tools for scholars, helping to bring structure to the vast, but opportunistically structured, products of mass digitization. These networks also help us build better models for other natural language tasks, including optical character recognition, named entity recognition, and machine translation.
Workshop IV: Mathematical Analysis of Cultural Expressive Forms: Text Data