Workshop IV: Mathematical Analysis of Cultural Expressive Forms: Text Data

Part of the Long Program Culture Analytics

May 23 - 27, 2016

Overview

Comprehensive collections of texts stretching back to the beginning of writing have become increasingly available in machine-actionable form, from corpora of cuneiform writing, to the vast collections of medieval texts from Europe and Asia, to the immense “sea of the unread” represented by the Hathi Trust and Google Books collections. Similarly, millions of “born-digital” texts flood the virtual world every day, from tweets, to blog posts, to other cultural expressive forms. These developments represent an unprecedented opportunity to advance knowledge of how writing affects the dynamics of culture.

This workshop focuses on the leading approaches to (a) extracting entities, topics, or narrative patterns from large, unstructured collections of text and analyzing them to (b) derive meaning from textual data and (c) understand the dynamics of social interactions or historical change. These approaches include text mining tools, sentiment analysis, topic modeling, textual memes, information retrieval (including cross-language retrieval), trend analysis, recommendations, and predictions of whether something will go “viral”. Mathematical tools include Bayesian models, supervised and unsupervised machine learning, optimization, and statistical language modeling techniques.

This workshop will include a poster session; a request for posters will be sent to registered participants in advance of the workshop.

Organizing Committee

David Blei (Columbia University)

Cristian Danescu-Niculescu-Mizil (Cornell University)

Kristina Lerman (University of Southern California (USC))

David Mimno (Cornell University)

Vwani Roychowdhury (University of California, Los Angeles (UCLA))

Ted Underwood (University of Illinois at Urbana-Champaign)