Library systems, and especially modern digital archives, offer a vast wealth of cultural data in formats that are generally conducive to computational analysis. Attempting to access and successfully use such information for research, however, raises a range of challenges and considerations. Foremost among these are issues of machine access and data integrity, which themselves tend to have cultural rather than computational origins: ownership and copyright restrictions, collector and contributor biases, and assumptions about which aspects of a collection are “important.” Nevertheless, library technologies and the cultures around them have in recent years steadily expanded both the range of digital materials available for study and the tools for exploring these collections.
This tutorial will present case studies illustrating the current state of the art in conducting cultural analyses with materials in library systems, with an emphasis on the opportunities, rather than the disappointments, that result from maintaining a critical perspective towards such resources. Techniques covered will include approaches to bulk data access, particularly via linked open data technologies; methods for extracting actionable data from in-copyright or otherwise limited-access collections; and the use of multimedia sources. Among the datasets presented for analysis and experimentation will be crowdsourced, semantically linked public knowledge bases, as well as materials held at UCLA: a massive multimodal archive of in-copyright television news, sets of anonymized social media records, and extracted metadata from cultural history collections.