Data Mining beyond Text in the Vogue Archive

Lindsay King
Yale University

Data mining projects in libraries have often focused on textual data, as in corpus linguistics or literary studies. However, visual analysis of datasets composed of digital images with detailed metadata, whether digitized by libraries or licensed from vendors, opens new avenues of research for scholars in the arts and humanities. Tools and techniques borrowed from other fields allow us to do “distant viewing” and to sort and quantify large collections of images, providing perspectives we would not otherwise have. As with text mining, visual analysis requires background knowledge of the corpus to make sense of the patterns or anomalies we might observe. This presentation uses the ProQuest Vogue Archive to demonstrate the types of research experiments enabled by access to the full data files under a perpetual access license. Experiments with text-mining approaches such as topic modeling and n-gram searching have pointed to areas for further research. Image analyses that allow scholars to observe design patterns and shifts over time, or to identify and sort color choices, promise to add new dimensions to the analysis of this material. In the ever-larger realm of digitized collections, these multifaceted approaches show how libraries can create entry points into vast archives of data and heighten their interdisciplinary appeal.
