Modeling Hemingway, Modeling Digital Humanities: A Pedagogy-Driven Investigation of an In-Copyright Corpus

Brian Croxall
Brown University

Of the many challenges facing scholars who want to apply digital humanities methods to literature, none is perhaps more vexing than the problem of copyright. The decision by the European Union and the United States to extend the term of copyright to the “life of the author plus 70 years” means that the majority of works published in the twentieth century remain in copyright, while works published in the twenty-first century will almost certainly remain off-limits to mass digitization and dissemination for another 100 years. While copyright is important because it allows authors to profit from their intellectual and creative labors, it simultaneously limits the intellectual and creative work that digital humanists can do. As a result, much of the prominent work in computational literary analysis has focused on the period that precedes the current copyright regime: the nineteenth century and earlier. Distant reading has largely depended on a large corpus from which to draw conclusions, and a large corpus has ultimately depended on someone else, somewhere, doing the difficult labor of digitization. Under present copyright law, no one will undertake this mass digitization for contemporary literature.

But what if mass digitization is the wrong approach for works from the twentieth century? What other models might there be for doing digital humanities in an era of persistent copyright? In this paper, I discuss an innovative, multi-year assignment that I have carried out in my “Introduction to Digital Humanities” courses: the digitization and analysis of the complete works of Ernest Hemingway.


Workshop IV: Mathematical Analysis of Cultural Expressive Forms: Text Data