Digital Humanities Project Uncovers Hidden Patterns in Literature

From Sermons to Sausage: The Literary Data of 1771

By Niamh Clarke

David Mazella, Ph.D., is an Associate Professor of English at the University of Houston, focused on eighteenth-century British literature. His digital humanities project, The Year 1771, provides a new approach to reading texts, uncovering patterns that traditional methods often overlook.

"My particular approach is to think about what I call generic relations, and to use not just word frequencies, but word patterns that I'm hoping will flag a particular genre in the mind of someone," Mazella said. That includes verb types, modal structures, and other linguistic cues.
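As a rough illustration of what counting "modal structures" could look like in practice, the sketch below measures how often each modal verb appears as a share of all words in a text. It is not the project's actual method: the modal list, the `modal_profile` function, and both sample sentences are invented for this example.

```python
import re
from collections import Counter

# Assumption: a small, illustrative list of English modal verbs.
MODALS = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}


def modal_profile(text):
    """Return each modal verb's share of all word tokens in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(t for t in tokens if t in MODALS)
    return {modal: counts[modal] / total for modal in sorted(counts)}


# Made-up sample sentences, not drawn from the 1771 corpus.
sermon = "Thou shalt not steal. We must repent, and we shall be saved."
letter = "I would be glad if you could visit; it may rain."

print(modal_profile(sermon))
print(modal_profile(letter))
```

Comparing such profiles across many texts is one simple way a pattern of modal verbs might hint at genre. A real analysis would also need to handle period spellings such as "shalt," which this list ignores.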

Mazella's interest in genre began with confusion. While reading sermons for his dissertation, he initially thought, "They're all about this guy, Jesus." On closer analysis, however, he discovered subtle differences in style and belief. That discovery sparked his interest in how genres are formed, perceived, and misinterpreted. He sees genre as spatial, with texts positioned at varying distances on a conceptual map.

Turning that interest into a project took more thought. Mazella recalls that his graduate director once said he was tired of traditional dissertations and suggested, "It'd be kind of cool to know a single year."

"Then what I realized was that the single-year project would probably be a good way to talk about simultaneous developments in literary history across a geographical range," Mazella said.

In 2017, he worked with a group of students and Claude Willan, then at the University of Houston's Digital Research Commons (DRC), to catalog every 1771 work and author listed in the English Short Title Catalogue from Eighteenth Century Collections Online (ECCO). Their research led to surprising discoveries, and their method for tagging genres was later published in Aphra Behn Online (ABO), where it has been downloaded more than 400 times.

The project now includes two forthcoming components: a website aimed at undergraduate readers and a book titled 1771: A Literary History.

The Year 1771 brings together campus resources, supported by UH's REACH program, the 2024 SIPHDH program, and upcoming collaborations with UI/UX undergraduates at UH at Sugar Land and the Rowan Writing Internship. However, the primary support comes from the Digital Humanities Core Facility (DHCF) and the DRC staff, particularly for the essential technical component: Optical Character Recognition (OCR).

"OCR is how the sausage gets made," Mazella said. OCR is the process that converts scanned pages of 18th-century print into searchable text, which can then be measured, analyzed, and visualized. But it's not magic. Errors from old typefaces, damaged originals, and digital distortions make the process messy.

"It's always a recursive project," Mazella said. "You just loop back and forth… You learn what the characteristic mistakes are going to be."
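One way to picture that loop is a correction table that grows each time a reviewer spots a recurring error. The sketch below is a minimal illustration, not the project's actual pipeline: the `clean_ocr` function and every entry in `CORRECTIONS` are hypothetical, chosen to mimic the well-known tendency of OCR software to read the eighteenth-century long s ("ſ") as an "f."

```python
import re

# Hypothetical correction table: entries like these would be added by hand
# as a reviewer learns which misreadings keep recurring in the output.
CORRECTIONS = {
    "fermon": "sermon",
    "bleffed": "blessed",
    "tbe": "the",
}


def clean_ocr(text, corrections=CORRECTIONS):
    """Apply a few simple, rule-based fixes to raw OCR output."""
    # Replace the long s character (U+017F) with a modern "s".
    text = text.replace("\u017f", "s")
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    def fix(match):
        word = match.group(0)
        fixed = corrections.get(word.lower())
        if fixed is None:
            return word
        # Keep an initial capital if the misread word had one.
        return fixed.capitalize() if word[0].isupper() else fixed

    # Look up each alphabetic word in the correction table.
    return re.sub(r"[A-Za-z]+", fix, text)


raw = "Tbe bleffed fermon was printed in Lon-\ndon."
print(clean_ocr(raw))
```

A table like this only catches errors someone has already noticed, which is why the process stays recursive: each pass over the output suggests new entries. The hyphen rule is also deliberately crude and would wrongly merge a genuinely hyphenated word that happens to break across lines.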

Now, the project's methods are being taught to students outside the research team. This summer, UH's DHCF is launching a new micro-credential course, Textual Recovery: OCR Processing Techniques. Designed for students with no prior experience, the course trains them to convert documents into usable text and teaches them what gets lost (or added) in translation.

Mazella is incorporating similar methods into his Introduction to Literary Studies class, where students will apply OCR techniques to works by Phillis Wheatley, Benjamin Franklin, and Olaudah Equiano.

"Research is risk-taking," Mazella said. "The success I've had has largely been because the students I've recruited visibly, dramatically improved from doing it."

"You're not going to make mistakes. You're just going to get tired," he continued. "What I will say is that the effort does pay off. It ends up giving you an insight into more than one thing—it gives you insight into lots of different issues, historical and present. It will, I think, change your attitude about the media."