The HathiTrust Research Center (HTRC) hosted its first annual UnCamp on September 10th and 11th at Indiana University, Bloomington. Session leaders included Colin Allen (Indiana University), Loretta Auvil (University of Illinois), J. Stephen Downie (University of Illinois), Stacy Kowalczyk (Indiana University), Robert McDonald (Indiana University), Beth Plale (Indiana University), Yiming Sun (Indiana University), Ted Underwood (University of Illinois), and Jeremy York (HathiTrust). The keynote, “HathiTrust: Putting Research in Context,” was presented by John Wilkin, Executive Director of HathiTrust.
Several sessions highlighted analysis workflows developed by the Software Environment for the Advancement of Scholarly Research (SEASR) project and described digital humanities applications of SEASR. Loretta Auvil demonstrated SEASR in her presentation “Demonstrations of Capability” and in her session “SEASR Analytics.” She explained that the project focuses on developing, integrating, deploying, and sustaining a set of reusable and extensible software components, along with a supporting framework, to support a broad range of data mining applications for humanities scholars. Auvil showed SEASR activities built as Meander workflows and guided participants through a hands-on session.
Ted Underwood, an associate professor of English at the University of Illinois at Urbana-Champaign, led a session on “Using HathiTrust Texts for Literary Research,” which presented a specific use case of the SEASR project. He is converting 500,000 eighteenth- and nineteenth-century volumes downloaded from HathiTrust into a normalized collection that can be used for literary-historical research. He and several collaborators are cleaning the data more deeply than simply correcting the typical errors introduced by optical character recognition (OCR). Dr. Underwood and his team are also addressing the problems and opportunities involved in enriching the data and cleaning the metadata. Examples of this work include resolving problems with dating documents, discarding duplicate volumes, and adding metadata useful for interpretation, such as information on gender and genre.
HTRC UnCamp will be a yearly event featuring demonstrations and hands-on workshops for anyone interested in mining and analyzing large amounts of quantitative information.
Further information about HTRC, including publications and PDFs of the HTRC UnCamp presentations, can be found here: http://wiki.htrc.illinois.edu/display/OUT/HTRC+UnCamp2012