Posts Tagged ‘nora’

Loretta Auvil was invited to present the keynote address at the Text Mining Workshop 2008, held in conjunction with the Eighth SIAM International Conference on Data Mining (SDM 2008) in Atlanta, GA, on April 26, 2008. Her presentation title echoes SEASR’s identifying phrase, “Engineering Knowledge for the Humanities.” Her abstract follows.



Over the last decade, NCSA’s Automated Learning Group has developed data mining technologies for industry, government, and the sciences. In the past few years, we have broadened our focus to include knowledge discovery in the humanities. My presentation will focus on how we are addressing the special challenges that humanities computing poses for data mining and analysis. I will discuss our early collaborative projects, FeatureLens and Nora, as well as SEASR (Software Environment for the Advancement of Scholarly Research), the Andrew W. Mellon Foundation-funded project we are now leading. Each of these projects has developed technologies customized to meet specific needs of the digital humanities community.

FeatureLens, an early MONK (Metadata Offer New Knowledge) application, uses the machine learning approach of frequent pattern mining to identify fuzzy repetition patterns in a data collection without any initial human input. Nora, a case study in eighteenth- and nineteenth-century British and American literature, uses predictive modeling techniques to classify documents, even when the expert-defined classes are complex and notoriously indistinct, as with sentimental fiction. SEASR is our most ambitious project yet. It employs a semantic-based, service-oriented architecture to build software bridges that let users access data stored in disparate formats and on incompatible platforms, and it provides an enhanced environment for workflow and data sharing. The essential infrastructure SEASR provides will advance the capabilities of projects like our partner MONK, a digital environment designed to help humanities scholars discover and analyze patterns.
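For readers unfamiliar with frequent pattern mining, the general idea can be sketched in a few lines of Python. This is not FeatureLens's actual algorithm, which the abstract does not describe: the sketch below is an assumption-laden simplification that finds exact (not fuzzy) word trigrams recurring across documents. The function name `frequent_ngrams`, the `min_support` parameter, and the toy collection are all hypothetical.

```python
from collections import Counter


def frequent_ngrams(documents, n=3, min_support=2):
    """Return word n-grams that occur in at least `min_support` documents.

    Illustrative only: exact matching on whitespace-separated tokens,
    whereas a real system would need tokenization and fuzzy matching.
    """
    support = Counter()
    for text in documents:
        words = text.lower().split()
        # Use a set so each document counts a given n-gram at most once.
        grams = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
        support.update(grams)
    return {gram: count for gram, count in support.items() if count >= min_support}


# Hypothetical toy collection, for illustration only.
docs = [
    "the quick brown fox jumps",
    "a quick brown fox sleeps",
    "the lazy dog sleeps",
]
patterns = frequent_ngrams(docs, n=3, min_support=2)
print(patterns)
```

No human labels or seed phrases are supplied here. The repeated pattern emerges from counting alone, which is the sense in which such mining needs "no initial human input."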