Posts Tagged ‘monk’

Loretta Auvil was invited to present the keynote address at the Text Mining Workshop 2008, which was held in conjunction with the Eighth SIAM International Conference on Data Mining (SDM 2008) in Atlanta, GA on April 26, 2008.  Her presentation title echoes SEASR’s identifying phrase, “Engineering Knowledge for the Humanities.”

Abstract

Over the last decade NCSA’s Automated Learning Group has innovated data mining technologies for industry, government, and the sciences. In the past few years, we have broadened our focus to include knowledge discovery in the humanities. My presentation will focus on how we are negotiating humanities computing’s special challenges for data mining and analysis. I will discuss our early collaborative projects, FeatureLens and Nora, and SEASR (Software Environment for the Advancement of Scholarly Research), the Andrew W. Mellon Foundation-funded project we are now leading. Each of these projects has developed technologies customized to meet specific needs of the digital humanities community. FeatureLens, an early MONK (Metadata Offers New Knowledge) application, uses the machine learning approach of frequent pattern mining to identify fuzzy repetition patterns in a data collection with no initial human input. Nora, a case study for eighteenth- and nineteenth-century British and American literature, uses predictive modeling techniques to classify documents, even given complex and notoriously indistinct expert classes such as sentimental fiction. SEASR is our most ambitious project yet, employing a semantic-based, service-oriented architecture to build software bridges that allow users to access data stored in disparate formats and on incompatible platforms and to provide an enhanced environment for workflow and data sharing. The essential infrastructure SEASR provides will advance the capabilities of partner projects like MONK, a digital environment designed to help humanities scholars discover and analyze patterns.

Loretta Auvil and Amit Kumar participated in MONK’s latest Hackfest (February 7-10, 2008, Chicago).

In preparation for the meeting, Peter Groves produced an icon to suggest how well a particular file or feature contributes to supervised classification, an indicator that MONK anticipates adding to the feature display in the Search by Example toolset. At the meeting, Amit Kumar (who is tasked with developing the MONK workbench) and other MONKies connected new proxy calls through the workbench, which will include SEASR calls. Loretta Auvil began work on an unsupervised classification of the TEI-A version of the witchcraft files through SEASR, to advance research for Dr. Kirsten Uszkalo’s use case.

At meeting’s end, the MONK team requested that SEASR develop a clustering tool written in Google Web Toolkit, to be tested on the Nineteenth-Century Fiction and Witchcraft databases.

On December 14th and 15th, SEASR team member Loretta Auvil (project co-PI) attended the MONK (Metadata Offers New Knowledge) All-Hands Meeting on the University of Maryland-College Park campus. Project cells presented reports of the past year’s accomplishments and challenges, and we shared our progress in building SEASR technologies and how they will support and enhance MONK. Among its other capabilities, SEASR has developed a workbench/dataflow environment written in Google Web Toolkit. We envision that MONK will “sit” on top of this environment as a user interface for working with data. On top of the MONK interface will be a portal for sharing results.

In ongoing discussions, the SEASR and MONK teams determined that quality of results, rather than speed, would take priority. The teams also theorized about how SEASR’s dataflow environment will operate with MONK’s datastore, existing workbench, and portal for sharing, and how it will allow for transparency (the publication and sharing of process dynamics that reveal methodological decisions to other developers, including component metadata and flow parameterization).