Archive for the ‘News’ Category

Dr. Ted Underwood, Associate Professor of English at the University of Illinois at Urbana-Champaign, will be publishing a book through Stanford University Press that features research conducted for and with the SEASR project.

The book is entitled “Why Literary Periods Mattered: Historical Contrast and the Prestige of English Studies” and is expected to be released in 2013.

The HathiTrust Research Center (HTRC) hosted its first annual UnCamp on September 10 and 11 at Indiana University, Bloomington. Session leaders included Colin Allen (Indiana University), Loretta Auvil (University of Illinois), J. Stephen Downie (University of Illinois), Stacy Kowalczyk (Indiana University), Robert McDonald (Indiana University), Beth Plale (Indiana University), Yiming Sun (Indiana University), Ted Underwood (University of Illinois), and Jeremy York (HathiTrust). The keynote, “HathiTrust: Putting Research in Context,” was presented by John Wilkin, Executive Director of HathiTrust.

Some of the sessions highlighted analysis workflows developed by the Software Environment for the Advancement of Scholarly Research (SEASR) project and provided information on digital humanities applications of SEASR. Loretta Auvil demonstrated SEASR during her presentation “Demonstrations of Capability” and her session on “SEASR Analytics.” She explained that the project focuses on developing, integrating, deploying, and sustaining a set of reusable and extendable software components, along with a supporting framework, to benefit a broad set of data mining applications for scholars in the humanities. Loretta showed SEASR activities using Meandre workflows and guided participants through a hands-on session.

Ted Underwood, an associate professor of English at the University of Illinois at Urbana-Champaign, gave a session on “Using HathiTrust Texts for Literary Research,” which provided a specific use case for the SEASR project. He is converting 500,000 eighteenth- and nineteenth-century volumes downloaded from HathiTrust into a normalized collection that can be used for literary-historical research. He and several other contributors are cleaning the data more deeply than simply correcting the typical errors that optical character recognition (OCR) is able to detect. Dr. Underwood and his team are also looking at problems and opportunities in enriching the data and cleaning the metadata. Examples of metadata cleaning include solving problems with dating a document, discarding duplicate volumes, and adding metadata that would be useful for the interpretive process, such as information on gender and genre.
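To give a concrete flavor of this kind of cleaning, here is a minimal Python sketch of two of the steps described: correcting a common class of OCR errors and discarding duplicate volumes. The correction list, helper names, and metadata fields are invented for illustration; the actual pipeline relies on corpus statistics and far richer metadata.

```python
import re

# A tiny, illustrative list of long-s OCR confusions ("ſ" in old type is
# often scanned as "f"); a real pipeline uses corpus statistics and context,
# since e.g. "fame" is itself a valid word.
LONG_S_FIXES = {"beft": "best", "moft": "most", "cafe": "case", "fame": "same"}

def normalize_text(text):
    """Apply token-level OCR corrections to a volume's raw text."""
    def fix(match):
        word = match.group(0)
        return LONG_S_FIXES.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", fix, text)

def dedupe_volumes(volumes):
    """Discard duplicate volumes using a crude (title, author, year) key."""
    seen, unique = set(), []
    for vol in volumes:  # each vol is a dict of metadata fields
        key = (vol["title"].strip().lower(),
               vol["author"].strip().lower(),
               vol["year"])
        if key not in seen:
            seen.add(key)
            unique.append(vol)
    return unique

print(normalize_text("The moft remarkable cafe"))  # -> "The most remarkable case"
```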

HTRC UnCamp will be a yearly event highlighting demonstrations and hands-on workshops for anyone interested in mining and analyzing large amounts of quantitative information.

Further information about HTRC, including publications and PDFs of the HTRC UnCamp presentations, can be found here: http://wiki.htrc.illinois.edu/display/OUT/HTRC+UnCamp2012

Loretta Auvil and other collaborators from the SEASR Services project are attending the Topic Modeling for Humanities Research Workshop, funded by the National Endowment for the Humanities. The workshop will be held on Saturday, November 3, 2012 at the Maryland Institute for Technology in the Humanities (MITH) at the University of Maryland, College Park, Maryland.

The SEASR Team will participate in the University of Victoria’s Digital Humanities Summer Institute on June 4-8, 2012. You can find more information at http://www.dhsi.org/. The course, entitled “SEASR Analytics,” will be taught by Loretta Auvil and Boris Capitanu.

The course will provide an introduction to SEASR analytics, with hands-on training on the tools. We will cover an introduction to text mining tools, using and creating Zotero flows, topic modeling, and concept mapping.
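For a taste of the topic-modeling portion, here is a minimal sketch of the technique on a toy corpus. It uses scikit-learn’s LatentDirichletAllocation rather than the SEASR components the course itself covers, and the documents are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus; a real run would use full volumes or chapters.
docs = [
    "whale ship sea captain harpoon voyage",
    "love marriage letter dance estate sister",
    "ship storm sea sailor island voyage",
    "heart love sorrow tears letter",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Fit a two-topic model on the word counts.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Print the top words for each inferred topic.
terms = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [terms[j] for j in weights.argsort()[::-1][:5]]
    print(f"Topic {i}: {', '.join(top)}")
```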

Loretta Auvil attended the Chicago Colloquium on Digital Humanities and Computer Science 2011, held November 20-21, 2011. With many of the collaborators from the SEASR Services project, she prepared a set of demonstrations as part of the Software Demonstration program. Our work with Matt Jockers from Stanford prompted the Topic Modeling demonstration. Our work with Ted Underwood on the Google Ngrams data and correlation analysis prompted the development of the web application “Correlation Analysis and Ngram Viewer.” A paper describing the demonstrations, “SEASR Analytics,” is available here.
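The idea behind the correlation analysis is to rank words by how closely their yearly frequency series track a seed word’s series. Here is a minimal sketch with invented frequencies standing in for the Google Ngrams data; the real application works over the full dataset.

```python
import numpy as np

# Hypothetical yearly relative frequencies for a few words
# (e.g., one value per year over a decade).
series = {
    "steam":   [1, 1, 2, 3, 5, 8, 12, 15, 18, 22],
    "railway": [0, 0, 1, 2, 4, 7, 11, 14, 19, 23],
    "candle":  [20, 19, 18, 16, 15, 13, 12, 10, 9, 8],
}

def correlated_words(seed, series, top_n=2):
    """Rank other words by Pearson correlation with the seed word's series."""
    seed_vec = np.asarray(series[seed], dtype=float)
    scores = {
        word: float(np.corrcoef(seed_vec, np.asarray(vals, dtype=float))[0, 1])
        for word, vals in series.items() if word != seed
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# "railway" tracks "steam" closely (r near 1); "candle" moves the
# opposite way (r near -1).
print(correlated_words("steam", series))
```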

Ted Underwood, one of our collaborators on the SEASR Services project, also attended and presented “Combining topic-modeling and time-series approaches to reveal trends in 18th- and 19th-century discourse.” This paper is also available here.

The University of Victoria’s Digital Humanities Summer Institute (DHSI) was held on June 6-10, 2011. Loretta Auvil and Boris Capitanu taught the course entitled “SEASR in Action: Data Analytics for Humanities Scholar”. The slides and course materials for this workshop are at http://dev-tools.seasr.org/confluence/display/Outreach/DHSI-SEASR-2011.

The course covered the following topics: an overview of the SEASR infrastructure (components, flows, and applications), an introduction to text mining tools, using and creating Zotero flows, topic modeling, and concept mapping.

Loretta Auvil was invited to participate in the 2011 Computer Assisted Reporting (CAR) Conference, held in Raleigh, NC on February 24-27, 2011. CAR is a journalism conference that focuses on digital reporting using tools and technology for data-driven analysis. The panel on Visualizing Text was shared with Brant Houston of the University of Illinois and John Stasko of Georgia Tech. Loretta briefly introduced the SEASR project and Meandre, with several demonstrations of applications for visualizing text. The slides can be found at the link below.

http://dev-tools.seasr.org/confluence/download/attachments/6979872/CAR2011public.pptx

The Andrew W. Mellon Foundation has awarded a two-year grant to Stanford University with the following collaborators: Mike Keller, Matthew Jockers, and Franco Moretti from Stanford University; John Unsworth, Michael Welge, and Ted Underwood from the University of Illinois; Dan Cohen from George Mason University; and Tanya Clement from the University of Maryland.

This team of researchers will explore text mining as a tool for understanding the humanities and will focus on using SEASR/Meandre to address the particular use cases.

Other web postings about the grant:

http://www.stanford.edu/~mjockers/cgi-bin/drupal/node/51

http://www.lis.illinois.edu/articles/2010/10/mellon-grant-expand-text-mining-research-unsworth-and-team

http://cirss.lis.illinois.edu/soda/seasr.html

I attended THATCamp London in June, which incorporated a Developers Challenge. So I decided (with a little help from Boris Capitanu) to participate and leverage existing SEASR/Meandre flows we had created, in order to demonstrate how they could be used to create a mashup. I chose the Victoria and Albert Museum Collections data because the museum has created an API to access its data. I created two new components: one to process and query the API iteratively until all results are retrieved, and a second, more generic one for selecting specific JSON fields from the data.
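For illustration, here is a minimal Python sketch of what these two components do. The endpoint URL, parameter names, and field names are assumptions, not the actual Victoria and Albert Museum API contract, and the real components were implemented as Meandre components rather than standalone functions.

```python
import requests

# Hypothetical endpoint and parameter names; the actual Victoria and
# Albert Museum API routes and fields may differ.
API_URL = "https://api.vam.ac.uk/v2/objects/search"

def fetch_all(query, page_size=50):
    """Query the API iteratively until all result pages are retrieved."""
    records, page = [], 1
    while True:
        resp = requests.get(API_URL, params={"q": query, "page": page,
                                             "page_size": page_size})
        resp.raise_for_status()
        batch = resp.json().get("records", [])
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records

def select_fields(records, paths):
    """Generic selection of specific JSON fields, given dotted paths."""
    def dig(obj, path):
        for key in path.split("."):
            obj = obj.get(key, {}) if isinstance(obj, dict) else {}
        return obj or None
    return [{p: dig(r, p) for p in paths} for r in records]

# Hypothetical field names, for illustration only.
rows = select_fields(fetch_all("teapot"), ["_primaryTitle", "_primaryDate"])
```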

I modified four existing web-service-enabled flows to use these components to retrieve the data and create the visualizations. Once I had the four flows functioning separately, I created an HTML page that passes the search query to each flow and creates the visualizations for the mashup. The first view is a tag cloud of the descriptions of the objects that satisfy the query. The second view is an n-gram tag cloud of the objects’ historical significance attribute. The third applies entity extraction to the data, extracting locations and plotting them on a map, where the sentences containing the locations can be read. The fourth also applies entity extraction, extracting people, organizations, and locations, and creates links between entities that co-occur within two sentences. This was just a quick prototype to showcase the capabilities of the Meandre environment for a mashup, so ultimately these flows could be optimized for performance.
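To illustrate the wiring, here is a minimal Python sketch of how the per-flow request URLs for such a page might be built. The flow names and endpoints are hypothetical (1714 was Meandre’s default server port), and the actual page embedded each flow’s output directly.

```python
from urllib.parse import urlencode

# Hypothetical Meandre web-service flow endpoints; the real flow URLs
# were specific to the demo server.
FLOWS = {
    "description_tagcloud": "http://localhost:1714/flow/tagcloud",
    "ngram_tagcloud": "http://localhost:1714/flow/ngram-tagcloud",
    "location_map": "http://localhost:1714/flow/location-map",
    "entity_links": "http://localhost:1714/flow/entity-links",
}

def mashup_urls(query):
    """Build the request URL for each flow from a single search query."""
    return {name: f"{base}?{urlencode({'q': query})}"
            for name, base in FLOWS.items()}

for name, url in mashup_urls("chandelier").items():
    print(name, "->", url)
```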

I put together a screencast in a short time frame to meet the Challenge deadline, so the video is not polished; there was no time for editing or for tweaking the search term. Here is the video in its raw form. The good news is that I received an Honorable Mention for the submission, “… with a neat SEASR flow that used the API from the Victoria and Albert Museum and visualized searches in multiple ways.” The winner announcement for the Developers Challenge is here.

We created a movie that highlights some of the projects and groups using SEASR technology. Check out http://repository.seasr.org/Movies/SEASR-Nov-2009.m4v for more details. We plan to update this movie periodically. If you are using SEASR, please let us know so that we can incorporate your work.