Posts Tagged ‘community’

Bernie Ács of the SEASR Team made a remote presentation for the NINES/18th Connect Workshop held in Dublin on July 15, 2009. The presentation included an overview of the SEASR project. Also in remote attendance were Loretta Auvil and Xavier Llorà of the SEASR Team.

The presentation is available here.

Jun 29

THATCamp


Loretta Auvil, Boris Capitanu, and Amit Kumar of the SEASR Team participated in THATCamp 2009 at George Mason University on June 27-29, 2009. We held a session on SEASR Analytics and also had the opportunity to discuss SEASR with many humanities researchers.


Loretta Auvil and Bernie Ács of the SEASR Team participated in the Digital Humanities 2009 Conference at the University of Maryland on June 22-25, 2009. We had a poster called “SEASR Integrates with Zotero to Provide Analytical Environment for Mashing up Other Analytical Tools” by Boris Capitanu, Xavier Llorà, Loretta Auvil, Michael Welge, and Bernie Ács. We also had the opportunity to have discussions with many humanities researchers.

The SEASR Team held a Follow-up SEASR Workshop on Monday, June 22, 2009, during Digital Humanities 2009 week. Loretta Auvil and Bernie Ács presented updates to the SEASR project. We had presentations from Andrew Ashton of Brown University, Clare Lewellyn and Michael Krot of JSTOR, Anoop Kumar of Tufts (VUE), and Susan Schreibman of the Digital Humanities Observatory.

The presentation materials are available at http://dev-tools.seasr.org/confluence/display/Outreach/June2009Follow-up.

The University of Victoria’s Digital Humanities Summer Institute (DHSI) was held on June 8-12, 2009. Loretta Auvil and Boris Capitanu taught the course entitled “SEASR in Action: Data Analytics for Humanities Scholar”. The slides and course materials for this workshop are at http://dev-tools.seasr.org/confluence/display/Outreach/DHSI-SEASR.

We had 15 students registered for the course. The course covered the following topics: Overview of SEASR infrastructure (components, flows, applications), Introduction to text mining tools, and Using and creating Zotero flows.

The Visual Understanding Environment (VUE) Team at Tufts University held a webinar on May 5, 2009 to highlight the project’s new features. This presentation included a prototype integration with SEASR. Check out the webinar here.


Loretta Auvil was invited to present the keynote address at the Text Mining Workshop 2008, which was held in conjunction with the Eighth SIAM International Conference on Data Mining (SDM 2008) in Atlanta, GA on April 26, 2008.  Her presentation title echoes SEASR’s identifying phrase, “Engineering Knowledge for the Humanities.”

Presentation


Abstract

Over the last decade NCSA’s Automated Learning Group has innovated data mining technologies for industry, government, and the sciences. In the past few years, we have broadened our focus to include knowledge discovery in the humanities. My presentation will focus on how we are negotiating humanities computing’s special challenges for data mining and analysis. I will discuss our early collaborative projects, FeatureLens and Nora, and SEASR (Software Environment for the Advancement of Scholarly Research), the Andrew W. Mellon Foundation-funded project we are now leading. Each of these projects has developed technologies customized to meet specific needs of the digital humanities community. FeatureLens–an early MONK (Metadata Offer New Knowledge) application–uses the machine learning approach of frequent pattern mining to identify fuzzy repetition patterns in a data collection, and with no initial human input. Nora–a case study for eighteenth- and nineteenth-century British and American literature–uses predictive modeling techniques to classify documents, even given complex and notoriously indistinct expert classes such as sentimental fiction. SEASR is our most ambitious project yet, employing a semantic-based, service-oriented architecture to build software bridges that allow users to access data stored in disparate formats and on incompatible platforms and to provide an enhanced environment for workflow and data sharing. The essential infrastructure SEASR provides will advance the capabilities of projects like our partner, MONK, a digital environment designed to help humanities scholars discover and analyze patterns.

The SEASR and NEMA (Networked Environment for Music Analysis) teams have transformed a dynamic music classification explorer developed by IMIRSEL (The International Music Information Retrieval Systems Evaluation Laboratory) into a SEASR application that can be reused in whole or part by music researchers everywhere. Ira Fuchs–Vice President of Research in Information Technology for The Andrew W. Mellon Foundation (sponsor of SEASR and NEMA)–gave the “Son of Blinkie” (SoB) explorer its first demonstration on April 16th.

INTRODUCING SON OF BLINKIE

Innovations in digital technologies have changed the ways we create, access, analyze, share, and consume information. But to realize their full potential, we need to re-evaluate digital information technologies to consider whether their methods are hold-outs from the age of print and, if so, what improved means we can devise. IMIRSEL’s SoB [1, 2], a dynamic classification explorer for musical digital library users and researchers, offers such an advance to the way in which we access and analyze music.

In print collections and their digital descendants, information is retrieved through metadata, or descriptive labels, imposed upon it by librarians, editors, and domain experts. This metadata is used to generate tables of contents, subject indexes, and other searchable formats. Once determined, such labels and their associated epistemologies tend to become fixed and accepted as fact; they present a closed system of established knowledge rather than provide a virtual landscape that encourages exploration and enables discovery.

In developing Son of Blinkie (affectionately named after the earlier, simpler “Blinkie Thing” [3]), the researchers at IMIRSEL have sought to bring leading machine learning methods to bear on the problem of how to make better use of the now digital nature of music collections. They have developed a means for searching music automatically, using its features of composition rather than imposed metadata as a guide. Not only does this automated method improve the speed and accuracy of information retrieval, but it promises to enrich our understanding of music and its classification.

Faced with a collection of music, we often accept that the labels imposed by past listeners are accurate and/or informative. But listeners may hold conflicting opinions about a piece, and the piece itself may defy reductive labeling. By analyzing a piece using its own compositional features, machine learning can help us to understand whether a given piece is representative of a genre or mood as a whole or only of certain compositional tendencies within it, tendencies that may change over time, by performer, or even by performance. What’s more, Son of Blinkie (SoB) improves on earlier attempts to automate digital music collection retrieval and analysis.

Consider the traditional train-test approach to building, evaluating, and using machine-generated audio-based classifications (e.g., genre, mood, or artist) for Music Digital Libraries (MDL). It is useful in some contexts, but has two serious shortcomings. First, the classifications are monic (i.e., only one class label per piece). This monicity ignores the fact that most music comprises a mix of moods and/or genres. Second, the classifications are static (i.e., the label is fixed for the entire piece) even though pieces evolve through several moods and/or genre mixes over their play time. The SoB system offers a new and superior method of digital music exploration, engineered to overcome these train-test shortcomings and better capture the dynamic nature of music. SoB provides users with the capacity for highly configurable real-time classification, visualization, and audition.
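To make the two shortcomings concrete, the short Python sketch below contrasts a single static label with a per-window view of the same piece. The mood names and probability values are made-up illustration data, not SoB output, and both helper functions are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical per-window posteriors for one piece (illustration data only).
windows: List[Dict[str, float]] = [
    {"calm": 0.7, "sad": 0.2, "energetic": 0.1},
    {"calm": 0.4, "sad": 0.5, "energetic": 0.1},
    {"calm": 0.1, "sad": 0.2, "energetic": 0.7},
]


def static_monic_label(posteriors: List[Dict[str, float]]) -> str:
    """Collapse the whole piece to one label: the class with the largest summed probability."""
    totals: Counter = Counter()
    for posterior in posteriors:
        totals.update(posterior)  # adds each class's probability to its running total
    return max(totals, key=totals.get)


def dynamic_mixture(posteriors: List[Dict[str, float]], threshold: float = 0.2) -> List[List[str]]:
    """For each window, keep every class whose probability reaches the threshold."""
    return [
        sorted(label for label, p in posterior.items() if p >= threshold)
        for posterior in posteriors
    ]


print(static_monic_label(windows))  # one label for the entire piece
print(dynamic_mixture(windows))     # a list of labels for each window
```

The single label hides both the secondary moods within each window and the shift that happens in the final window; the per-window lists keep both.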

Another important advancement made with SoB is that the application operates within SEASR’s service-oriented architecture, taking the form of a series of reusable, open-source components managed by and executed as a shareable workflow from SEASR’s community hub. Not only can users run SoB against their own data sets (with SEASR’s assistance in accepting different input formats stored on different platforms), but they can also reuse and revise components and workflows to build their own music research applications.

SON OF BLINKIE IN ACTION

SoB works by extracting a stream of features from audio tracks and applying a set of pre-trained classification models to short windows (10 sec.) of these features to generate posterior probability distributions in real time. The display of the classification probabilities is synchronized with the audio playback, empowering users to dynamically explore the effects and interactions of the many parameters involved in automatic music classification. SoB permits users to select an arbitrary number of classification models from the system’s ever-growing model library. Currently SoB’s model library comprises two classification “task” collections: mood and genre classifiers.
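The windowed classification described above can be sketched roughly as follows. This is a minimal illustration in Python, not the SoB implementation: the two-dimensional feature frames, the linear weight vectors standing in for a pre-trained model, the class names, and the window length (counted in frames rather than seconds) are all assumptions chosen for the example.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def window_means(frames: List[List[float]], window: int) -> List[List[float]]:
    """Average the feature frames inside each consecutive, non-overlapping window."""
    means = []
    for start in range(0, len(frames) - window + 1, window):
        chunk = frames[start:start + window]
        dims = len(chunk[0])
        means.append([sum(f[d] for f in chunk) / window for d in range(dims)])
    return means


def classify_stream(
    frames: List[List[float]], window: int, model: Dict[str, List[float]]
) -> List[Dict[str, float]]:
    """Return one posterior distribution (class -> probability) per window.

    `model` maps a class name to a hypothetical weight vector; a real
    pre-trained classifier would take its place.
    """
    classes = list(model)
    posteriors = []
    for mean in window_means(frames, window):
        scores = [sum(w * x for w, x in zip(model[c], mean)) for c in classes]
        posteriors.append(dict(zip(classes, softmax(scores))))
    return posteriors


# Hypothetical features: the track starts "calm" and ends "energetic".
frames = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
mood_model = {"calm": [2.0, 0.0], "energetic": [0.0, 2.0]}

for index, posterior in enumerate(classify_stream(frames, window=4, model=mood_model)):
    print(index, {label: round(p, 3) for label, p in posterior.items()})
```

Each window yields a full distribution rather than a single label, so a display synchronized with playback could show the balance between classes shifting as the track moves from one window to the next. Several models could be run side by side by calling `classify_stream` once per model.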

sonofblinkieclassifiersm1.jpg

Above, we show a user simultaneously exploring the different real-time behaviors of mood classification models and genre classification models. Each model is making different predictions on this particular 5-second slice of the incoming, never-heard-before, song. The user can visualize the models’ prediction probability distributions, which can help the user better appreciate the potential “mixture” of moods present. The user can also listen to the synchronized audio to better understand the strengths/weaknesses of each model.

Below is a view that shows how data flows through the Son of Blinkie system, as it operates within SEASR (specifically, the semantic, web-driven dataflow execution environment portion of SEASR, which we have named Meandre). Each component represents one step in processing the data. The components run (and so process data) in the order established by the flow: from receiving the song filename and model filenames from the web application, to loading the audio and model data into memory, to extracting a variety of features from the song, to applying the model to the extracted features, to returning the predicted results to the SEASR community hub (a web application) for visualization. Every time a different song is selected, the web application executes this same flow.

sonofblinkieworkflowsm.jpg
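The ordered flow described above can be pictured as a chain of small components, each consuming the previous component's output. The sketch below is only an analogy in plain Python: it does not use the Meandre API, and every function name, the fake audio values, and the placeholder models are hypothetical.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Component = Callable[[State], State]


def receive_request(state: State) -> State:
    """Step 1: accept the song and model filenames sent by the web application."""
    return {**state, "request_ok": bool(state["song_file"] and state["model_files"])}


def load_data(state: State) -> State:
    """Step 2: load audio and models (faked here with in-memory placeholders)."""
    audio = [0.1, 0.4, 0.3, 0.8]  # stand-in for decoded audio samples
    models = {name: 0.5 for name in state["model_files"]}  # stand-in decision thresholds
    return {**state, "audio": audio, "models": models}


def extract_features(state: State) -> State:
    """Step 3: derive features from the audio (here, just the mean level)."""
    audio = state["audio"]
    return {**state, "features": {"mean_level": sum(audio) / len(audio)}}


def apply_models(state: State) -> State:
    """Step 4: apply each placeholder model to the extracted features."""
    level = state["features"]["mean_level"]
    predictions = {
        name: ("high" if level >= threshold else "low")
        for name, threshold in state["models"].items()
    }
    return {**state, "predictions": predictions}


def return_results(state: State) -> State:
    """Step 5: hand only the predictions back to the web application."""
    return {"song_file": state["song_file"], "predictions": state["predictions"]}


# The flow fixes the order in which the components run.
FLOW: List[Component] = [
    receive_request, load_data, extract_features, apply_models, return_results,
]


def run_flow(flow: List[Component], request: State) -> State:
    """Pass the request through every component in flow order."""
    return reduce(lambda state, component: component(state), flow, request)


result = run_flow(FLOW, {"song_file": "song.mp3", "model_files": ["mood.model", "genre.model"]})
print(result)
```

Because the flow is just an ordered list of components, selecting a different song means running the same list again with a new request, and a researcher could swap or reuse an individual step without rewriting the others.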

REFERENCES

  1. Funded by The Andrew W. Mellon Foundation and the National Science Foundation (Grant No. NSF IIS-0327371). Thanks to M. C. Jones and the SEASR team for their technical assistance.
  2. IMIRSEL is directed by Dr. J. Stephen Downie, Graduate School of Library and Information Science (GSLIS), UIUC (jdownie@uiuc.edu). His Co-PIs on the Son of Blinkie system are Kris West, School of Computing Science, University of East Anglia and Xiao Hu, GSLIS, UIUC.
  3. Downie, J.S., Ehmann, A.F., and Tcheng, D. 2005: Real-time genre classification for music digital libraries. JCDL’05, 337.
  4. NEMA Website: http://nema.lis.uiuc.edu.
  5. SEASR Website: http://www.seasr.org.

Throughout March, SEASR and I-CHASS hosted humanities and social sciences research teams selected for their diversity of approach and interest:

Global Middle Ages: “Global middle ages” is a term in Medieval Studies that designates an interest in the middle ages across the world, including non-Western societies. The research group (Susan Noakes, French and Italian, Medieval Period, U. Minnesota; Geraldine Heng, Medieval and Women’s Studies, English, U. Texas-Austin; Ayhan Aytes, doctoral candidate in Communication at UC-San Diego, and also a medievalist) thus intends to create a digital resource that establishes and enriches researchers’ understanding of how non-Western societies contributed to medieval European culture (approximately 500-1450 CE). The design for this project is centered on a mapped narrative of cultural influences coming out of Africa (e.g., the former provinces of Rome in the north, including Egypt; later, Islamicized Africa, especially Moorish civilization; and, later still, Western Africa as a site of empires as well as the transatlantic slave trade). It will thus ground the historical for users through appeals to their temporal, visual, and spatial imaginations. As with digital timelines, such mapped narratives tend to offer waypoints to users at which they can “stop” to browse in-depth information provided in a variety of media forms.

Peace and Nonviolence: This project brings together researchers who have worked to promote peace and non-violence through informed activism. They are uniformly interested in the social causes of violence. Steven Valdivia, Independent Scholar (former Executive Director, Crisis Intervention Network-LA), and Fernando Hernandez, Education, CSU-Los Angeles (emeritus), are two researchers working on LA gangs. They are especially interested in how governmental responses to poverty, minority status, and gang activity have fostered gang formation and violence. They are seeking means of counteracting gang formation that might be recommended as public policies. One theory they hope to prove is that the militarization of response to gang activity has worsened rather than improved gang violence. The researchers from the Southern Poverty Law Center (Mark Potok and Heidi Beirich) are interested in research subjects that fit their civil rights mission, which the center pursues through its “tolerance education programs, its legal victories against white supremacists and its tracking of hate groups.” They are especially concerned with researching the formation of hate groups (e.g., white supremacist groups), particularly how they hail new members.

Digital Portfolio Project: Virginia Kuhn (Research Assistant Professor, Associate Director of the Institute of Multimedia Literacy, USC School of Cinematic Arts) has just led the first class through a new, intensive program at IML. The students’ senior year culminates in a major multimedia design project, producing a finished piece with support work for each of the 30+ students. Because archiving technology is increasingly available and because the new program is an important focus for the school, Dr. Kuhn wants to find a stable and innovative means for archiving these projects and retrieving information from them, with her ultimate goal being to produce a persistent, state-of-the-art pedagogical resource at USC, one that could serve as a model for other programs. According to Dr. Kuhn’s official faculty bio, the “project was recently awarded a large (3 terabyte) allowance of storage space on SDSC’s TeraGrid.” Consulting on the project are ISU’s Cheryl Ball, a specialist in digital composition and rhetoric (English) and Editor of Kairos, and Elijah Wright, a doctoral candidate at Indiana University’s School of Library and Information Science.

We are working with these teams to apply and further develop SEASR’s capabilities, and will feature their projects in a SEASR community-building workshop later this summer.

Loretta Auvil and Amit Kumar participated in MONK’s latest Hackfest (February 7-10, 2008, Chicago).

In preparation for the meeting, Peter Groves produced an icon to suggest how well a particular file or feature contributes to supervised classification, a feature MONK anticipates adding to the feature display in the Search by Example toolset. At the meeting, Amit Kumar (who is tasked with developing the MONK workbench) and other MONKies connected new proxy calls through the workbench, which will include SEASR calls. Loretta Auvil started toward an unsupervised classification of the TEI-A version of witchcraft files through SEASR, to advance research for Dr. Kirsten Uszkalo’s use case.

At meeting’s end, the MONK team requested that SEASR develop a clustering tool written in Google Web Toolkit, to be tested on the Nineteenth-Century Fiction and Witchcraft databases.