Archive for January, 2008

Stuart Dunn and Tobias Blanke discuss SEASR in their report of the UK e-Science All Hands 2007 meeting published in D-Lib Magazine (January/February 2008, Vol. 14 No. 1/2), “Next Steps for E-Science, the Textual Humanities and VREs: A Report on Text and Grid: Research Questions for the Humanities, Sciences and Industry.”

Of SEASR, the authors write, “Thinking in terms that reach beyond conventional library frameworks highlights a need to consider the process by which unstructured data becomes structured. This was the primary issue considered by Loretta Auvil from the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign, who presented on the Software Environment for the Advancement of Scholarly Research (SEASR) project. This API-driven approach enables analyses run by text mining tools, such as NoraVis and Featurelens, to be published to web services. This is critical: a VRE that is based on digital library infrastructure will have to include not just text, but software tools that allow users to analyse, retrieve (elements of) and search those texts in ever more sophisticated ways. This requires formal, documented and sharable workflows, and mirrors needs identified in the hard science communities, which are being met by initiatives such as the myExperiment project. A key priority of this project is to implement formal, yet sharable, workflows across different research domains. As different research domains have very different protocols for structuring and managing textual archives, the utility of being able to use tools such as Nora and Featurelens in a SEASR-type environment will become ever more important in the development of VREs for textual studies. For example, a numerical extraction system like that presented by the Open Boek project has significant utility when applied to archaeological reports, but such utility is clearly not confined to that domain. In the scientific communities, there has been interest in digital versions of lab books in VREs. Numeric data is likely to be critical to such exercises. Like Open Boek, the JISC-funded Integrative Biology VRE project was also concerned with the textual context of numbers: it found that digital recognition of equations was a significant problem, a clear case of crossover. Such analyses could, in theory, be delivered to the user by an architecture like that described by Auvil.”

The authors conclude, “[…] Although Web 2.0 has not revolutionized scholarly research in the way envisaged originally, researchers need to be able to annotate texts on which they are working, and to be able to store, search and structure those annotations. In a way, such a structure might resemble a (user-created) digital library within or across other digital libraries. Detailed semantic documentation of the links between the annotation and the annotated text is necessary, along with documentation of when, why and by whom the annotation was created. Furthermore, it would be highly desirable for any additional chunks from separate texts that may be relevant to the annotation (e.g., containing the same name, geographic reference, numeric data, etc.) to be identified: the workflow management architectures presented both by SEASR and GATE suggest this is possible.”