Overview of SEASR
The Software Environment for the Advancement of Scholarly Research, SEASR (pronounced SEE-ZER), offers the humanities, arts, and social science communities a transformational cyberinfrastructure technology.
SEASR eases scholars’ access to digital research materials now stored in a variety of incompatible formats and enhances scholars’ use of them through analytics that can uncover hidden information and connections. SEASR fosters collaboration, too, through empowering scholars to share data and research in virtual work environments.
SEASR technology is also designed to enable digital humanities developers to design, build, and share software applications that support research and collaboration. Developers can tailor applications both in whole and part to fit scholars’ research needs—from changing the visualization landscapes that provide them with views of analytics results, to inserting new analytics that support their linguistic analysis for different time periods or languages, to readjusting entire steps in the work process so that researchers can validate results and alter their queries. Developers can even reuse components developed in programming environments other than SEASR’s (Java/RDF).
Specifically, then, SEASR addresses four key needs:
- the ready transformation of (semi-)unstructured data (including natural language texts) to the structured data world through building extensible software bridges
- improved basic knowledge discovery through supporting enhanced analytics
- time- and distance-independent scholarly and technology exchange through constructing virtual research environments
- fully open-sourced development for maximizing community involvement, such as by sharing user applications through community repositories
To address these needs, SEASR draws upon two complementary technology paradigms—service-oriented architecture and semantic-web computing.
Service-oriented architectures (SOAs) separate large applications into distinct units (called services, modules, or components). Each service can be imagined as a program in miniature that can be distributed over a network and combined and reused to create larger programs—regardless of the type of platform on which the service originated, whether laptop or computing cluster. Services communicate with each other by passing data from one service to another, or by coordinating an activity between two or more services (these streams of transmitted data are called “data flows”). SOAs enable new, large applications to be built with greater speed, uniformity, and flexibility. They enable the reuse of existing IT assets, while advancing their capabilities—providing cost savings as well as expertise sharing.
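The idea of small services combined into a larger program can be sketched in a few lines of Python. This is an illustration only, not SEASR or Meandre code: the service names (`tokenize`, `lowercase`, `count_words`) and the `compose` helper are hypothetical, and real services would typically run on different machines rather than in one process.

```python
from collections import Counter
from functools import reduce
from typing import Any, Callable

# In this sketch a "service" is simply a callable that takes data and returns data.
Service = Callable[[Any], Any]

def tokenize(text: str) -> list[str]:
    """Hypothetical service: split raw text into whitespace-separated tokens."""
    return text.split()

def lowercase(tokens: list[str]) -> list[str]:
    """Hypothetical service: normalize tokens to lower case."""
    return [token.lower() for token in tokens]

def count_words(tokens: list[str]) -> dict[str, int]:
    """Hypothetical service: count how often each token occurs."""
    return dict(Counter(tokens))

def compose(*services: Service) -> Service:
    """Chain services so the output of each becomes the input of the next."""
    def flow(data: Any) -> Any:
        return reduce(lambda acc, service: service(acc), services, data)
    return flow

# Reuse the three small services to build a larger application.
word_count_flow = compose(tokenize, lowercase, count_words)
result = word_count_flow("The whale the Whale")
print(result)
```

Because each service only agrees on the data it accepts and returns, any one of them could be swapped out or reused in another flow without touching the others.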
SEASR ensures that users can make the most of these SOA features through employing only open-source software and through providing an icon-based, integrated development environment (IDE) that empowers humanities, art, and social science developers to visualize shared and original creations with ease. What’s more, SEASR’s IDE places few limitations on how developers must define services within its dataflow engine. The environment offers clearly defined, simple interfaces that enhance developers’ ability to integrate pre-existing or legacy services of diverse implementations. And SEASR’s IDE is dynamic, allowing users to change the behavior of running data flows and to use a current data flow to build more complex applications, since each data flow publishes a web user interface when it executes.
SEASR’s adoption of service-oriented architecture technology is complemented by its integration of semantic-web computing capabilities. Often touted as essential to the Web’s evolution from Web 2.0 to 3.0, semantic-web technology offers an enhanced form of web content that is meaningful to computers, enabling machines to become “much better able to process and ‘understand’ the data that they merely display at present”. Through making all web content—including “databases, programs and sensor output”—programmatically machine understandable (instead of just machine readable), the Semantic Web becomes a universal and standardized medium for gathering and exchanging not only widely distributed and diversely formatted and housed data, but also functional software components that either serve as applications or can be used to build them. In other words, via the Semantic Web, SOAs like SEASR’s can be implemented using web services. That means SEASR technologies and the tools built with them are always accessible to researchers who are as distributed as their data and services.
SEASR’s use of SOA and Semantic-Web approaches gives us an advanced foundation, but these approaches leave much to software developers’ technical imaginations. Where SEASR shines is in the efficient and effective framework our design team has built upon this foundation.
Design: SEASR’s three levels of user interface give users just the level of access with which they are comfortable, from a high level geared toward researchers who want to run and view queries, to an intermediate level aimed at those who are interested in controlling some of the technical parameters of their analytics, to a deep level at which new programming pieces can be created and larger applications built from them.
The SEASR architecture is divided into three complementary layers:
- problem solving
- common services
- resource
The problem-solving layer serves as the design base. Within its visual environment, programmers connect components and services to form an integrated, domain-specific problem-solving environment. The common-services layer provides an execution environment that maps dataflows from the problem-solving layer to the resource layer. The resource layer allocates and manages computational resources.
The SEASR team designed and developed two essential elements of the problem-solving layer: a developer workbench and a community hub web application. For the common-services layer, the team developed a semantic-web-driven dataflow execution environment called Meandre (from Catalan, pronounced MEE-AN-DER). We also began migrating components to SEASR from three of our partner projects: Nora, MONK (Metadata Offer New Knowledge), and NEMA (Networked Environment for Music Analysis). Alongside this, we integrated existing tools such as Weka (machine learning software written in Java), UIMA (IBM’s now open-source Unstructured Information Management Architecture), D2K (ALG’s Data-to-Knowledge suite of data mining services), and Fedora (a general-purpose, open-source digital object repository system, also funded by The Andrew W. Mellon Foundation). For the resource layer, we installed a virtualized computational environment that provides SEASR with development, testing, and production platforms.
We have made major improvements to our user interfaces at the deepest level; i.e., those geared toward technically expert users. These advances enable developers to bootstrap the basic services provided in SEASR’s first wave of offerings. In the problem-solving layer, we have created a language for representing high-level descriptions of dataflows, called ZigZag. A compiler takes ZigZag files and assembles the machinery required to execute those flows. This provides developers with a language-based interface, an alternative to the icon-based interface that best serves less technically proficient users who still want some programming capability. The new interface can be used to code certain functions more quickly and easily, such as when applying analytics to large collections.
In the common services layer, we have developed MAU files (Meandre Archival Units) and a MAU executor. MAU files enable developers to distribute computation (compiled ZigZag-based flows) across platforms. This would allow, for instance, researchers who have proprietary data to import Meandre-based computation to run against their data, the results of which could be exported without compromising security or copyrights. Also in the common services layer, we have created a Java client that enables Java-based applications to access the SEASR infrastructure natively. Such clients enable the SEASR infrastructure to orchestrate complex humanities flows. In future, we may create other infrastructure clients—Python- and LISP-based clients for instance—depending on user needs.
The SEASR team will continue to augment the problem-solving layer with user-interface enhancements to our web applications, and will design and develop an intermediate layer to connect the Community Hub and Developer Workbench. For example, we will enhance a workbench console that lets power users create flows through typed commands rather than only by dragging and dropping icons. In the common-services layer, we will continue adding features to the Meandre dataflow environment, including a distributed computing engine. We will also design and develop repository discovery, provenance flow and monitoring, and an enhanced framework for the construction of visual landscapes. We will continue to develop and integrate analytics components as we work with our research partners in the humanities, arts, and social sciences. For the resource layer, we will maintain and work to optimize computational and storage environments.
SEASR Project Development
SEASR’s project development is driven by detailed use-case scenarios in three domains whose research processes represent the range of humanities needs: Literature, Music, and History. The project’s approach of “developing by doing” will generate a near-term “scholar’s view” of the SEASR environment and provide recommendations from users to developers along the way. It will also incrementally deliver technical components, integrate those components, and allow for their evaluation within the context of the overall system.
We look forward to the project’s next stage, in which we will integrate many more analytics capabilities to complement the infrastructure we have built, fulfilling SEASR’s goal of making digital archives more useful by offering an easily usable environment that researchers can adapt for their own unstructured data analysis. Initially, SEASR will support text- and audio-based analysis, but we hope to expand our capabilities to images and moving images as well.
The technological promise of SEASR will be fulfilled not only because of our established expertise in innovating analytics and infrastructure, but also through the synergy we are creating with other digital humanities projects, and the support we are receiving from The Andrew W. Mellon Foundation. It will also be fulfilled because we are partnering with researchers whose subjects of study, whose questions, whose disciplinary vision drive our design. Their contributions to SEASR serve as our ultimate guide as we engineer knowledge for the humanities, arts, and social sciences.
Meandre provides the machinery for assembling and executing data flows: software applications consisting of software components that process data (such as by accessing a data store, transforming the data from that store, and analyzing or visualizing the transformed results).
Within Meandre, each flow is represented as a graph that shows executable components (i.e., basic computational units, or building blocks) as icons linked through their input and output connections. Based on the inputs and properties of an executable component, a unique output is generated upon execution.
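A flow of this kind can be pictured as a small graph of components wired together by their inputs and outputs. The sketch below is a simplification written for illustration, not Meandre's actual API or scheduling model: the component names, the `(source, target, port)` connection format, and the `run_flow` helper are all assumptions. It runs each component once, in dependency order, using Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Each hypothetical component computes its output from its named inputs
# and its own properties, mirroring the description above.
def read_text(inputs, props):
    return props["text"]

def split_words(inputs, props):
    # A missing "separator" property means: split on any whitespace.
    return inputs["text"].split(props.get("separator"))

def count_items(inputs, props):
    return len(inputs["items"])

# Component name -> (function, properties).
components = {
    "reader": (read_text, {"text": "call me ishmael"}),
    "splitter": (split_words, {}),
    "counter": (count_items, {}),
}

# Connections: (source component, target component, target input port).
connections = [
    ("reader", "splitter", "text"),
    ("splitter", "counter", "items"),
]

def run_flow(components, connections):
    """Execute components in dependency order, passing outputs along connections."""
    dependencies = {name: set() for name in components}
    for source, target, _port in connections:
        dependencies[target].add(source)

    outputs = {}
    for name in TopologicalSorter(dependencies).static_order():
        func, props = components[name]
        inputs = {
            port: outputs[source]
            for source, target, port in connections
            if target == name
        }
        outputs[name] = func(inputs, props)
    return outputs

outputs = run_flow(components, connections)
print(outputs["counter"])
```

Changing a property (for example, the reader's `text`) or rewiring a connection changes the result without modifying any component, which is the point of describing a flow as a graph.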
Meandre also provides publishing capabilities for flows and components, enabling users to assemble a repository of components for reuse and sharing. This allows users to leverage other research and development efforts by querying and integrating component descriptions that have been published previously at other shareable repository locations.
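As a rough illustration of publishing, querying, and integrating component descriptions, consider the toy repository below. It is only a sketch under stated assumptions: the `ComponentDescription` and `Repository` classes, the tag-based query, and the `example.org` locations are hypothetical and do not reflect how Meandre actually stores or describes components.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ComponentDescription:
    """Hypothetical description of a published component."""
    name: str
    tags: frozenset[str]
    location: str  # placeholder URL where the component is published

@dataclass
class Repository:
    """Toy in-memory repository of component descriptions."""
    components: dict[str, ComponentDescription] = field(default_factory=dict)

    def publish(self, description: ComponentDescription) -> None:
        self.components[description.name] = description

    def query(self, tag: str) -> list[str]:
        """Return the names of all components carrying the given tag."""
        return sorted(c.name for c in self.components.values() if tag in c.tags)

    def merge(self, other: "Repository") -> None:
        """Integrate descriptions published elsewhere, keeping local ones on conflict."""
        for description in other.components.values():
            self.components.setdefault(description.name, description)

local = Repository()
local.publish(ComponentDescription(
    "tokenizer", frozenset({"text"}), "http://example.org/components/tokenizer"))

remote = Repository()
remote.publish(ComponentDescription(
    "word-counter", frozenset({"text"}), "http://example.org/components/word-counter"))
remote.publish(ComponentDescription(
    "pitch-detector", frozenset({"audio"}), "http://example.org/components/pitch-detector"))

# Reuse what another group has already published.
local.merge(remote)
text_components = local.query("text")
print(text_components)
```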
To start sifting through vast quantities of data automatically and discovering useful patterns, you can begin using SEASR in three ways:
- Use our Meandre server with SEASR components. Point your browser to http://demo.seasr.org:1714/public/services/ping.html for the Meandre Administrative Interface or http://demo.seasr.org:1712 for the Meandre Workbench.
- Download the Meandre software, install it, and execute the servers. Point your browser to http://localhost:1714/public/services/ping.html for the Meandre Administrative Interface or http://localhost:1712 for the Meandre Workbench.
- Download from the SVN repository directly.
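If you prefer to check from a script rather than a browser whether a Meandre server is answering, something like the following may help. It is a minimal sketch: the `ping_url` and `server_is_up` helpers are hypothetical, the port and path are taken from the URLs above, and treating an HTTP 200 response as "up" is an assumption about how the ping page behaves.

```python
from urllib.error import URLError
from urllib.request import urlopen

def ping_url(host: str = "localhost", port: int = 1714) -> str:
    """Build the ping URL listed above for a given host and port."""
    return f"http://{host}:{port}/public/services/ping.html"

def server_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200 (assumed to mean 'running')."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False

if __name__ == "__main__":
    url = ping_url()
    print(url, "is up" if server_is_up(url) else "is not reachable")
```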
SEASR at a Glance
Regardless of which path you take, working in SEASR means working with data. In its simplest form, this is a three-step process: you read data into SEASR, run the data through a series of transformations and analytics, and finally send the results to a results viewer. This sequence of operations is known as a data flow because the data flows record by record from the source through each manipulation and, finally, to the destination, which is either a model or some type of data output. Most of your work in SEASR will involve creating and modifying data flows.
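The three steps, and the record-by-record movement of data between them, can be sketched with Python generators. This is an illustration only, not SEASR code: the function names and the word-count transformation are hypothetical stand-ins for whatever reader, analytics, and viewer a real flow would use.

```python
from typing import Iterable, Iterator

def read_records(lines: Iterable[str]) -> Iterator[str]:
    """Step 1: read data in, one record at a time."""
    for line in lines:
        yield line.strip()

def transform(records: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Step 2: apply a transformation to each record (here, a word count)."""
    for record in records:
        if record:  # skip empty records
            yield record, len(record.split())

def view(results: Iterable[tuple[str, int]]) -> list[str]:
    """Step 3: send results to a destination (here, formatted report lines)."""
    return [f"{count:>3}  {text}" for text, count in results]

source = ["Call me Ishmael.\n", "\n", "It was the best of times.\n"]

# Each record is pulled through all three steps before the next one is read.
report = view(transform(read_records(source)))
for line in report:
    print(line)
```

Because generators are lazy, no step holds the whole collection in memory, which is one common way to keep such a flow workable on large inputs.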
At each point in the process, SEASR’s visual interface invites your specific domain expertise. Modeling algorithms for tasks such as prediction, classification, segmentation, and association detection help you build powerful and accurate models. Model results can easily be deployed and read into the repository and a wide variety of other applications. You can then use the SEASR Community Hub to deploy entire application flows that read data into a model and deploy results without using the workbench. This brings important data and results closer to the scholars who need them.