=Overview=

Meandre is a semantic-web-enabled, web-driven dataflow execution environment. It provides the machinery for assembling and executing data flows: software applications consisting of software components that process data (for example, by accessing a data store, transforming the data from that store, and analyzing or visualizing the transformed results).

Within Meandre, each flow is represented as a graph that shows executable components (i.e., basic computational units, or building blocks) as icons linked through their input and output connections. Upon execution, each executable component generates its output from its inputs and properties.

Meandre also provides publishing capabilities for flows and components, enabling users to assemble a repository of components for reuse and sharing. This allows users to leverage other research and development efforts by querying and integrating component descriptions that have been published previously at other shareable repository locations.

Meandre builds on three main concepts: dataflow-driven execution, semantic-web metadata manipulation, and metadata publishing. In this section, we will define and briefly explain each concept, which will provide a basis for the rest of the material. We also provide complementary links to help you explore these areas in more detail.

==Dataflow Execution Engines==
Conventional programs perform their computational tasks by executing a sequence of instructions. One after another, each code instruction gets fetched and executed. Any data manipulation is performed by these basic units of execution. In a broad sense, this approach can be termed “code-driven execution.” Basically, any computation task is regarded as a sequence of code instructions that ultimately manipulates data.

However, data-driven execution (or dataflow execution) revolves around the idea of applying transformational operations to a flow or stream of data. In a data-driven model, data availability determines in what sequence code instructions are executed.

The dataflow execution model can be pictured as a network of black-box operands (operators). Any operand may have zero or more data inputs, and it may produce one or more data items through its data outputs. The operand’s behavior is controlled by properties (knobs). Each operand performs its operations based on the status of its inputs. For instance, an operand may require that data be available on all of its inputs before performing its operations. Others may need only some of their inputs, or none.

A simple example of a black box operand could be the arithmetic ‘+’ operand. This operand can be modeled as follows:

  1. It requires two inputs.
  2. When two inputs are available, it performs its ‘+’ operation.
  3. It then pushes the result as an output.

Such a simple operand could be implemented in two ways. The first defines a component (Meandre’s term for a black-box operator) with two inputs. When data is present on both inputs (which Meandre calls the all-inputs-filled firing policy), the operator is executed (fired, in Meandre terms). The operator produces one data item on its output, which may become the input of another operator.

The second option is to create a component with a single input that adds together every two consecutive data pieces received. The component would have an internal variable that would store the first data piece of a pair. When the second data piece arrives, it would be added to the first and produce an output result. The internal variable would then be cleared so that the component will know that the next data piece received is the first of a new pair.
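The two designs for the ‘+’ operand can be sketched in plain Java as below. The class and method names (TwoInputAdder, PairwiseAdder, tryFire, push) are hypothetical and only illustrate the idea; they are not part of the Meandre component API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the two '+' designs described above.
// These are illustrative classes, not Meandre's component interface.

/** Option 1: two input ports, fired only when both hold data. */
final class TwoInputAdder {
    private Integer left;   // pending value on the first input port
    private Integer right;  // pending value on the second input port

    void pushLeft(int value)  { left = value; }
    void pushRight(int value) { right = value; }

    /** Fires only if both inputs are present; consumes them and returns the sum. */
    Optional<Integer> tryFire() {
        if (left == null || right == null) {
            return Optional.empty();   // not all inputs filled: do not fire
        }
        int sum = left + right;
        left = null;                   // inputs are consumed by the firing
        right = null;
        return Optional.of(sum);
    }
}

/** Option 2: one input port; internal state pairs up consecutive values. */
final class PairwiseAdder {
    private Integer firstOfPair;       // internal state kept between firings
    private final List<Integer> outputs = new ArrayList<>();

    void push(int value) {
        if (firstOfPair == null) {
            firstOfPair = value;       // remember the first piece of a pair
        } else {
            outputs.add(firstOfPair + value);
            firstOfPair = null;        // clear so the next value starts a new pair
        }
    }

    List<Integer> outputs() { return outputs; }
}
```

Pushing 1, 2, 3, 4, 5 into a PairwiseAdder yields the outputs 3 and 7; the trailing 5 waits in the internal state for its partner.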

The following terminology will be utilized in describing Meandre:

  1. Component: A basic unit of processing.
  2. Input ports: Inputs required by a component.
  3. Firing policy: The policy determining when a component should be fired – executed (e.g. when all/any input ports contain data).
  4. Output ports: Outputs produced by component execution.
  5. Properties: Component variables used to modify component behavior.
  6. Internal state: The collection of data structures designed to manage data between component firings.
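To connect these terms, the following sketch models a component whose input ports buffer incoming data and whose firing policy (a property set at construction) decides whether it may fire. The names PortedComponent, FiringPolicy, deliver, and canFire are assumptions made for illustration, not Meandre classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the terminology above; not the Meandre API.

enum FiringPolicy { ALL, ANY }

final class PortedComponent {
    // Input ports: each port buffers the data items it has received.
    private final Map<String, Deque<Object>> inputPorts = new LinkedHashMap<>();
    // Property: a knob that modifies behavior (here, which firing policy applies).
    private final FiringPolicy firingPolicy;

    PortedComponent(FiringPolicy firingPolicy, String... portNames) {
        this.firingPolicy = firingPolicy;
        for (String name : portNames) {
            inputPorts.put(name, new ArrayDeque<>());
        }
    }

    /** Places a data item on the named input port. */
    void deliver(String port, Object data) {
        Deque<Object> queue = inputPorts.get(port);
        if (queue == null) {
            throw new IllegalArgumentException("unknown port: " + port);
        }
        queue.addLast(data);
    }

    /** Applies the firing policy to the current state of the input ports. */
    boolean canFire() {
        return switch (firingPolicy) {
            case ALL -> inputPorts.values().stream().noneMatch(Deque::isEmpty);
            case ANY -> inputPorts.values().stream().anyMatch(q -> !q.isEmpty());
        };
    }
}
```

Under this sketch, a component with no input ports and the ALL policy can always fire, which matches operands that need no inputs at all.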

All dataflow execution engines provide a scheduler that determines the firing (execution) sequence of components. Meandre uses a decentralized scheduling policy designed to maximize the use of multicore architectures. It also allows a custom thread prioritization system to be built on top of the Java threading system.
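One way to picture decentralized scheduling is to give every component its own thread that blocks on its input queue and fires as soon as data arrives, so no central scheduler decides the order and the JVM spreads the threads across cores. The sketch below assumes that thread-per-component design for illustration; it is not a description of Meandre’s actual scheduler, and ThreadedStage and runPipeline are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: one thread per component, driven purely by data
// availability on its input queue. Not Meandre's real scheduler.
final class ThreadedStage implements Runnable {
    // Sentinel marking end of stream (so this value cannot be used as data).
    static final int END = Integer.MIN_VALUE;

    private final BlockingQueue<Integer> input;
    private final BlockingQueue<Integer> output;
    private final IntUnaryOperator operation;

    ThreadedStage(BlockingQueue<Integer> input, BlockingQueue<Integer> output,
                  IntUnaryOperator operation) {
        this.input = input;
        this.output = output;
        this.operation = operation;
    }

    @Override
    public void run() {
        try {
            while (true) {
                int value = input.take();      // blocks until data is available
                if (value == END) {
                    output.put(END);           // propagate end of stream downstream
                    return;
                }
                output.put(operation.applyAsInt(value));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Chains the operations into a pipeline, one thread per stage. */
    static List<Integer> runPipeline(List<Integer> values, IntUnaryOperator... ops) {
        BlockingQueue<Integer> first = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> current = first;
        List<Thread> threads = new ArrayList<>();
        for (IntUnaryOperator op : ops) {
            BlockingQueue<Integer> next = new LinkedBlockingQueue<>();
            Thread stage = new Thread(new ThreadedStage(current, next, op));
            stage.start();
            threads.add(stage);
            current = next;
        }
        try {
            for (int v : values) first.put(v);
            first.put(END);
            List<Integer> results = new ArrayList<>();
            for (int v = current.take(); v != END; v = current.take()) {
                results.add(v);
            }
            for (Thread t : threads) t.join();
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
    }
}
```

Because each stage reads a FIFO queue on a single thread, output order is preserved even though the stages run concurrently.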

==Semantic Web Concepts==
The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C. In W3C’s words:

The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not merely by the network but by being connected semantically.

The Semantic Web effort relies on the Resource Description Framework (RDF). RDF is a simple notation for expressing graph relations. It can be serialized as XML (RDF/XML), which provides a set of conventions for exchanging information. It also provides a mechanism to express metadata and to define how that metadata is structured.

We will give an overview of RDF as a foundation for understanding Meandre’s architectural decisions and conventions. Deeper introductions to RDF can be found elsewhere (http://www.ibm.com/developerworks/library/w-rdf/ and http://www.w3schools.com/rdf/rdf_intro.asp).

To explain some of the basics of RDF, let’s use a simple example. RDF’s basic expression is called a triple. A triple is a statement that describes some property of an object: it links the object to a property and to that property’s value. RDF originated as part of W3C’s efforts, so it is natural that RDF objects are uniquely identified by URIs (Uniform Resource Identifiers), the Web’s standard identifiers. http://meandre.org/articles/meandre-execution-engine-architecture.pdf and file:///tmp/potato.png are examples of objects identified by URIs. Properties also take the form of a URI; for instance, http://purl.org/dc/elements/1.1/creator is a property identifying the author of a given object.

Many properties are standardized. For instance, the creator property mentioned above belongs to the Dublin Core vocabulary. Property values can take two possible forms: a literal (which may or may not be typed) or another URI. URIs (or objects) are usually referred to as resources in RDF jargon.

An example of a triple could be

file:///tmp/potato.png http://purl.org/dc/elements/1.1/creator "John Doe"

or its typed form

file:///tmp/potato.png
http://purl.org/dc/elements/1.1/creator
"John Doe"^^http://www.w3.org/2001/XMLSchema#string

This typed form indicates that “John Doe” is a string.
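To make the plain and typed forms concrete, the sketch below renders a triple in N-Triples syntax, where URIs go in angle brackets and a typed literal carries `^^<datatype>`. The Triple record and its factory methods are hypothetical; a real application would use an RDF library, and this minimal version escapes only quotes and backslashes.

```java
// Hypothetical sketch: a minimal triple type that prints one N-Triples line.
// It escapes only quotes and backslashes; real RDF libraries handle more cases.
record Triple(String subject, String predicate, String object,
              boolean objectIsLiteral, String datatype) {

    /** A triple whose value is a literal, optionally typed (datatype may be null). */
    static Triple literal(String s, String p, String value, String datatype) {
        return new Triple(s, p, value, true, datatype);
    }

    /** A triple whose value is another resource (URI). */
    static Triple resource(String s, String p, String uri) {
        return new Triple(s, p, uri, false, null);
    }

    /** Renders the triple as N-Triples: URIs in <>, literals in quotes. */
    String toNTriples() {
        String obj;
        if (!objectIsLiteral) {
            obj = "<" + object + ">";
        } else {
            String escaped = object.replace("\\", "\\\\").replace("\"", "\\\"");
            obj = "\"" + escaped + "\"" + (datatype == null ? "" : "^^<" + datatype + ">");
        }
        return "<" + subject + "> <" + predicate + "> " + obj + " .";
    }
}
```

For the typed example, `Triple.literal("file:///tmp/potato.png", "http://purl.org/dc/elements/1.1/creator", "John Doe", "http://www.w3.org/2001/XMLSchema#string").toNTriples()` produces the single line `<file:///tmp/potato.png> <http://purl.org/dc/elements/1.1/creator> "John Doe"^^<http://www.w3.org/2001/XMLSchema#string> .`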

Triples can be represented graphically, as the figure below shows.

A simple triple example

A collection of triples defines a graph. For instance, potato.png may also have other properties, such as a creation date and a format.

A three triple graph
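The three-triple graph for potato.png can be held as a set of (subject, predicate, object) triples, where each subject is a node and each triple an outgoing edge. The sketch below is a hypothetical in-memory representation (GraphTriple and TripleGraph are illustrative names) that stores literals as plain strings and keeps only the last value when a property repeats, whereas real RDF allows several values per property.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an RDF-style graph as a set of triples.
// Literal values are stored as plain strings for brevity.
record GraphTriple(String subject, String predicate, String object) { }

final class TripleGraph {
    // A set, so adding the same triple twice does not grow the graph.
    private final Set<GraphTriple> triples = new LinkedHashSet<>();

    void add(String s, String p, String o) {
        triples.add(new GraphTriple(s, p, o));
    }

    int size() { return triples.size(); }

    /** Collects the outgoing edges (property -> value) of one resource node. */
    Map<String, String> propertiesOf(String subject) {
        Map<String, String> result = new LinkedHashMap<>();
        for (GraphTriple t : triples) {
            if (t.subject().equals(subject)) {
                result.put(t.predicate(), t.object()); // last value wins if repeated
            }
        }
        return result;
    }
}
```

Adding the creator, date, and format triples for file:///tmp/potato.png gives a graph of size 3, and `propertiesOf` returns those three property/value pairs.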

RDF can be expressed in several formats (N3, Turtle (TTL), or RDF/XML). Meandre relies on the XML form of RDF to provide a standardized exchange format for its main metadata descriptions. The example presented in the figure above could be expressed in RDF/XML as:

<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">

  <rdf:Description rdf:about="file:///tmp/potato.png">
    <dc:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">John Doe</dc:creator>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2007-08-23T17:41:20</dc:date>
    <dc:format rdf:datatype="http://www.w3.org/2001/XMLSchema#string">PNG</dc:format>
  </rdf:Description>

</rdf:RDF>

RDF/XML can be written in two notations: abbreviated and expanded. The example presented above uses the abbreviated notation.

RDF also supports attaching properties to statements that have already been created; this is called reification. A detailed description of the reification process can be found in the W3C RDF documentation.
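In standard RDF reification, the statement itself becomes a resource typed rdf:Statement and described by rdf:subject, rdf:predicate, and rdf:object; further properties can then be attached to that resource. The sketch below emits those triples as N-Triples lines. The Reifier class, the statement URI, and the extra property used in the example are hypothetical, and literals are assumed to contain no quotes or backslashes.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of standard RDF reification using the rdf:Statement vocabulary.
// The class name and any statement URI passed in are illustrative only.
// Literals are assumed to contain no quotes or backslashes (no escaping).
final class Reifier {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Returns N-Triples lines describing the statement (s, p, "literal"). */
    static List<String> reify(String statementUri, String s, String p, String literal) {
        String node = "<" + statementUri + ">";
        List<String> lines = new ArrayList<>();
        lines.add(node + " <" + RDF + "type> <" + RDF + "Statement> .");
        lines.add(node + " <" + RDF + "subject> <" + s + "> .");
        lines.add(node + " <" + RDF + "predicate> <" + p + "> .");
        lines.add(node + " <" + RDF + "object> \"" + literal + "\" .");
        return lines;
    }

    /** Attaches an extra property to the reified statement resource. */
    static String annotate(String statementUri, String property, String literal) {
        return "<" + statementUri + "> <" + property + "> \"" + literal + "\" .";
    }
}
```

For example, reifying the creator triple of file:///tmp/potato.png as http://example.org/statements/1 (a made-up URI) and then annotating it with dc:date records when that authorship claim was made, without altering the original triple.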

RDF is also used to define what type of metadata can be expressed. An RDF schema (or RDFS) is a collection of triples (or statements) that states the valid structure of an RDF description. A useful validation and visualization tool for RDF/XML can be found at the W3C site.

==Publishing Schemes==
Meandre relies heavily on publishing schemes to create a distributed repository of shareable components. Each piece of the repository is published at some reachable, web-based location, and RDF standardizes what gets published. For instance, a published piece may be a simple RDF file containing the component descriptions, or a SPARQL call to a metadata store (such a call is usually embedded in the URL’s query parameters, following the SPARQL protocol). In Meandre, each component description is self-contained: it includes all the information required for the component’s retrieval, regeneration, and execution.
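Under the SPARQL protocol, a query can be sent with HTTP GET by URL-encoding it into a `query` parameter, which is how a SPARQL call ends up embedded in a URL. The sketch below builds such a URL; the SparqlUrl class and any endpoint address are assumptions for illustration, and the query vocabulary a Meandre store would expect is not shown here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: embed a SPARQL query in a URL as the "query" parameter of an
// HTTP GET request. The endpoint passed in is a hypothetical address.
final class SparqlUrl {
    static String queryUrl(String endpoint, String sparql) {
        // URLEncoder uses form encoding ('+' for spaces). A literal '+' in the
        // query is encoded as %2B, so every remaining '+' was a space: use %20.
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8)
                                   .replace("+", "%20");
        String separator = endpoint.contains("?") ? "&" : "?";
        return endpoint + separator + "query=" + encoded;
    }
}
```

For example, `SparqlUrl.queryUrl("http://example.org/sparql", "SELECT ?s WHERE { ?s ?p ?o }")` returns `http://example.org/sparql?query=SELECT%20%3Fs%20WHERE%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D`, a URL that can itself be published as a repository location.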

Meandre’s publishing scheme allows dynamic inspection of published repositories. That is, Meandre can inspect locations (local files, remote web objects, or metadata stores accessed through the SPARQL protocol) to discover new locations where components are published. The discovered components can then be retrieved to form a custom-made repository, which can in turn be published for others to use. Hence, different component views and flavors are easy to create. Moreover, this scheme simplifies upgrading components and fixing bugs.
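The discovery process can be sketched as a breadth-first walk: inspect each location, collect the components it describes, and queue any further locations it points to, using a visited set so that locations that link to each other are not inspected forever. In the sketch, the Location record and the in-memory map stand in for fetching and parsing a published RDF file or SPARQL result; both, like RepositoryCrawler, are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of repository discovery. A Location stands in for the
// parsed contents of a published RDF file or SPARQL result.
record Location(List<String> componentUris, List<String> linkedLocations) { }

final class RepositoryCrawler {
    /** Breadth-first walk from the seeds; returns every component URI found. */
    static Set<String> discover(Map<String, Location> published, List<String> seeds) {
        Set<String> components = new LinkedHashSet<>(); // custom repository, no duplicates
        Set<String> visited = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(seeds);
        while (!pending.isEmpty()) {
            String url = pending.poll();
            if (!visited.add(url)) continue;          // already inspected
            Location loc = published.get(url);
            if (loc == null) continue;                // unreachable location: skip it
            components.addAll(loc.componentUris());
            pending.addAll(loc.linkedLocations());    // newly discovered locations
        }
        return components;
    }
}
```

The resulting set is the raw material for a custom repository, which could then be described in RDF and published in turn.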

=Meandre Components=
A Meandre component serves as the basic building block of any computational task run in Meandre’s semantic, data-driven execution environment. There are two kinds of Meandre components: (1) executable and (2) flow. Regardless of type, all Meandre components are described using metadata. Executable components also require an executable implementation form that can be understood by the Meandre execution engine.

The rest of this section will present a quick overview of the basic semantics for executable and flow components.

==Executable Components==
Meandre’s executable components serve as its basic units of computation. Complex computational tasks can be easily achieved by grouping executable components together in flows, as described in the next section.

As described above, a component follows the black-box operand approach. Any operand (executable component) may ingest zero or more data items through its data inputs and may produce one or more data items through its data outputs. Each executable component performs its operations based on the status of its inputs, and it may also have knobs (properties) that control its behavior.

A Meandre executable component is described in terms of four elements: (1) basic metadata describing the component, (2) metadata describing its input/output data ports, (3) the properties associated with the component, and (4) the location and form of the implementation of the component itself.

===Basic Metadata===
Meandre’s metadata relies on three ontologies:

  1. RDF: The RDF ontology (http://www.w3.org/1999/02/22-rdf-syntax-ns#) serves as a base for defining Meandre components.
  2. DC: The Dublin Core elements ontology (http://purl.org/dc/elements/1.1/) provides basic publishing and descriptive capabilities in the description of Meandre components.
  3. Meandre: The Meandre ontology describes a set of relationships that model valid components, as understood by the Meandre execution engine architecture.

Each component description requires the metadata elements listed below. To improve readability, a short-hand notation is used for each element; for example, dc:description stands for http://purl.org/dc/elements/1.1/description.

In addition to these metadata elements, each component is uniquely identified by a URI. For instance, http://mydomain.org/meandre/components/hello-world describes a unique Meandre component. This is the base URI (resource) to which all of the metadata is attached; it can be expanded to identify complex metadata structures.

  • rdf:type:
    Identifies the type of Meandre component identified by a URI. Executable components are typed as http://www.seasr.org/meandre/ontology/executable_component.
  • meandre:name:
    Provides the name of the component identified by the provided URI.
  • dc:creator:
    The name of the person who created the component.
  • dc:date:
    Marks the date when the component identified by the URI was created. The date follows the format Year-Month-DayTHour:Minute:Second, where the letter T separates the date from the time. For instance, 2007-09-05T14:58:20 indicates that the component was created on September 5th, 2007 at 2:58:20 pm.
  • dc:description:
    Gives an accurate description of what computation is encapsulated by the executable component that is identified by the URI.
  • meandre:tag:
    Identifies a tag that describes some facet of the component’s functionality.
  • dc:rights:
    Names the license associated with the component. It is intended to provide the wording of the license and pointers to the complete description of the licensing terms.
  • meandre:firing_policy:
    Identifies the firing policy used to execute the component, which Meandre requires when a component has more than one input port (details about input ports may be found in the next section). Two options are available: all or any. All requires that all input data ports be populated to fire the execution of the component, whereas any fires a component any time a data port is populated.
  • meandre:runnable:
    The type of executable component identified by the URI. The first release of Meandre supports components written in Java and provides the basic machinery to support components written in other languages or frameworks. Java components are labeled as java.
  • dc:format:
    Describes the format of the binary object that implements the described component. The first release of Meandre provides support for classes implementing the Meandre executable component interface. These components are marked as java/class.
  • meandre:resource_location:
    Identifies where the binary implementation of the component is located. A standard URI identifying a binary implementation may look like this: http://domain.org/component/org.domain.package.MyMeandreClass (another demonstration URI). The standard Meandre convention for Java relies on a URI assembled from the base location of the implementation plus the full name of the class.
  • meandre:execution_context:
    Identifies the third-party auxiliary code or libraries needed to run an executable component. For instance, adding the execution context URI http://domain.org/lib/colt.jar tells Meandre that the executable component described by the URI requires the COLT library to run.
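The metadata listed above can be gathered into one record per component and checked against a few of the stated conventions: the firing policy is all or any, dc:date follows the Year-Month-DayTHour:Minute:Second form, java components use the java/class format, and the Java resource location is the base location plus the full class name. The record, its field names, and the checks are a hypothetical sketch, not Meandre’s own validator.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: one executable component's metadata and a few checks
// derived from the conventions described above. Not Meandre's validator.
record ComponentDescription(
        String uri, String name, String creator, String date, String description,
        List<String> tags, String rights, String firingPolicy, String runnable,
        String format, String resourceLocation, List<String> executionContext) {

    /** Java convention: base location of the implementation plus full class name. */
    static String javaResourceLocation(String baseLocation, String className) {
        return baseLocation.endsWith("/") ? baseLocation + className
                                          : baseLocation + "/" + className;
    }

    /** Returns the problems found; an empty list means the checks passed. */
    List<String> validate() {
        List<String> problems = new ArrayList<>();
        if (!Set.of("all", "any").contains(firingPolicy)) {
            problems.add("meandre:firing_policy must be 'all' or 'any'");
        }
        try {
            LocalDateTime.parse(date);   // ISO form, e.g. 2007-09-05T14:58:20
        } catch (DateTimeParseException e) {
            problems.add("dc:date is not in Year-Month-DayTHour:Minute:Second form");
        }
        if ("java".equals(runnable) && !"java/class".equals(format)) {
            problems.add("java components are expected to use dc:format java/class");
        }
        return problems;
    }
}
```

With the document’s example values, `javaResourceLocation("http://domain.org/component", "org.domain.package.MyMeandreClass")` gives http://domain.org/component/org.domain.package.MyMeandreClass, and a description dated 2007-09-05T14:58:20 with firing policy all passes these checks.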