=Presentation=


=Overview=

Meandre’s success relies on its ability to easily assemble data-intensive flows. Flows may be assembled using the Meandre Workbench (MW), an icon-based programming environment for assembling data-intensive flows. ZigZag targets rapid application development by speeding up the flow construction cycle.

ZigZag can accelerate the data-intensive flow development cycle. It allows you to easily describe a data-intensive flow using the ZigZag language, which can then be compiled into a self-contained flow task for later execution.

ZigZag is a simple language for describing data-intensive flows; it is modeled after Python’s simplicity. ZigZag is a declarative language for expressing the directed graphs (DG) that describe flows. A compiler is provided to transform a ZigZag program (.zz) into a Meandre self-contained task, also called a Meandre archive unit (.mau). MAUs can then be executed by a Meandre engine. Command-line tools are provided to compile and execute ZigZag files.

The language provides four basic constructs:

# Component discovery and aliasing (CDA): Retrieve components from a repository location and create an alias for them.
# Component instantiation (CI): Instantiate a component that will be part of the data-intensive flow.
# Instance modification (CM): Change the behavior of an instance by setting its properties.
# Instance invocation (II): Describe how a component instance relates to the other instances in the same flow.

=Example=

==Flow diagram==

The flow below pushes a sequence of ten strings that get converted to uppercase and finally printed to the console.

Simple hello world flow

==ZigZag code==

The code below shows the ZigZag code that represents the data-intensive flow presented above.

#
# This flow converts a sequence of strings to uppercase
# and then prints them to the console
#
# @author Xavier Llorà
# @date March 7, 2008
#
# @file: example.zz
#

#
# Imports the three required components and creates the component aliases
#
import <http://demo.seasr.org:1714/public/services/demo_repository.ttl>

alias <meandre://test.org/component/push-string> as PUSH
alias <meandre://test.org/component/to-uppercase> as TOUPPER
alias <meandre://test.org/component/print-object> as PRINT
#
# Creates three instances for the flow
#
push_hello, to_upper, print = PUSH(), TOUPPER(), PRINT()

#
# Sets up the properties of the instances
#
push_hello.message, push_hello.times = "Hello World!!!", "10"

#
# Describes the data-intensive flow
#
@hello = push_hello()
@upper = to_upper(string:hello.string)
print(object:upper.string)
#
#
#

The code above can be compiled using zzc.jar and run using zzre.jar, or interactively from the zz console.

=Automatic Parallelization=

Before digging into the details, imagine the following situation. You have a flow that does a good job, but as you keep pushing more and more data through it, you realize that you could use multiple instances of the same component in parallel to boost the flow’s performance. This would also help max out all those cores you have sitting idle. Wouldn’t it be great if you could just say, “for this component instance, give me 4 copies that process data in parallel”? Wouldn’t it also be great if you didn’t need to worry about connecting anything?

Let’s assume that in the previous flow example, the conversion to uppercase takes a really long time and dominates the overall execution time. That would be a perfect example to illustrate the parallelization capabilities that ZigZag provides.

==Unordered parallelization==

Imagine now that you want a parallelized version of the component instance in the middle (the one that does most of the work). We can modify our ZigZag code to force the parallelization of that component instance. The modification looks like this:

#
# This flow converts a sequence of strings to uppercase
# and then prints them to the console
#
# @author Xavier Llorà
# @date March 7, 2008
#
# @file: example.zz
#

#
# Imports the three required components and creates the component aliases
#
import <http://demo.seasr.org:1714/public/services/demo_repository.ttl>

alias <meandre://test.org/component/push-string> as PUSH
alias <meandre://test.org/component/to-uppercase> as TOUPPER
alias <meandre://test.org/component/print-object> as PRINT
#
# Creates three instances for the flow
#
push_hello, to_upper, print = PUSH(), TOUPPER(), PRINT()

#
# Sets up the properties of the instances
#
push_hello.message, push_hello.times = "Hello World!!!", "10"

#
# Describes the data-intensive flow
#
@hello = push_hello()
@upper = to_upper(string:hello.string)[+AUTO]
print(object:upper.string)
#
#
#

The [+AUTO] modifier tells the ZigZag compiler to parallelize the to_upper instance based on the underlying architecture. You can also specify the number of parallel instances you want; for example, [+4] will create 4 parallel instances. The resulting flow generated by the compiler looks as follows:

Unordered parallelization

Notice that ZigZag has created 4 parallel instances of the component. It has also introduced a mapper instance that is in charge of distributing the incoming data to each of the parallel instances. Each of the parallel instances then pushes its data straight to the print instance. That’s it. The ZigZag compiler has parallelized and connected a new flow with almost no effort on your part. This is called unordered parallelization, since data may arrive at the print instance out of the original order in which it was generated by the push component instance.
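The behavior described above can be modeled outside Meandre with a few threads and queues. The sketch below is only an illustration in Python, not ZigZag or Meandre code: the function name run_unordered, the round-robin distribution policy, and the use of threads are all assumptions made for the example, since the source does not say how the mapper chooses a parallel instance.

```python
import queue
import threading


def run_unordered(items, work, workers=4):
    """Hypothetical model of a [+4] flow: a mapper feeds parallel workers,
    and each worker pushes its result straight to a shared output."""
    inboxes = [queue.Queue() for _ in range(workers)]
    outbox = queue.Queue()
    stop = object()  # sentinel telling a worker that no more data is coming

    def worker(inbox):
        while True:
            item = inbox.get()
            if item is stop:
                break
            outbox.put(work(item))  # no ordering information is kept

    threads = [threading.Thread(target=worker, args=(q,)) for q in inboxes]
    for t in threads:
        t.start()

    # Mapper: distribute incoming data round-robin (an assumed policy).
    for i, item in enumerate(items):
        inboxes[i % workers].put(item)
    for q in inboxes:
        q.put(stop)
    for t in threads:
        t.join()

    # Results are collected in arrival order, which may differ from input order.
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return results


if __name__ == "__main__":
    # Distinct strings are used so that any reordering is visible.
    print(run_unordered([f"hello {i}" for i in range(10)], str.upper))
```

Every input is processed exactly once, but the order of the printed list can change from run to run, which is the trade-off that unordered parallelization accepts.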

==Ordered parallelization==

Sometimes applications need to maintain the order of the data being pushed through the flow. ZigZag can also parallelize instances while preserving the order of the data going through the flow (at the cost of a little overhead). The same example presented above can be turned into an order-preserving one as follows:

#
# This flow converts a sequence of strings to uppercase
# and then prints them to the console
#
# @author Xavier Llorà
# @date March 7, 2008
#
# @file: example.zz
#

#
# Imports the three required components and creates the component aliases
#
import <http://demo.seasr.org:1714/public/services/demo_repository.ttl>

alias <meandre://test.org/component/push-string> as PUSH
alias <meandre://test.org/component/to-uppercase> as TOUPPER
alias <meandre://test.org/component/print-object> as PRINT
#
# Creates three instances for the flow
#
push_hello, to_upper, print = PUSH(), TOUPPER(), PRINT()

#
# Sets up the properties of the instances
#
push_hello.message, push_hello.times = "Hello World!!!", "10"

#
# Describes the data-intensive flow
#
@hello = push_hello()
@upper = to_upper(string:hello.string)[+AUTO!]
print(object:upper.string)
#
#
#

The ! after AUTO (or after the number of parallel instances you want) tells the compiler to generate a parallelized flow that maintains the data order. The picture below shows the resulting flow, which guarantees the order.

Ordered parallelization

ZigZag introduces a reducer after the parallel instances to guarantee the order.
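One common way to build such an order-preserving stage is to tag every data item with a sequence number and have the reducer hold back results that arrive early. The Python sketch below illustrates that idea only; the name run_ordered, the sequence-number scheme, and the round-robin mapper are assumptions made for the example and are not taken from the Meandre implementation.

```python
import queue
import threading


def run_ordered(items, work, workers=4):
    """Hypothetical model of a [+4!] flow: mapper, parallel workers, and a
    reducer that releases results in the original input order."""
    inboxes = [queue.Queue() for _ in range(workers)]
    outbox = queue.Queue()
    stop = object()  # sentinel telling a worker that no more data is coming

    def worker(inbox):
        while True:
            msg = inbox.get()
            if msg is stop:
                break
            seq, item = msg
            outbox.put((seq, work(item)))  # keep the sequence number attached

    threads = [threading.Thread(target=worker, args=(q,)) for q in inboxes]
    for t in threads:
        t.start()

    # Mapper: tag each item with its position, then distribute round-robin.
    total = 0
    for seq, item in enumerate(items):
        inboxes[seq % workers].put((seq, item))
        total += 1
    for q in inboxes:
        q.put(stop)

    # Reducer: buffer early arrivals and emit only the next expected item.
    pending = {}
    next_seq = 0
    ordered = []
    for _ in range(total):
        seq, result = outbox.get()
        pending[seq] = result
        while next_seq in pending:
            ordered.append(pending.pop(next_seq))
            next_seq += 1

    for t in threads:
        t.join()
    return ordered


if __name__ == "__main__":
    print(run_ordered([f"hello {i}" for i in range(10)], str.upper))
```

The buffering in the reducer is where the extra overhead comes from: a slow item at the front of the sequence delays every result queued behind it, even if those results are already computed.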
