The EXAMPLE1 Pipeline Demonstration

Francisco Valdes (fvaldes@noao.edu)
Derec Scott (dscott@noao.edu)
Francesco Pierfederici (fpierfed@noao.edu)

The command "runexample1" provides a complete demonstration of a
pipeline application (PA) called EXAMPLE1. It starts (or restarts) the
basic components of NHPPS -- the name server and the node manager
server -- and the subpipelines (or services) which make up the pipeline
application. It sets up some test data and triggers the data flow.
Finally, a simple pipeline monitor is run to visualize the progress of
the pipeline. After the example finishes the pipeline remains running;
stop it with the command "stopall".

The example pipeline application consists of three pipelines. There are
many ways pipeline applications may be structured. This is an example
of a common structure having a top level pipeline, called "main" in
this case, which triggers datasets. It orchestrates some processing
stages and calls other pipelines, abc and xyz in the example, to
operate on parts of the problem, often in parallel and on distributed
nodes. Within each pipeline some stages run serially and others run in
parallel.

The "simple pipeline monitor" used here is intended only for the simple
example pipelines. In a real application the state of the pipeline,
called the blackboard, is displayed in a GUI which also incorporates
control functionality. This simple program is used because it does not
depend on any GUI software. It simply dumps the blackboard every few
seconds and writes it to the terminal. By clearing the terminal before
each refresh it gives the feeling of an updating monitor.

The columns are the dataset name, the pipeline name, the node on which
it is running, and the state of the stages. Note that subpipelines
operating on pieces of the original dataset generally have dataset
names which are related to the parent dataset name. A pipeline
application may use the blackboard in a variety of ways, each with its
own particular codes.
In this example the stage codes use 'p' to indicate processing, 'c' to
indicate completed stages, 'w' to indicate stages waiting for returns
from other pipelines, and 'd' to indicate the completion of the final
stage in the pipeline. The first stage is left as 'p' until the final
stage completes.

In this example the main pipeline runs some stages and sends parts of
the dataset to pipelines abc and xyz, which run in parallel. The
subdatasets also run in parallel within the pipeline. For the most part
the stages simply sleep to simulate some kind of processing. A few
stages call the other pipelines, return to the main pipeline, or wait
for all the subdatasets to return.

To understand the data flow in detail you should examine the pipeline
description files in the xml directory of the pipeline application
source directory. The pipeline description language (PDL) is structured
in XML (so you can either read these files directly or use an XML
browser) and you can read the technical specification with
"man NHPPSXML". It is also useful to look at the "runexample1" script
to see the individual commands used to start the components and trigger
the pipeline.

The commands in the examples/bin directory are scripts (python and csh)
used in the example pipelines. The call, wait, and return commands are
instructive examples of how those logical functions may be used.

The interesting features you will see in the example pipeline include:

1) Sending data, waiting, and receiving data from pipelines within the
PA. It is important to note that NHPPS does not in and of itself
provide any direct send/wait/receive functions, though these might be
added in the future. Currently these are written by the pipeline
architect to suit their individual needs. The included call/wait/return
functions were built to demonstrate the capabilities and openness of
the system.

2) The asynchronous and parallel execution of modules within a pipeline.
While a significant number of the modules in the sample pipelines do
execute synchronously, you will note that in the main pipeline 'later'
modules execute while the abc and xyz pipelines are processing.

3) The effect of starting X instances of a pipeline. The number X
represents the number of datasets on which a particular module may be
executing (or in the processing state) at a time. In this example the
abc pipeline runs with two instances so that any stage may be running
on two different datasets at the same time. This is noticeable in the
pipeline monitor and is particularly useful with multicore machines.

4) Module execution through two of the three trigger methods provided
in NHPPS: File and OSF triggers.

5) How to call programs on the host system using the 'Foreign' XML
element.
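
As a rough illustration of the simple pipeline monitor and the stage
codes described earlier, the following Python sketch formats a
blackboard as the four columns listed above and redraws it after
clearing the terminal. It is not the monitor shipped with the example
and does not use any NHPPS interface. The Entry class, the function
names, the sample dataset and node names, and the '_' placeholder for a
stage that has not started are all assumptions made for illustration.
Only the 'p', 'c', 'w' and 'd' codes and the pipeline names main, abc
and xyz come from the text.

```python
import time
from dataclasses import dataclass

# Stage codes as described for EXAMPLE1. The "_" placeholder for a stage
# that has not started yet is an assumption made for this sketch.
STAGE_CODES = {
    "p": "processing",
    "c": "completed",
    "w": "waiting on another pipeline",
    "d": "final stage done",
    "_": "not started (assumed placeholder)",
}


@dataclass
class Entry:
    """One hypothetical blackboard row."""
    dataset: str
    pipeline: str
    node: str
    stages: str  # one code character per stage, in stage order


def is_finished(entry):
    """True once the final stage of the pipeline carries the 'd' code."""
    return entry.stages.endswith("d")


def format_blackboard(entries):
    """Render rows as fixed-width columns: dataset, pipeline, node, stages."""
    lines = [f"{'DATASET':<12}{'PIPELINE':<10}{'NODE':<10}STAGES"]
    for e in entries:
        lines.append(f"{e.dataset:<12}{e.pipeline:<10}{e.node:<10}{e.stages}")
    return "\n".join(lines)


def monitor(read_blackboard, interval=2.0, cycles=3):
    """Clear the terminal and redraw the blackboard a fixed number of times."""
    for _ in range(cycles):
        # ANSI escape: clear the screen and move the cursor to the top left.
        print("\033[2J\033[H", end="")
        print(format_blackboard(read_blackboard()))
        time.sleep(interval)


# Made-up sample rows; dataset and node names are illustrative only.
SAMPLE = [
    Entry("test1", "main", "node1", "pcw_"),
    Entry("test1-abc1", "abc", "node1", "pccd"),
    Entry("test1-xyz1", "xyz", "node2", "pc__"),
]
```

Calling monitor(lambda: SAMPLE, interval=1.0, cycles=2) would redraw
the sample table twice. In a real setup the callable would have to
fetch the current blackboard contents by whatever means the pipeline
application provides, which this sketch does not attempt to model.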