Concepts

What is a pipeline?

There are many ways to define a pipeline. Here we define a pipeline by a computational architecture. A pipeline consists of two basic types of software, executors and modules.

What is a pipeline executor?

A pipeline executor waits for requisite inputs to be available to trigger execution of pipeline modules. The executor need not know much about the data or modules. It only needs to know how the data triggers the execution and how to execute pipeline modules on the data.

One way to categorize a pipeline executor is as data driven. It is generally started once and remains running as data flows through the pipeline.

An executor may run more than one module at a time. Typically this would be different input data though clever design could allow some parallization on the same data.

What is a pipeline module?

A pipeline module transforms its inputs to its outputs. In contrast to a pipeline executor, a pipeline module may be categorized as command driven. The modules are typically only running while they process their inputs. In most pipelines the modules do not require user interaction.

What is a load balancing?

When more than one module may be executed at a time there is a potential for optimizing the execution of the pipeline system. This generally requires multiple CPUs. Some load balancing may be intrinsic to the software outside of the pipeline such as in the operating system or use of parallizing compilers. Load balancing can also occur within the context of the executors or modules.

There are at least two distinct types of balancing. "Process balancing" selects where a module is to be run. "Data flow balancing" selects where the data is channeled.

Can an pipeline executor be a pipeline module?

Both executors and modules transform inputs to outputs. Therefore, it is possible for an executor to also be a module in a parent executor. When an executor is a module it is started by the parent executor and then exits when it is done transforming the input data. An alternative is if the pipeline executor is started separately and remains running. Then it is not a module but part of a network of pipeline executors.

What is a pipeline executor network?

There may be many pipeline executors which are running and waiting for data triggers. Therefore, the output data from one pipeline may provide the input data trigger for another pipeline. This suggest that multiple interacting pipelines can be thought of topologically as a kind of network. This means there can be many types of pipeline networks from a general web to more structured hierarchies and trees.

In principle all the modules and their trigger rules can be merged under a single executor. This is most clear on a single CPU organization but less obvious with multiple CPUs. So why use multiple executors? This is for modular design and ease of understanding and debugging. If the total pipeline system can be decomposed into sets of logical transformation then having one executor for each logical transformation allows modular development and verification.

We can discriminate various types of interactions between pipeline executors depending on whether the modules are endpoints or intermediate stages; though in a sense the idea of modules occuring in a linear sequence within an executor is more a logical and mental construct than a requirement.

There are two simple cases we can identify. One is where the final result of one pipeline executor triggers the initial stage of another executor. This is like a serial connection of executors. Another is when an intermediate stage in one executor triggers the input module of another and then the final output of the second executor returns to the next stage of the first executor. This second case is somewhat like thinking of the second executor as a module or subroutine in the first executor.

The second topology is not very interesting unless the data triggers are actually parallelizable. In this case the first executor can produces a set of data triggers, one for each of a set of independent executors (likely run on different machines). The next module in the first executor, or some other executor, would trigger only after all the other executors have produced outputs. This is basically a way to implement data parallel structure as part of a logical pipeline.

Figure 1 shows example connections in a pipeline network. The lines show the data flow. Lines entering a pipeline box on the left are input data triggers and on the right are final output. Lines on the bottom of the boxes are intermediate data within a pipeline which trigger other pipelines. This figure shows a master pipeline A which is triggered by some input. At some point data is generated which triggers pipeline B. Within B data is generated that triggers C, D, and E. This could represent a parallel step. The results of these return to B to trigger another stage in B. A later stage in B triggers F. The final output of B triggers G. The output of G returns to a stage in A.

What is a pipeline executor primitive?

We define a pipeline executor primitive as one which does not depend on another executor internally. Its input data and output data may come from and go to another pipeline but no internal module triggers or waits for another executor.

What are data parallel pipeline executors?

These are excutors composed of identical modules operating on pieces of a larger data set. These executors typically run in parallel and on different CPUs. They are also typically executor primitives.

Mosaic Pipeline Executor Network

We start by defining two basic executors.

MEF executor
The MEF executor performs operations involving all parts of a mosaic exposure. The term MEF does not necessarily imply MEF format files, though that is what we currently expect will be the input trigger data, but that multiple elements of an exposure form the principle data flow.
SIF executor
By analogy the "single image format" (SIF) executor performs operations involving only a single simple image. By image this could mean a CCD, an amplifier readout, or even smaller pieces.

The data flow would consist of a mosaic exposure being provided as the initial input trigger to an MEF executor. A module in this executor would trigger multiple identical SIF executors (ideally on different CPUs using a load balancing to assign the CPUs). At intermediate points and at the end of the SIF executor stages the results would return to the original (or possible different) MEF executor.

An intuitive heirarchical organization of executors and CPUs immediately come to mind. There is a master CPU running an MEF executor and other machines running SIF executors. However, what is a more versatile configuration, and simpler to configure and maintain, is to have every CPU run both an MEF and SIF executor which are identical across the machines. In normal operation it would likely be the case that all the raw MEF mosaic data would start at one MEF executor on one machine as in the heirachical picture. But it would also be possible to feed multiple MEF executors on different machines. This would eliminate a possible bottleneck or single point of failure in having a single master CPU.

In this approach there would be N machines running both MEF and SIF executors. The MEF executors would include load balancing to assign data parallel operations to SIF executors. On top of this system would then be another load balancer at the data feed point that could assign new raw mosaic data to the MEF executors on free machines.

So not only would this system be data parallel on a single mosaic exposure but would also be data parallel on a stream of mosaic exposures! Figure 2 shows a schematic of this system. It shows MEF parallelism by two and SIF parallelism by 2 all based on CPUs with identical software installations. The same set of CPUs could also be configured with one master MEF and five SIF executors just by adjusting the data flow balancers (a simple file in the IRAF networking method proposed below). Or the SIF executors on the same machine as the MEF executors could also be used.

Load Balancing in a Mosaic Pipeline Executor Network

There are many types of load balancing approaches and software. One that is particularly easy to use in the context of the mosaic pipeline executor network, where the atomic data chunks are fairly uniform in size and the CPUs are fairly uniform in speed, is data flow balancing. In this context, when an MEF executor separates the data into pieces to be executed independently on multiple SIF executors, the pieces are assigned to executors with the least amount of data pending.

For the type of executor where the triggers consist of data being placed in a directory (such as OPUS), the load balancing component in the executor or module looks at the number of files. From the pool of SIF trigger directories it assigns pieces to those with the least backlog. This indirectly optimizes for different CPU speeds since an SIF executor on a slower machines will soon have a higher backlog and no new data will be assigned until it processes some of its data.

This approach has the primary benefit that no specialized software is required. All a module in the MEF executor whose output data is mulitple data parallel pieces for various SIF executors has to do is look through a list of network directories (typically on different CPUs), determine the number of trigger files pending in those it can reach (to account for machines going down), and then assign the M pieces to the N directories in order of least backlog. N can be less than or greater than M and N can change during runtime.

A status monitor (described elsewhere) will eventually notice that any pieces that were pending have timed out due to a macnine crashing and the operator can restart things for another machine. If the computer crash doesn't destroy data, the alternative is that when the machine reboots and the executors are restarted they can continue on provided the executors are designed to be restartable from the point of the crash.

Strawman Detail

The first stage of the NOAO Mosaic Pipeline starts with an MEF executor. The point where this will break up the data for multile SIF executors is after crosstalk correction. The I/O can be optimized by having the crosstalk operation output the pieces directly to the trigger directories. This can be done using IRAF networking.

So the input to the crosstalk module is an MEF file in some directory. The list of output names is generated by looking through a list of potential CPUs and trigger directories and generating the output names as outlined previously with the IRAF nodename syntax. The normal output image open and I/O will then simply cause the data to appear in the SIF trigger directory. This eliminates the need for a separate MEF splitting and rcp-type of operation.

Steps: