There are many ways to define a pipeline. Here we define a pipeline by a computational architecture. A pipeline consists of two basic types of software, executors and modules.
A pipeline executor waits for requisite inputs to be available to trigger execution of pipeline modules. The executor need not know much about the data or modules. It only needs to know how the data triggers the execution and how to execute pipeline modules on the data.
One way to categorize a pipeline executor is as data driven. It is generally started once and remains running as data flows through the pipeline.
An executor may run more than one module at a time. Typically this would be different input data though clever design could allow some parallization on the same data.
A pipeline module transforms its inputs to its outputs. In contrast to a pipeline executor, a pipeline module may be categorized as command driven. The modules are typically only running while they process their inputs. In most pipelines the modules do not require user interaction.
When more than one module may be executed at a time there is a potential for optimizing the execution of the pipeline system. This generally requires multiple CPUs. Some load balancing may be intrinsic to the software outside of the pipeline such as in the operating system or use of parallizing compilers. Load balancing can also occur within the context of the executors or modules.
There are at least two distinct types of balancing. "Process balancing" selects where a module is to be run. "Data flow balancing" selects where the data is channeled.
Both executors and modules transform inputs to outputs. Therefore, it is possible for an executor to also be a module in a parent executor. When an executor is a module it is started by the parent executor and then exits when it is done transforming the input data. An alternative is if the pipeline executor is started separately and remains running. Then it is not a module but part of a network of pipeline executors.
There may be many pipeline executors which are running and waiting for data triggers. Therefore, the output data from one pipeline may provide the input data trigger for another pipeline. This suggest that multiple interacting pipelines can be thought of topologically as a kind of network. This means there can be many types of pipeline networks from a general web to more structured hierarchies and trees.
In principle all the modules and their trigger rules can be merged under a single executor. This is most clear on a single CPU organization but less obvious with multiple CPUs. So why use multiple executors? This is for modular design and ease of understanding and debugging. If the total pipeline system can be decomposed into sets of logical transformation then having one executor for each logical transformation allows modular development and verification.
We can discriminate various types of interactions between pipeline executors depending on whether the modules are endpoints or intermediate stages; though in a sense the idea of modules occuring in a linear sequence within an executor is more a logical and mental construct than a requirement.
There are two simple cases we can identify. One is where the final result of one pipeline executor triggers the initial stage of another executor. This is like a serial connection of executors. Another is when an intermediate stage in one executor triggers the input module of another and then the final output of the second executor returns to the next stage of the first executor. This second case is somewhat like thinking of the second executor as a module or subroutine in the first executor.
The second topology is not very interesting unless the data triggers are actually parallelizable. In this case the first executor can produces a set of data triggers, one for each of a set of independent executors (likely run on different machines). The next module in the first executor, or some other executor, would trigger only after all the other executors have produced outputs. This is basically a way to implement data parallel structure as part of a logical pipeline.
Figure 1 shows example connections in a pipeline network. The lines show the data flow. Lines entering a pipeline box on the left are input data triggers and on the right are final output. Lines on the bottom of the boxes are intermediate data within a pipeline which trigger other pipelines. This figure shows a master pipeline A which is triggered by some input. At some point data is generated which triggers pipeline B. Within B data is generated that triggers C, D, and E. This could represent a parallel step. The results of these return to B to trigger another stage in B. A later stage in B triggers F. The final output of B triggers G. The output of G returns to a stage in A.
We define a pipeline executor primitive as one which does not depend on another executor internally. Its input data and output data may come from and go to another pipeline but no internal module triggers or waits for another executor.
These are excutors composed of identical modules operating on pieces of a larger data set. These executors typically run in parallel and on different CPUs. They are also typically executor primitives.
We start by defining two basic executors.
The data flow would consist of a mosaic exposure being provided as the initial input trigger to an MEF executor. A module in this executor would trigger multiple identical SIF executors (ideally on different CPUs using a load balancing to assign the CPUs). At intermediate points and at the end of the SIF executor stages the results would return to the original (or possible different) MEF executor.
An intuitive heirarchical organization of executors and CPUs immediately come to mind. There is a master CPU running an MEF executor and other machines running SIF executors. However, what is a more versatile configuration, and simpler to configure and maintain, is to have every CPU run both an MEF and SIF executor which are identical across the machines. In normal operation it would likely be the case that all the raw MEF mosaic data would start at one MEF executor on one machine as in the heirachical picture. But it would also be possible to feed multiple MEF executors on different machines. This would eliminate a possible bottleneck or single point of failure in having a single master CPU.
In this approach there would be N machines running both MEF and SIF executors. The MEF executors would include load balancing to assign data parallel operations to SIF executors. On top of this system would then be another load balancer at the data feed point that could assign new raw mosaic data to the MEF executors on free machines.
So not only would this system be data parallel on a single mosaic exposure but would also be data parallel on a stream of mosaic exposures! Figure 2 shows a schematic of this system. It shows MEF parallelism by two and SIF parallelism by 2 all based on CPUs with identical software installations. The same set of CPUs could also be configured with one master MEF and five SIF executors just by adjusting the data flow balancers (a simple file in the IRAF networking method proposed below). Or the SIF executors on the same machine as the MEF executors could also be used.
There are many types of load balancing approaches and software. One that is particularly easy to use in the context of the mosaic pipeline executor network, where the atomic data chunks are fairly uniform in size and the CPUs are fairly uniform in speed, is data flow balancing. In this context, when an MEF executor separates the data into pieces to be executed independently on multiple SIF executors, the pieces are assigned to executors with the least amount of data pending.
For the type of executor where the triggers consist of data being placed in a directory (such as OPUS), the load balancing component in the executor or module looks at the number of files. From the pool of SIF trigger directories it assigns pieces to those with the least backlog. This indirectly optimizes for different CPU speeds since an SIF executor on a slower machines will soon have a higher backlog and no new data will be assigned until it processes some of its data.
This approach has the primary benefit that no specialized software is required. All a module in the MEF executor whose output data is mulitple data parallel pieces for various SIF executors has to do is look through a list of network directories (typically on different CPUs), determine the number of trigger files pending in those it can reach (to account for machines going down), and then assign the M pieces to the N directories in order of least backlog. N can be less than or greater than M and N can change during runtime.
A status monitor (described elsewhere) will eventually notice that any pieces that were pending have timed out due to a macnine crashing and the operator can restart things for another machine. If the computer crash doesn't destroy data, the alternative is that when the machine reboots and the executors are restarted they can continue on provided the executors are designed to be restartable from the point of the crash.
Strawman Detail
The first stage of the NOAO Mosaic Pipeline starts with an MEF executor. The point where this will break up the data for multile SIF executors is after crosstalk correction. The I/O can be optimized by having the crosstalk operation output the pieces directly to the trigger directories. This can be done using IRAF networking.
So the input to the crosstalk module is an MEF file in some directory. The list of output names is generated by looking through a list of potential CPUs and trigger directories and generating the output names as outlined previously with the IRAF nodename syntax. The normal output image open and I/O will then simply cause the data to appear in the SIF trigger directory. This eliminates the need for a separate MEF splitting and rcp-type of operation.
Steps:
mpl1!Opus/CCDpipe/ mpl2!Opus/CCDpipe/ mpl3!Opus/CCDpipe/ mpl4!Opus/CCDpipe/ mpl5!Opus/CCDpipe/ mpl6!Opus/CCDpipe/ etc.
Note that this can include the local node since the MEF pipeline might be any of the same set of machines. This also allows things to run on a single machine, albeit slowly.
mpl1!Opus/CCDpipe/obj012_im1.fits mpl2!Opus/CCDpipe/obj012_im2.fits mpl4!Opus/CCDpipe/obj012_im3.fits mpl1!Opus/CCDpipe/obj012_im4.fits mpl2!Opus/CCDpipe/obj012_im5.fits etc.
IRAF networking is as optimal as other networking methods such as NFS and PVM. In particular, it automatically spawns a server process on the machine that stays active and so there is no need to spawn a process every time an image is copied or the directory is queried.