Next: Parsley: a Command-Line Parser for Astronomical Applications
Previous: ETOOLS: Tools for Photon Event Data
Table of Contents --- Search ---
PS reprint
D. J. Allan
Department of Physics and Space Research, University of Birmingham,
Edgbaston, Birmingham, B15 2TT. United Kingdom
Despite exponential growth in available computing power (with a doubling time of order 18 months), many of the jobs for which we use our machines are still cpu limited. There are two easy ways of increasing your usable cpu resource; buy a faster machine or make more effective use of existing cpu machines. The latter is especially desirable considering the low duty cycle of your average astronomical workstation (often 10% to 20%) and so it was decided to write a library to allow applications to do this.
The following factors were considered of prime importance in developing DPL.
This combination of requirements led to the selection of the IMP library (written by Keith Shortridge at the AAO) for the interprocess communication.
Using the DPL system the cpu-hungry monolithic application is replaced by a DPL client. This program breaks up the high-level computation wanted by the user into sub-units or jobs which can be executed in parallel. Jobs are submitted to job queues which are stored in internal client data structures. Jobs are executed by worker processes which reside on the workstations being used by the client. An additional process, called the DPL server runs on each workstation. This buffers messages between the client and its workers.
When a client starts up it declares its intent to use a named virtual machine (VM). A VM is just a particular configuration of workers spread over a specified group of machines. Configurations are store in a plain text file. The client creates a server process on each workstation which in turn creates the worker processes.
Job queues can be of two kinds, symmetric or sequential. In symmetric queues each job submitted is executed on every worker in the virtual machine. Such queues are useful for performing data loading or resetting operations. Sequential queues are the means by which useful parallel computation is performed. Each job submitted to such a queue executes on the next available worker.
Data is returned to the client by specifying an optional callback routine when the job is submitted. The routine is called back with the data supplied to the worker, the data returned by the worker and optionally data supplied when the job was submitted (i.e., data which is required to process the worker results, but which is not required by the worker).
Data can be passed in 3 different ways between client and workers. In the first the data passed is simply an array of bytes. In the second the data is an array of character strings. The third option requires linkage of DPL against the ADI library (Allan 1995), and allows arbitrary hierarchically defined data structures to be passed to and from workers. This is the means by which DPL can pass data whose storage is architecture dependent.
The first target for the DPL library was the 3-D X-ray cluster fitting code developed at the University of Birmingham. This fits models consisting of tens of parameters to multiple datasets each containing tens of thousands of data points with error and quality information. While the size of the datasets is not large by optical standards, the convolution of the model space dataset into the detector space is very time consuming, the spatial and spectral convolutions being respectively spectrally and spatially dependent. Each data/model comparison takes 100 to 2000 milliseconds on a 200 SpecMark class machine, and a single error run up to 1 day of cpu time.
This cluster code made an attractive target for DPL for a number of reasons.
Preliminary results using DPL are promising. On a dual processor Solaris server the null job time (the elapsed time between a client submitting a job and receiving the result, for a job which does nothing) is 2--3 milliseconds. For workers on other cpus the times range from 10 to 50 milliseconds. So, even restricting DPL to the same workstation as the client we achieve nearly a doubling in throughput assuming each of our worker processes gets allocated a different cpu. The total cpu resource available to DPL at Birmingham is about 1200 SpecMark, a factor of 5 increase in our fastest single available workstation.
Several aspects of DPL need addressing before the software can be used by non-expert users.
The DPL library supports coarse grained parallel processing on heterogeneous collections of UNIX workstations (currently those running OSF, Solaris, SunOS and Linux). The minimum unit of computation which can be parallized efficiently is about 5 milliseconds on a multi-processor machine, or about 50 milliseconds on Ethernet connections.
The only external software required to build DPL is the the AAO IMP library and its associated error system. All the source code can be compiled using the gcc compiler.