

Astronomical Data Analysis Software and Systems V
ASP Conference Series, Vol. 101, 1996
George H. Jacoby and Jeannette Barnes, eds.

Systems Aspects of COBE Science Data Compression

I. Freedman

Consultant, 614 Sycamore Lane, S. 134, Davis, CA 95616

P. M. Farrelle

Optivision, Inc., 1480 Drew Avenue, S. 130, Davis, CA 95616

Abstract:

We present a fully programmable distributed system architecture based on data compression technology. We have developed a general approach to compressing diverse data from large scientific projects. This paper addresses the relevant system and scientific constraints together with the algorithm development and test strategy. Algorithms that incorporate scientific knowledge and consume relatively few system resources are preferred over ad hoc methods. This framework has been implemented for the Cosmic Background Explorer (COBE) spacecraft by retrofitting the existing VAX-based data management system with high-performance compression software that permits random access to the data.

1. Introduction

The COBE satellite (Boggess 1992) carried three experiments designed to make high-precision measurements of the diffuse celestial background. The detectors were stable and the data sampling was highly redundant. The observed sky was faint, low contrast and smoothly varying, except for one instrument (DIRBE), which saw stars at fixed map coordinates. FIRAS and DIRBE reported glitches, many of which arose during passages over the South Atlantic Anomaly (SAA) region.

The processed cold data total more than 380 GB, an effective expansion factor of 4 to 16 over the raw data (Cheng 1992), depending on the instrument subsystem. The project standard data sets number about 1000 and may be classed as sky maps, time-ordered data and time-tagged data. These data sets combine science data with engineering data. Data records were fixed length. The COBE Ground Segment Software System has been discussed elsewhere (Cheng 1992) and consisted of approximately 500 packages that processed the data from raw telemetry to Project Data Sets. Initially designed for 30 GB of static and 6 GB of dynamic storage, the system saw its data volume grow swiftly to 380 GB of dynamic storage, with an expanded architecture of a 14-node VAXcluster for which workstations provided almost all the CPU power. With the advent of truly high-performance workstations, the I/O demands also increased, and disk serving became a critical load for all but the most powerful machines.

2. Data Compression Requirements

Interviews with Principal Investigators and Contract Leaders defined requirements as follows (Freedman, Boggess, & Seiler 1993):

3. Performance

The experimental data compression system outlined above yielded 22% to 90% data compression with no more than a 7% increase in wall-clock time for single-node processing. The processing of compressed data was faster on average. We implemented logarithmic packing, zero suppression, run-length coding, Modified Huffman codes, LZW, Rice codes and Chebyshev polynomial expansions. Planned development included Vector Quantization techniques.
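
To illustrate one of the methods listed above, the following is a minimal Python sketch of Rice coding applied to the differences between successive samples, a transformation that suits stable, smoothly varying detector data. It is not the COBE implementation: the function names, the zigzag mapping of signed differences, the bit-string output and the choice k = 1 are all assumptions made for the example.

```python
def zigzag(n):
    """Map signed ints to non-negative ints: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def unzigzag(z):
    """Inverse of zigzag."""
    return (z >> 1) if z % 2 == 0 else -((z + 1) >> 1)


def rice_encode(values, k):
    """Encode non-negative ints as a string of '0'/'1' with Rice parameter k."""
    bits = []
    for v in values:
        if v < 0:
            raise ValueError("this sketch assumes non-negative integers")
        q, r = v >> k, v & ((1 << k) - 1)
        bits.append("1" * q + "0")                 # quotient in unary, ended by 0
        if k:
            bits.append(format(r, "0{}b".format(k)))  # remainder in k fixed bits
    return "".join(bits)


def rice_decode(bits, k, count):
    """Decode `count` values from a bit string produced by rice_encode."""
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1                                   # skip the terminating 0
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append((q << k) | r)
    return values


# Usage: difference a slowly varying (hypothetical) signal, then Rice-code it.
samples = [1000, 1002, 1001, 1001, 1003, 1000]
first = samples[0]
deltas = [b - a for a, b in zip(samples, samples[1:])]
bits = rice_encode([zigzag(d) for d in deltas], k=1)

restored = [first]
for z in rice_decode(bits, 1, len(deltas)):
    restored.append(restored[-1] + unzigzag(z))
assert restored == samples
```

Small differences cost only a few bits each, while an occasional large jump (a glitch, for example) costs a long unary run, so k would normally be tuned to the typical magnitude of the differences.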

4. System Issues

Principal system issues included reducing network load through a smaller data transfer volume (compression), fewer I/O interrupts (increased record length) and fewer file locks (file combination). The new software architecture consumed additional virtual and real memory and slightly more CPU than the original. System tuning revealed several surprises that prompted a limited rewrite of the data access software.

5. Results

Table 1 shows the results of a quaternion Chebyshev expansion of attitude data followed by Modified Huffman encoding of the coefficients. The target attitude accuracy of 0.1° was maintained throughout. Table 2 shows the results of compressing a DIRBE daily file, which contains spectrally correlated photometry (which may be approximated, especially if the residuals are stored in a randomly accessible file on off-line or near-line media); search keys (time code and pixel address); slowly varying angles (the angle between the DIRBE boresight and celestial objects); flags (indicating likely glitch observations taken within the SAA or radiation belts); and incompressibly noisy fields (pixel subposition). These composite data were compressed using a syntax that assigns a compression method, with its parameters, to offset ranges within a data record; where multiple algorithms are used, they share a common restart interval (Freedman, Boggess, & Seiler 1993).
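
The per-field assignment of methods described above can be sketched as follows. This is a hypothetical Python illustration, not the syntax of Freedman, Boggess, & Seiler (1993): the 8-byte record layout, the method names (`delta`, `rle`, `raw`), the payload representation as Python lists and the restart interval of 4 records are all invented for the example. Each block of `RESTART` records is encoded independently, field by field, so that any record can be read by decoding only its own block.

```python
# Hypothetical layout: contiguous (start, end, method) byte ranges in a record.
LAYOUT = [
    (0, 4, "delta"),  # search key, e.g. a steadily increasing time code
    (4, 5, "rle"),    # flag byte with long runs of the same value
    (5, 8, "raw"),    # noisy field, stored unchanged
]
RESTART = 4           # records per independently decodable block


def delta_encode(column):
    ints = [int.from_bytes(b, "big") for b in column]
    return [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]


def delta_decode(payload, width):
    out, total = [], 0
    for d in payload:
        total += d
        out.append(total.to_bytes(width, "big"))
    return out


def rle_encode(column):
    runs = []
    for item in column:
        if runs and runs[-1][0] == item:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([item, 1])    # start a new run
    return runs


def rle_decode(runs, width):
    return [item for item, n in runs for _ in range(n)]


METHODS = {
    "delta": (delta_encode, delta_decode),
    "rle": (rle_encode, rle_decode),
    "raw": (list, lambda payload, width: list(payload)),
}


def compress(records, layout=LAYOUT, restart=RESTART):
    """Split records into blocks; encode each field column with its own method."""
    blocks = []
    for i in range(0, len(records), restart):
        chunk = records[i:i + restart]
        blocks.append([METHODS[m][0]([r[s:e] for r in chunk])
                       for s, e, m in layout])
    return blocks


def read_record(blocks, index, layout=LAYOUT, restart=RESTART):
    """Random access: decode only the block that holds record `index`."""
    block = blocks[index // restart]
    fields = [METHODS[m][1](payload, e - s)[index % restart]
              for payload, (s, e, m) in zip(block, layout)]
    return b"".join(fields)


# Usage with ten made-up fixed-length records.
records = [(1000 + 2 * i).to_bytes(4, "big")
           + bytes([0 if i < 6 else 1])
           + bytes([17 * i % 256, 3 * i % 256, 5 * i % 256])
           for i in range(10)]
blocks = compress(records)
assert all(read_record(blocks, i) == records[i] for i in range(10))
```

In a real system each payload would be entropy coded and written to disk; the lists are kept here only to make the block structure visible.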

 

 

6. Issues related to FITS compression

The results of a brief survey via personal communication show that several major NASA projects are experimenting with or have adopted data compression techniques as follows:

To answer this need, we propose to generalize the existing Flexible Image Transport System (FITS) Standard to include the interchange of data in structured or compressed representation. Furthermore, to stimulate and encourage the development and use of data compression methods in astronomy, we propose to provide a public Data Compression Library that serves the astronomical community and may be extended as required. We created an object-oriented staged architecture supporting multiple algorithms implemented for the National Imagery Transmission Format. We see data compression and representation within the framework of approximation theory and think in terms of abstract data types with efficiently supported transformations and operations (Samet 1990). Separating data into an on-line, deeply compressed approximated component and an off-line or near-line, lightly compressed, randomly accessible residual preserves the original data exactly for future use. We intend to manipulate data in compressed or structured form by declaration of data class, without additional programming overhead.
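
The split into an approximated component and an exact residual can be illustrated with a short Python sketch. The piecewise-linear approximation used here is a stand-in chosen for brevity, not the approximation scheme of the COBE system; the function names, the knot spacing `step` and the sample values are assumptions made for the example.

```python
def approximate(samples, step):
    """Keep every `step`-th sample, plus the last one, as knots."""
    knots = list(samples[::step])
    if (len(samples) - 1) % step:
        knots.append(samples[-1])
    return knots


def expand(knots, step, n):
    """Rebuild n samples by integer linear interpolation between knots."""
    out = []
    for i in range(n):
        j, t = divmod(i, step)
        left = knots[j]
        if t == 0:
            out.append(left)                     # sample sits on a knot
            continue
        right = knots[j + 1]
        span = min(step, (n - 1) - j * step)     # last segment may be short
        out.append(left + (right - left) * t // span)
    return out


def split(samples, step):
    """Return (knots, residual): a coarse component and an exact correction."""
    knots = approximate(samples, step)
    approx = expand(knots, step, len(samples))
    residual = [s - a for s, a in zip(samples, approx)]
    return knots, residual


def restore(knots, residual, step):
    """Recombine both components into the original integer samples."""
    approx = expand(knots, step, len(residual))
    return [a + r for a, r in zip(approx, residual)]


# Usage with made-up integer samples.
samples = [100, 102, 105, 107, 110, 111, 113, 116, 118, 121]
knots, residual = split(samples, step=4)
assert restore(knots, residual, step=4) == samples
```

The knots alone give a compact approximation that could be kept on-line, while the residual, which is small for smooth data, is needed only when exact values are required.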

Major standards such as NITFS, JPEG, MPEG, DICOM and SDTS are not defined by software. An algorithmic definition of a minimal decoder is specified together with accuracy constraints. Any implementation whose output is decodable by the standard algorithm to prescribed or higher accuracy is conformant. We will continue this discussion on the USENET newsgroup sci.astro.fits and everyone is welcome to discuss these critical issues with us.

Acknowledgments:

The COBE data analysis is managed by the Goddard Space Flight Center for NASA's Office of Space Science and Applications.

References:

Boggess, N. W. 1992, ApJ, 397, 420

Cheng, E. 1992, in Astronomical Data Analysis Software and Systems I, ASP Conf. Ser., Vol. 25, eds. D. M. Worrall, C. Biemesderfer, & J. Barnes (San Francisco: ASP), 368

Freedman, I., Boggess, E., & Seiler, E. 1993, in 1993 Space and Earth Science Data Compression Workshop, ed. J. C. Tilton, NASA CP 3191, 85

Samet, H. 1990, The Design and Analysis of Spatial Data Structures (New York: Addison-Wesley)


Wed Jul 3 07:40:47 MST 1996