Dennis R. Crabtree, Daniel Durand, Séverin Gaudet, Norman Hill
Canadian Astronomy Data Centre, Dominion Astrophysical Observatory,
Herzberg Institute of Astrophysics, National Research Council of Canada
Benoît Pirenne
Space Telescope-European Coordinating Facility
The CADC and ST-ECF now support on-line access to the HST archive through a CD-ROM jukebox. We also support on-the-fly reprocessing of HST data when it is requested from the archive. This ensures that archival researchers can obtain data calibrated with the latest software and reference files.
Performing on-the-fly recalibration allows us to keep only the raw data on-line, and we reduce this further by selecting only the data for science targets. Finally, the raw data is compressed using gzip before being written to CD-ROM. Taken together, these steps mean that the complete set of public HST data, i.e., data obtained until November 1994, fits on approximately 31 CDs!
This paper describes the overall architecture of the archive, including Web access, maintenance of the data on CD-ROM, maintenance of the reference data, and details of the recalibration pipeline. Possible future extensions are also discussed.
The CADC and ST-ECF maintain archives of data from the Hubble Space Telescope in cooperation with the Space Telescope Science Institute (STScI). Each of our two sites maintains a duplicate of the HST database, which is updated over the Internet using Sybase's Replication Server. We receive a copy of the actual HST data on 12-inch Sony optical disks (WORM). The ST-ECF receives a copy of all the data in the archive, while the CADC receives everything but the engineering data. However, unlike STScI, we cannot afford the jukeboxes needed to store the HST data on-line. The disks are stored in cabinets, and every request for data must be handled manually by an operator. In addition, the software we distributed to access the HST archive ran only on Sparc machines (SunOS and Solaris), which limited access to the archive.
Unlike the STScI Hubble archive, which must support the operational needs of the telescope, our focus is solely on archival researchers. This allows us to explore different options and mechanisms for providing access to the archive. We made three key observations while exploring the future of the HST archive at the CADC and ST-ECF.
We decided on the following goal for the CADC/ST-ECF HST archive:
Store all HST science observations on-line and provide users with data calibrated at the time of the request using the latest calibration files and calibration software. Access to the HST archive will be Web-based in order to support the largest possible user base.
The CADC version of the HST archive, which excludes engineering data, contains over 1 Tbyte of data. This includes raw and calibrated scientific data, calibration files, data used to produce the calibration files such as Earth flat observations, etc. Archival researchers are usually interested in a subset of this data. The typical archival researcher is interested in calibrated data for a set of science targets.
The data for such science targets is a subset of the data in the Hubble archive. While the end-user is interested in calibrated data, which is a further subset of the archive, an equivalent approach is to consider raw data in conjunction with automatic recalibration of the data.
A further reduction in data volume can be realized through data compression. It turns out that the raw data in the archive compresses very well with either the standard UNIX compress utility or gzip. In fact, using gzip compression, the compressed raw data for ALL public HST observations of science targets fits on fewer than 30 CDs (as of Oct. 31, 1995). Each CD holds approximately 1000 datasets, 90% of which are either WFPC or WFPC2 observations.
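As a rough illustration of why sparse raw frames compress well, the following minimal sketch measures a gzip compression ratio on a synthetic, mostly empty 16-bit frame. The frame size, the pixel pattern, and the helper name `gzip_ratio` are assumptions made for this example; the resulting ratio says nothing about the ratio actually achieved on HST data.

```python
import gzip


def gzip_ratio(raw: bytes) -> float:
    """Return the uncompressed size divided by the gzip-compressed size."""
    return len(raw) / len(gzip.compress(raw))


# Synthetic stand-in for a raw frame: 800 x 800 pixels of 2 bytes each,
# almost entirely zero, with an isolated non-zero byte every 5000 bytes.
# (Size and pattern are arbitrary choices for illustration.)
frame = bytearray(800 * 800 * 2)
for offset in range(0, len(frame), 5000):
    frame[offset] = 200
synthetic = bytes(frame)

# gzip is lossless, so the raw data is recovered exactly on retrieval.
restored = gzip.decompress(gzip.compress(synthetic))

print(f"ratio: {gzip_ratio(synthetic):.1f}")
print("lossless:", restored == synthetic)
```

Because gzip is lossless, storing only compressed raw data loses no information needed for later recalibration.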
The CDs containing the HST data are stored on-line in a CD jukebox. The data on the CDs are available as ordinary UNIX files which can be accessed using standard UNIX tools. Data is copied to a magnetic disk as it becomes public and then written to CD when enough data is accumulated. We expect the data rate from HST (prior to the 1997 servicing mission) will require approximately one CD per month.
HST observations are calibrated at the time of observation using the best reference files and software available. This process is header driven in the sense that the steps to be performed, and the reference files and tables needed, are stored as header keywords. The calibrated data is provided to the Principal Investigator and stored in the archive.
However, when data is requested from the archive, which is at least one year later, there is a good chance that either better reference data is available or the calibration software has improved. Either of these conditions implies that the user should recalibrate the data. Fortunately, STScI records in the archive database which reference files should be used. Thus it is straightforward to identify the best reference files and tables to use for a particular dataset. Also, the software needed to calibrate the data is distributed as part of STSDAS. Thus all the pieces are in place to automatically recalibrate the data when it is requested from the archive. The following excerpt from an FOS Information Bulletin is an example of the reasons for pursuing the goal of recalibration:
20 June 1995: New flat field reference files derived from post-COSTAR observations for the low dispersion FOS gratings and both detectors have been delivered to CDBS and installed in the PODPS pipeline. Low dispersion GO data obtained prior to this date have been reduced with flat fields appropriate to the pre-COSTAR epoch, which were (until now) the best available flat fields. All low dispersion (G160L, G650L, or PRISM) GO data obtained prior to this date should be re-processed with CALFOS and the new reference files...
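The header-driven approach described above can be sketched as follows. This is only an illustration: the header is modelled as a plain dictionary, the keyword names and file names are placeholders rather than values taken from the archive, and the function names `steps_to_perform` and `stale_references` are hypothetical.

```python
def steps_to_perform(header: dict) -> list:
    """List the calibration switches in the header that are set to PERFORM."""
    return [key for key, value in header.items() if value == "PERFORM"]


def stale_references(header: dict, recommended: dict) -> dict:
    """Map each out-of-date reference keyword to its (old, new) file names."""
    return {
        key: (header.get(key), new_file)
        for key, new_file in recommended.items()
        if header.get(key) != new_file
    }


# Illustrative header as written at the time of observation (placeholder values).
header = {
    "BIASCORR": "PERFORM",
    "DARKCORR": "OMIT",
    "FLATCORR": "PERFORM",
    "BIASFILE": "bias_a",
    "FLATFILE": "flat_old",
}

# Illustrative recommendation, standing in for a query of the archive database.
recommended = {"BIASFILE": "bias_a", "FLATFILE": "flat_new"}

print(steps_to_perform(header))
print(stale_references(header, recommended))
```

In this sketch only the flat field entry differs from the recommendation, which is the situation the bulletin excerpt describes: the data was reduced with the best file available at the time, and a better one has since been delivered.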
Figure 1: Retrieval pipeline for HST data.
There are three main pieces in the implementation of recalibration of data from the CADC/ST-ECF Hubble archives, and these are illustrated in Figure 1. When a request for data is submitted (using our WWW interface), the necessary raw data is copied from the CDs stored on-line in the CD-ROM jukebox. Once on magnetic disk, the data is uncompressed and read into STSDAS. Then, for each observation, the database is queried for the best reference files. The output of this query is stored as a small cl script containing updates to the parameters for the reference files and calibration steps. The header of the raw data is then updated to reflect the new calibration file information. Finally, the data is calibrated using the pipeline software available in STSDAS. All of the current reference files and tables, approximately 7 GB in total, are stored on-line on magnetic disk. Once the data is calibrated, it is written out in FITS format and made available via anonymous ftp, or written to tape if requested.
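The order of the retrieval steps can be sketched in a few lines of Python. This is not the archive's implementation, which relies on STSDAS and cl scripts: here the header is a dictionary, the database query result is passed in as `best_refs`, and the calibration step is an arbitrary callable supplied by the caller. All function, file, and keyword names are hypothetical.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def stage_raw_file(cd_file: Path, work_dir: Path) -> Path:
    """Copy a gzip-compressed raw file off the CD and uncompress it on disk."""
    target = work_dir / cd_file.stem  # "dataset.raw.gz" -> "dataset.raw"
    with gzip.open(cd_file, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


def apply_reference_updates(header: dict, best_refs: dict) -> dict:
    """Return a copy of the header that points at the best reference files."""
    updated = dict(header)
    updated.update(best_refs)
    return updated


def retrieve(cd_file, header, best_refs, work_dir, calibrate):
    """Stage the raw data, update its header, then hand both to `calibrate`."""
    raw_file = stage_raw_file(cd_file, work_dir)
    new_header = apply_reference_updates(header, best_refs)
    return calibrate(raw_file, new_header)


def demo():
    """Exercise the sketch with a fake CD file and a placeholder calibration."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        cd_file = tmp_path / "dataset.raw.gz"
        with gzip.open(cd_file, "wb") as handle:
            handle.write(b"raw pixels")
        work_dir = tmp_path / "work"
        work_dir.mkdir()

        # Placeholder for the real calibration software: it only reports
        # what it was given, so the data flow can be checked.
        def fake_calibrate(raw_file, hdr):
            return raw_file.read_bytes(), hdr["FLATFILE"]

        return retrieve(cd_file, {"FLATFILE": "flat_old"},
                        {"FLATFILE": "flat_new"}, work_dir, fake_calibrate)


print(demo())
```

Passing the calibration step in as a callable keeps the staging and header-update logic independent of the calibration software, which mirrors the separation of the pieces shown in Figure 1.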
It takes approximately 3 minutes to fully process a WFPC2 dataset; these datasets are generally the most time-consuming. At this rate we can process almost 500 datasets per day on a Sparc 10/51. During our testing of this process we recalibrated all public HRS data in approximately 2.5 days.
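The throughput figure follows directly from the per-dataset processing time. The short calculation below reproduces it and, as a worst-case assumption that every dataset takes as long as a WFPC2 dataset, estimates the time to recalibrate one CD of approximately 1000 datasets. The function names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def datasets_per_day(minutes_per_dataset: float) -> float:
    """Datasets processed in 24 hours of continuous, serial processing."""
    return MINUTES_PER_DAY / minutes_per_dataset


def days_for(n_datasets: int, minutes_per_dataset: float) -> float:
    """Days needed to process n_datasets at the given per-dataset time."""
    return n_datasets / datasets_per_day(minutes_per_dataset)


print(datasets_per_day(3))           # 480.0, i.e. "almost 500" per day
print(round(days_for(1000, 3), 2))   # 2.08 days for one full CD
```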
The CADC/ST-ECF Hubble archives can be accessed via the World Wide Web at either http://cadcwww.dao.nrc.ca or http://arch-http.hq.eso.org/ESO-ECF-Archive.html.
The WWW access provides complete access to the Hubble Science Archive including access to preview images in GIF format. The Web interface also provides easy access to the CFHT and ESO archives, SIMBAD and the Digitized Sky Survey. The Web truly does allow varying services to be effectively linked together!
Possible future enhancements to the system include:
We are grateful to the Canadian Space Agency for the continued support of the Hubble archive at the CADC.