N. Hill, S. Gaudet, D. Crabtree, D. Durand, J. Louie
Dominion Astrophysical Observatory, National Research Council of Canada, 5071 West Saanich Road Victoria, B.C. Canada V8X 4M6
The Canadian Astronomy Data Center (CADC) has been responsible for the Canada-France-Hawaii Telescope (CFHT) archive since 1992. The CFHT archive consists of a relational database containing the parameters for every exposure, and an optical disk library containing the raw data files. The CFHT archive now contains nearly 200,000 science and calibration exposures from a variety of instruments.
The CADC needs to associate CFHT science exposures with calibration exposures (i.e., flats and biases) in order to do automated calibration for the generation of preview images, and to recommend calibration exposures to archive users. It is also necessary to identify the science exposures which use the same calibration exposures in order to streamline the calibration process.
A database query can identify calibration exposures directly from the archive database. However, the query is large, complicated, and too slow to be used in practice, and it must be tailored to each instrument, instrument mode and calibration exposure type. A method of efficiently and generically associating calibration exposures with selected science exposures was required.
Science exposures are associated with calibration exposures as shown in Figure 1.
Figure 1: Science exposure-calibration exposure relationship.
The exposures in `science group A' should use the flats from `calibration group A' and the biases from `calibration group C'. The exposures in `science group B' should use the flats from `calibration group B' and the biases from `calibration group C'. The optimal sequence for processing the science exposures is:
We have developed a computer program which maintains database tables that allow simple and efficient identification of CFHT calibration exposures and groupings. The tables are shown in Tables 1, 2 and 3. The program assigns the same science group identifier (SGI) to every exposure in a science group, and the same calibration group identifier (CGI) to every exposure in a calibration group. The CGI's are created by converting all of the KP's to character strings, concatenating the character strings into a single string, and applying a cyclic redundancy check (CRC) function to convert the string to a 4-byte binary value. The program takes advantage of the fact that, given a science exposure, the CGI's can be calculated from subsets of the science exposure KP values without referring to the calibration exposures.
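The CGI construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `calibration_group_id` is hypothetical, the `:` separator is borrowed from the example strings used later in the text, and `zlib.crc32` (the common CRC-32) stands in for the CRC routine actually used, which may differ.

```python
import zlib

def calibration_group_id(kp_values):
    """Map a sequence of parameter values to a 32-bit group identifier.

    Assumptions (not from the paper): values are joined with ':' and
    the standard CRC-32 from zlib is used as the CRC function.
    """
    # Convert every value to a character string and concatenate them.
    key = ":".join(str(value) for value in kp_values)
    # Reduce the string to a 4-byte unsigned value.
    return zlib.crc32(key.encode("ascii")) & 0xFFFFFFFF

# The same parameter values always give the same identifier, so the
# identifier can be computed from a science exposure alone.
flat_group = calibration_group_id(["FILTER 1", "GRISM 2"])
print(f"{flat_group:08x}")
```

Because the identifier depends only on the parameter values, a science exposure and its matching calibration exposures arrive at the same value independently, which is what removes the need for a join against the calibration exposures.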
The SGI's are created by applying a CRC to the list of all calibration exposure numbers used by the science group. If the list of calibration exposures used by a science group changes, the SGI will also change. Since the date a science group was created is recorded in the tables, determining if it is necessary to recalibrate an exposure can be done by comparing the original calibration date with the creation date of the current best science group. The history of the science groups is saved, so it is possible to find and use `obsolete' science groups if the current `best' group doesn't work. We also plan to add usage counters to the tables to enable tracking of successful and unsuccessful uses of calibration groups.
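A minimal sketch of the SGI calculation and the recalibration test follows. The function names, the sorting of exposure numbers before hashing, the `,` separator and the use of `zlib.crc32` are all assumptions made for illustration; the paper does not specify them.

```python
import zlib
from datetime import date

def science_group_id(cal_exposure_numbers):
    """CRC of the list of calibration exposure numbers used by a group.

    Assumption: the numbers are de-duplicated and sorted first so that
    the identifier does not depend on retrieval order.
    """
    ordered = sorted(set(cal_exposure_numbers))
    key = ",".join(str(number) for number in ordered)
    return zlib.crc32(key.encode("ascii")) & 0xFFFFFFFF

def needs_recalibration(calibrated_on, best_group_created_on):
    """True if the current best science group is newer than the
    calibration that was originally applied."""
    return best_group_created_on > calibrated_on

# Replacing one calibration exposure yields a different SGI.
old_sgi = science_group_id([101, 102])
new_sgi = science_group_id([101, 103])
print(old_sgi != new_sgi)
print(needs_recalibration(date(1994, 3, 1), date(1994, 6, 15)))
```

The dates in the usage lines are made up; in practice both would come from the tables described above.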
A CRC is normally used for data correctness verification, similar to a checksum. Unlike a checksum, however, a CRC incorporates positional information. For example, the checksum of the string `FILTER 1:GRISM 2' is the same as the checksum of the string `FILTER 2:GRISM 1', while the CRCs of these two strings are different. For this project, we are using the CRC function as a pseudo random number generator which takes a string of arbitrary length as its seed. The CRC will always generate the same value given the same seed string, and unlike a checksum a CRC cannot be easily spoofed into producing the same output for different input strings. For a complete description of the CRC algorithm see Campbell (1987); C source code is available from the URL `ftp://netlib.att.com/netlib/crc'.
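The difference between a checksum and a CRC can be demonstrated with the two strings from the text. The sketch assumes a simple additive checksum (the sum of the byte values), since the paper does not say which checksum it has in mind, and uses `zlib.crc32` as a representative CRC.

```python
import zlib

def additive_checksum(text):
    """Sum of the byte values; insensitive to byte order."""
    return sum(text.encode("ascii")) & 0xFFFFFFFF

def crc32(text):
    """Standard CRC-32; sensitive to the position of every byte."""
    return zlib.crc32(text.encode("ascii")) & 0xFFFFFFFF

a = "FILTER 1:GRISM 2"
b = "FILTER 2:GRISM 1"

# Both strings contain the same bytes in a different order.
print(additive_checksum(a) == additive_checksum(b))  # True
print(crc32(a) == crc32(b))                          # False
```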
A CRC does not guarantee that every string will map into a distinct 32 bit value. It is possible that two different strings could map into the same group identifier. This is not considered a serious problem in the CFHT archive because it is unlikely that it has occurred more than a very small number of times, and when it does occur the calibration exposures selected will simply include inappropriate exposures (in addition to the appropriate exposures) which the calibration process will disregard. This problem could be further addressed by adding `collision' detection code to the algorithms, increasing the CRC size to 64 bits, or adding some fixed qualifying key to the tables to reduce the chances of a collision (a logical choice might be the run identifier).
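The collision risk can be estimated with the standard birthday approximation, assuming the identifiers behave like uniformly random values (an idealization of a CRC). The function name and the group count used below are hypothetical; the paper does not state how many distinct groups the archive contains.

```python
import math

def collision_probability(n_groups, bits=32):
    """Approximate probability that at least two of n_groups random
    identifiers of the given width coincide (birthday approximation)."""
    pairs = n_groups * (n_groups - 1) / 2
    # 1 - exp(-pairs / 2**bits), computed stably for tiny arguments.
    return -math.expm1(-pairs / 2.0 ** bits)

n = 10_000  # hypothetical number of distinct group identifiers
print(f"32 bit: {collision_probability(n, 32):.3g}")
print(f"64 bit: {collision_probability(n, 64):.3g}")
```

Under this approximation, roughly 77,000 distinct 32 bit identifiers would be needed before a collision somewhere in the set becomes as likely as not, and widening the identifier to 64 bits reduces the probability by many orders of magnitude for the same number of groups.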
First, all calibration exposures are processed and any necessary changes or additions are made to the cal_file table. Next, each science exposure is processed: the CGI's used by the exposure are calculated from the KP values of the science exposure, and the list of calibration exposures is retrieved from the cal_file table. The SGI is calculated from the list of calibration exposures and the cal_science table is updated as necessary. Each update to the cal_science table is accompanied by a check of the cal_group table to ensure the necessary entries are present. The programs are designed to check only new exposures unless the `-full' command line option is selected.
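The two-pass procedure can be sketched with in-memory dictionaries standing in for the cal_file, cal_science and cal_group tables. Everything about the data layout here is an assumption for illustration: the parameter names, the `KP_SUBSETS` mapping, the dictionary shapes and the use of `zlib.crc32` are hypothetical, and the real table schemas are those given in Tables 1, 2 and 3.

```python
import zlib
from collections import defaultdict

# Hypothetical: parameters that identify each calibration type.
KP_SUBSETS = {"FLAT": ("detector", "filter"), "BIAS": ("detector",)}

def crc_id(parts):
    key = ":".join(str(part) for part in parts)
    return zlib.crc32(key.encode("ascii")) & 0xFFFFFFFF

def cgi(cal_type, params):
    # CGI from the calibration type plus the relevant parameter subset.
    return crc_id([cal_type] + [params[k] for k in KP_SUBSETS[cal_type]])

def build_tables(cal_exposures, sci_exposures):
    # Pass 1: file every calibration exposure under its CGI.
    cal_file = defaultdict(list)      # CGI -> calibration exposure numbers
    for exp in cal_exposures:
        cal_file[cgi(exp["type"], exp["params"])].append(exp["number"])

    # Pass 2: derive the CGIs of each science exposure from its own
    # parameters, look up the calibration exposures, and compute the SGI.
    cal_science = {}                  # science exposure number -> SGI
    cal_group = {}                    # SGI -> CGIs used by that group
    for exp in sci_exposures:
        cgis = [cgi(t, exp["params"]) for t in sorted(KP_SUBSETS)]
        numbers = sorted(n for c in cgis for n in cal_file.get(c, []))
        sgi = crc_id(numbers)
        cal_science[exp["number"]] = sgi
        cal_group.setdefault(sgi, cgis)  # ensure the group entry exists
    return cal_file, cal_science, cal_group

# A small made-up data set mirroring Figure 1: two flat groups, one bias group.
cals = [
    {"number": 1, "type": "FLAT", "params": {"detector": "ccd1", "filter": "V"}},
    {"number": 2, "type": "FLAT", "params": {"detector": "ccd1", "filter": "V"}},
    {"number": 3, "type": "FLAT", "params": {"detector": "ccd1", "filter": "R"}},
    {"number": 4, "type": "FLAT", "params": {"detector": "ccd1", "filter": "R"}},
    {"number": 5, "type": "BIAS", "params": {"detector": "ccd1"}},
    {"number": 6, "type": "BIAS", "params": {"detector": "ccd1"}},
]
scis = [
    {"number": 10, "params": {"detector": "ccd1", "filter": "V"}},
    {"number": 11, "params": {"detector": "ccd1", "filter": "V"}},
    {"number": 12, "params": {"detector": "ccd1", "filter": "R"}},
]
cal_file, cal_science, cal_group = build_tables(cals, scis)
print(cal_science[10] == cal_science[11], cal_science[10] == cal_science[12])
```

In this sketch exposures 10 and 11 fall into one science group and exposure 12 into another, while both groups share the same bias CGI, matching the relationship drawn in Figure 1. Incremental processing of new exposures only, as opposed to the `-full' mode, is not modeled.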
Using CRC values as the group identifiers dramatically reduced the complexity of the calibration table maintenance algorithms. In exchange, it was necessary to accept the possibility that inappropriate calibration exposures will be included with some recommended calibration exposures. This technique was very successful in solving this problem and may have application in other areas of astronomy.