A. H. Rots
Universities Space Research Association,
XTE Guest Observer Facility,
Code 660.2,
Goddard Space Flight Center,
Greenbelt, MD 20771
s time resolution and moderate spectral resolution.
A third instrument (ASM) will monitor most of the X-ray sky every 90
minutes in the 2-10 keV range.
XTE's on-board science data systems provide considerable processing power and unprecedented flexibility in telemetry data modes. Events are processed on-board in several simultaneous data modes, chosen from a large repertoire. New data modes may be added during the mission. Consequently, keeping track of the collected data in the database and providing a mechanism to select data that satisfy selection criteria expressed in physical terms is a challenging problem.
The XTE Guest Observer Facility will provide the data in FITS format. Its top-level requirements are that all XTE telemetry must be converted to and archived in FITS files, and that all XTE data must be retrievable from these FITS files. The design goals are: a clear and functional hierarchical structure; exclusive use of FITS binary tables; easy navigation; and a scalable design.
The FITS standard provides for the exchange of tables, as well as a table hierarchy (through the EXTLEVEL keyword). We can take advantage of this by using index tables to keep track of the data tables. In other words, we can build a database out of a hierarchy of FITS tables by implementing "tables of tables", a concept borrowed from the AIPS++ Table Classes. Table references consist of UNIX-style relative path names to FITS files.
Our basic design contains a three-level hierarchy: Master Index, Subsystem Index, and Data Table. The following table outlines this structure. In all tables, the "vertical" (or "row") axis is time (or observation, which amounts to the same thing), while the "horizontal" (or "column") axis distinguishes spacecraft subsystem units, or data sources.
Starting at the top of the hierarchy, each row in the Master Index contains, for a single observation, references to all the Subsystem Indices that, in turn, contain references to Data Tables belonging, or pertaining, to that observation. There are Subsystem Index tables for the science instruments, for the spacecraft attitude control system, for the clock corrections, for the orbit ephemerides, for the system of calibration files, etc. In addition, the Master Index contains columns with observational parameters, such as Observation ID, start and stop times, and source information.
The Subsystem Index tables contain rows that correspond to segments of the observation during which all telemetry data for that subsystem were deposited in the same set of Data Tables. Each row contains references to those Data Tables, as well as data mode and configuration information when relevant.
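The navigation path described above (Master Index row, then Subsystem Index rows, then Data Table references) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, field names, subsystem labels, source names, and file paths are all hypothetical, and the tables are held in memory rather than read from FITS binary tables.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubsystemRow:
    """One segment of an observation in a Subsystem Index (hypothetical layout)."""
    start: float
    stop: float
    data_mode: str
    data_tables: List[str]  # relative paths to Data Table files


@dataclass
class MasterRow:
    """One observation in the Master Index (hypothetical layout)."""
    obs_id: str
    source: str
    start: float
    stop: float
    subsystem_indices: Dict[str, str]  # subsystem name -> relative index path


def find_data_tables(master, indices, source, subsystem):
    """Follow Master Index -> Subsystem Index -> Data Table references."""
    paths = []
    for row in master:
        if row.source != source:
            continue
        index_path = row.subsystem_indices.get(subsystem)
        if index_path is None:
            continue  # this observation has no index for that subsystem
        for segment in indices[index_path]:
            paths.extend(segment.data_tables)
    return paths


# Made-up example content; none of these names come from the real database.
master = [
    MasterRow("obs-001", "SRC-A", 0.0, 3000.0,
              {"asm": "obs-001/asm/index.fits",
               "attitude": "obs-001/attitude/index.fits"}),
    MasterRow("obs-002", "SRC-B", 3000.0, 5000.0,
              {"asm": "obs-002/asm/index.fits"}),
]
indices = {
    "obs-001/asm/index.fits": [
        SubsystemRow(0.0, 1500.0, "mode-a", ["obs-001/asm/data_a.fits"]),
        SubsystemRow(1500.0, 3000.0, "mode-b", ["obs-001/asm/data_b.fits"]),
    ],
    "obs-001/attitude/index.fits": [
        SubsystemRow(0.0, 3000.0, "", ["obs-001/attitude/att.fits"]),
    ],
    "obs-002/asm/index.fits": [
        SubsystemRow(3000.0, 5000.0, "mode-a", ["obs-002/asm/data_a.fits"]),
    ],
}

print(find_data_tables(master, indices, "SRC-A", "asm"))
```

Because every reference is a relative path, the same lookup works unchanged on any copy of the directory tree.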
The Data Tables will be described in Section 3.
The table hierarchy is reflected in the structure of the directory tree in which the database is stored. Extracting a sub-database (e.g., all observations for a particular proposal) not only is a simple operation, but also yields a new database that has an architecture identical to that of the original one. It involves lifting the relevant rows out of the Master Index, depositing them into a new Master Index table, and copying all the Subsystem Index and Data Tables directly and indirectly referenced in those Master Index rows. Consequently, the system can also function in the user's home environment. Such "stripped" Master Index tables are automatically provided for each observation and for each proposal, so that databases containing these can simply be extracted using the UNIX tar utility.
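As a rough illustration of the extraction step, the sketch below selects Master Index rows for one proposal and gathers every file they reference, directly or indirectly. The dictionary keys (`proposal`, `indices`, `data_tables`) and all paths are assumptions made for this example, not the actual column names of the database.

```python
def extract_subdatabase(master, indices, proposal):
    """Return a stripped Master Index and the list of files to copy with it."""
    # Lift the relevant rows out of the Master Index.
    new_master = [row for row in master if row["proposal"] == proposal]

    # Collect every Subsystem Index and Data Table those rows reference.
    files = set()
    for row in new_master:
        for index_path in row["indices"]:
            files.add(index_path)
            for segment in indices[index_path]:
                files.update(segment["data_tables"])
    return new_master, sorted(files)


# Hypothetical in-memory stand-ins for the FITS index tables.
master = [
    {"obs_id": "obs-001", "proposal": "P10", "indices": ["obs-001/idx.fits"]},
    {"obs_id": "obs-002", "proposal": "P20", "indices": ["obs-002/idx.fits"]},
    {"obs_id": "obs-003", "proposal": "P10", "indices": ["obs-003/idx.fits"]},
]
indices = {
    "obs-001/idx.fits": [{"data_tables": ["obs-001/d1.fits", "obs-001/d2.fits"]}],
    "obs-002/idx.fits": [{"data_tables": ["obs-002/d1.fits"]}],
    "obs-003/idx.fits": [{"data_tables": ["obs-003/d1.fits"]}],
}

stripped, to_copy = extract_subdatabase(master, indices, "P10")
print([row["obs_id"] for row in stripped])
print(to_copy)
```

The result has the same shape as the input (a Master Index plus the files it points to), which is what makes the extracted set a database in its own right.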
Tables: The items contained in a single table will represent all data from one Application (CCSDS jargon for a physical telemetry data source). The parallel science data streams referred to above will each come down from separate Applications, as will the "Housekeeping" (monitoring) data from each of the detectors. The time span covered by a table is determined by events that force the start of a new table: the start of a new observation; a change in the telemetry format of the Application; or a reset (reboot) of the Application.
Rows: Each row will contain the contents of one telemetry packet or logical group of packets and will thus consist of all the data that an Application sends down for a given time stamp when it flushes its buffer. The associated time stamp is made part of the row.
Columns: Data from different data sources in the Application (such as detectors or layers) will be separated into different columns. The data from a single data source, in a single table cell, may consist of arrays with the following axes: energy, wire position, time, frequency, phase, or lag.
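The three events that force the start of a new table (new observation, change of telemetry format, Application reset) can be expressed as a simple segmentation rule. The sketch below assumes each packet is a dictionary with hypothetical keys `obs_id`, `format`, and `reset`; the real telemetry packets are of course structured differently.

```python
def split_into_tables(packets):
    """Group one Application's time-ordered packets into tables (lists of rows)."""
    tables = []
    previous = None
    for packet in packets:
        starts_new_table = (
            previous is None                             # very first packet
            or packet["obs_id"] != previous["obs_id"]    # new observation
            or packet["format"] != previous["format"]    # telemetry format change
            or packet["reset"]                           # Application reset
        )
        if starts_new_table:
            tables.append([])
        tables[-1].append(packet)  # one packet (or packet group) per row
        previous = packet
    return tables


# Made-up packet stream for a single Application.
packets = [
    {"time": 0, "obs_id": "A", "format": 1, "reset": False},
    {"time": 1, "obs_id": "A", "format": 1, "reset": False},
    {"time": 2, "obs_id": "A", "format": 2, "reset": False},  # format change
    {"time": 3, "obs_id": "A", "format": 2, "reset": True},   # reset
    {"time": 4, "obs_id": "B", "format": 2, "reset": False},  # new observation
]

tables = split_into_tables(packets)
print([[p["time"] for p in table] for table in tables])
```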
Data integrity is guarded in two ways. All FITS files are properly checksummed, using the proposed FITS checksum convention; hence, the 32-bit one's complement checksums should all be -0 (negative zero, i.e., all bits set). In addition, for each data file, a 160-bit message digest is calculated following the Secure Hash Standard defined in FIPS 180-1.
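Both integrity checks can be illustrated with the Python standard library. The sketch below computes a 32-bit one's complement sum over big-endian words and shows that appending the complement of that sum drives the total to all ones, which is the one's complement representation of -0. It is a simplification: the FITS checksum convention stores the complement as an ASCII-encoded keyword value inside the header, and that encoding step is omitted here. The payload is arbitrary test data, and SHA-1 (the algorithm specified in FIPS 180-1) is taken from `hashlib`.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF


def ones_complement_sum32(data: bytes) -> int:
    """32-bit one's complement sum of big-endian words (end-around carry)."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    total = 0
    for (word,) in struct.iter_unpack(">I", data):
        total += word
        total = (total & MASK32) + (total >> 32)  # fold the carry back in
    return total


payload = bytes(range(64))                # arbitrary stand-in for file contents
checksum = ones_complement_sum32(payload)
complement = ~checksum & MASK32           # value that cancels the sum

# With the complement included, the sum becomes all ones, i.e. -0.
verified = ones_complement_sum32(payload + struct.pack(">I", complement))
print(hex(verified))

# 160-bit message digest of the same bytes.
digest = hashlib.sha1(payload).hexdigest()
print(len(digest) * 4, "bits")
```

The checksum catches accidental corruption cheaply, while the message digest makes it impractical for two different files to be mistaken for one another.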
As indicated in Section 1, we need a mechanism that can keep track of the contents of each stream and that will accommodate future, unknown data modes. For this purpose we have developed the Data Description Language (DDL), which has two main functions: data identification and data selection. Its design is suitable for multi-mission applications.
The DDL provides a tag for each data object or table cell that identifies the contents unambiguously in terms of:
Data descriptors are built from a number of different tokens, separated by logical operators. Each token can be thought of as representing a coordinate axis in data space. The argument of each token specifies its value. The values are, in most cases, integer numbers but individual bits can be addressed by "name", if appropriate. The usual bit-wise logical operators may be applied.
Given a set of selection constraints, such as selected source(s) and time range(s), and a data descriptor, it becomes fairly simple to navigate through the system and find the data items one is looking for. The Master Index acts as an observation catalog with references to Subsystem Indices, while the latter contain configuration information and references to the Data Tables. Certain data descriptor tokens translate directly into database navigation directives. Beyond that, it becomes a matter of matching data descriptors in more detail.
There are four levels of implementation for the data descriptor matching. The simplest is to search for a literal match of data descriptors. The next level is an equivalent match: two data descriptors may have the same meaning (e.g., through the use of wildcard characters). An inclusive match returns a collection of data items that, together, contain the information requested in the data descriptor (and, possibly, more). The most sophisticated implementation is capable of an intelligent match and transformation: the retrieval system will collect the necessary data and transform it to conform with the data descriptor.
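The first three matching levels can be illustrated with a toy model. The descriptor strings below are not DDL syntax; they are hypothetical placeholders, and the inclusive match is reduced to a single axis (a set of integer channel numbers). The fourth level, intelligent match and transformation, is not attempted here.

```python
from fnmatch import fnmatchcase


def literal_match(requested: str, stored: str) -> bool:
    """Level 1: the two descriptor strings are identical."""
    return requested == stored


def equivalent_match(requested: str, stored: str) -> bool:
    """Level 2: the request may use wildcards that the stored tag satisfies."""
    return fnmatchcase(stored, requested)


def inclusive_match(requested: set, stored: dict) -> list:
    """Level 3: return items that together cover the request (possibly more)."""
    chosen = [name for name, values in stored.items() if values & requested]
    covered = set().union(*(stored[name] for name in chosen))
    return chosen if requested <= covered else []


# Hypothetical descriptor strings, not real DDL.
stored_tag = "detector=1;channel=0-15"
print(literal_match("detector=1;channel=*", stored_tag))
print(equivalent_match("detector=1;channel=*", stored_tag))

# Hypothetical data items, each covering a range of channels.
items = {
    "item1": set(range(0, 8)),
    "item2": set(range(8, 16)),
    "item3": set(range(16, 32)),
}
print(inclusive_match(set(range(4, 12)), items))
print(inclusive_match(set(range(30, 40)), items))
```

In this toy model the inclusive match returns more channels than were asked for, which mirrors the "and, possibly, more" caveat in the text; trimming the surplus is the job of the transformation level.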
Note that the scalability of the database design ensures that the system will also function in the user's home environment. An XTE Data Finder (XDF) has been written as an itcl script with some C code. It allows the user to navigate the database interactively, using the Master Index and Subsystem Indices.
Although XTE has not been launched at the time of this writing, the XFDB has been used for processing mission simulation data and, therefore, represents a proven environment. It shows that it is possible to design a database with the following properties:
Finally, most of the software developed to implement this database is not mission-specific. For instance, the detailed structuring of individual FITS tables is done through automated scripts, while the database creation process is driven by these structure descriptions. We see, therefore, realistic possibilities for application to other space missions as well as at ground-based observatories.
I am greatly indebted to Randall D. Barnette and Kerry C. Hilldrup of Hughes STX for their many contributions to this project.