Clive G. Page
Department of Physics & Astronomy, Leicester University,
Leicester, LE1 7RH, U. K.
The FITS table provides a self-describing, portable, and fairly efficient format for the storage of heterogeneous data sets. These advantages are now widely recognized and FITS is being adopted by many projects as the standard for tabular data. The FITS binary table is now fully standardized (Cotton, Tody, & Pence 1995) and is likely to supersede the ASCII table for all applications since it provides more efficient storage and has a wider range of data types.
Among the largest FITS tables are the photon-event lists generated by X-ray telescopes such as those on ROSAT and ASCA. Large tables are also used to store catalogs of celestial sources such as those from the Astronomical Data Center and from the Centre de Données astronomiques de Strasbourg. These sometimes reach tens or hundreds of megabytes in size; it is not always easy to discover exactly what such tables contain.
A collection of utilities for manipulating FITS files, called FTOOLS, is available from the High Energy Astrophysics Science Archive Research Center at GSFC. These are modular and well-documented programs which support a number of useful operations on FITS tables. The FTOOLS are, however, essentially batch-oriented and include no facilities for browsing or editing tables, nor for maintaining context from one tool to another. They normally generate a new file as a result of each operation, which can impose a significant I/O load.
The user sometimes just wants to browse through a FITS table interactively, or perhaps to make a few minor changes to the contents. Since few if any existing packages seem to provide facilities of this sort, IDA[1] was invented to fill the gap.
IDA makes it possible to view a FITS table (binary or ASCII) as if it were a spread-sheet. Some 20 rows of data may be displayed on the terminal, with as many columns as will fit across the screen. The arrow keys (and PAGE UP and PAGE DOWN) may be used to scroll through the table in all directions. One can also browse through the FITS keywords in the header in a similar way.
Sometimes one wants to make a minor alteration to the data in a FITS table, e.g., to update a few values, or add some more rows to a table. By default IDA opens a file in read-only mode, but if a file is explicitly opened for update then edits are possible, and are carried out by simply typing a new value into the appropriate cell. It will, of course, usually be prudent to make a copy of a file before modifying it, but this is up to the user.
Tables often have too many columns to be shown in full, even when the optional 132-column width is selected, and scrolling sideways is not always the best solution. It is therefore possible to select which columns are to be shown, and in which order. It is also possible to display virtual columns, i.e., ones which are computed on-the-fly by evaluating an expression based on the values in other columns.
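A virtual column of this kind can be sketched as a function that is evaluated only when a cell is displayed, so that nothing extra is stored in the table. The sketch below is illustrative Python, not IDA's Fortran code; the class name TableView, its methods, and the sample columns bmag and vmag are all hypothetical.

```python
class TableView:
    """Stored columns plus virtual columns evaluated only when a cell is shown."""

    def __init__(self, columns):
        self.columns = columns          # column name -> list of stored values
        self.virtual = {}               # column name -> function of a row dict
        self.nrows = len(next(iter(columns.values())))

    def add_virtual(self, name, expression):
        # Register an expression; no values are computed or stored here.
        self.virtual[name] = expression

    def cell(self, name, row):
        if name in self.virtual:
            stored = {n: values[row] for n, values in self.columns.items()}
            return self.virtual[name](stored)
        return self.columns[name][row]

    def window(self, names, first, count=20):
        # Return the chosen columns, in the chosen order, for one screenful.
        last = min(first + count, self.nrows)
        return [[self.cell(n, r) for n in names] for r in range(first, last)]


table = TableView({"bmag": [12.0, 13.5], "vmag": [11.5, 13.0]})
table.add_virtual("b_v", lambda r: r["bmag"] - r["vmag"])
page = table.window(["vmag", "b_v"], first=0)
```

The window method takes the list of column names in display order, which mirrors the idea of choosing both which columns to show and in which order.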
FITS tables often contain many thousands or even millions of rows, in which case it is not feasible to browse through more than a tiny part of a table. A method of selecting those rows of particular interest was therefore essential.
The select command specifies a logical expression involving column names and constants, which selects a subset of rows for display or further manipulation. As well as the usual arithmetic operators, the selection can include Fortran 90 style relational operators and a special from-to syntax which simplifies inclusive range selections. For example:
    select vmag >= 12.5 and sptype from "A5" to "A9"
In order to avoid I/O overheads, the selected rows are not written to another file but merely marked as the subset of current interest. Further selections operate by default on the latest subset, allowing the user to home in gradually on the area of interest.
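The idea of marking a subset rather than copying rows can be illustrated with a list of row numbers that each further selection narrows. This is a minimal sketch in Python under assumed names (select, in_range, and the sample rows are hypothetical); it does not reproduce IDA's expression parser or its internal representation of a subset.

```python
def select(rows, predicate, subset=None):
    """Return the row numbers, within the current subset, that satisfy predicate."""
    candidates = range(len(rows)) if subset is None else subset
    return [i for i in candidates if predicate(rows[i])]


def in_range(value, low, high):
    """Inclusive range test, analogous to the from-to syntax."""
    return low <= value <= high


rows = [
    {"vmag": 11.9, "sptype": "A5"},
    {"vmag": 12.7, "sptype": "A7"},
    {"vmag": 13.1, "sptype": "G2"},
    {"vmag": 12.5, "sptype": "A9"},
]

# First selection works on the whole table; the second narrows the result.
first = select(rows, lambda r: r["vmag"] >= 12.5)
second = select(rows, lambda r: in_range(r["sptype"], "A5", "A9"), subset=first)
```

Only row numbers are kept, so no table data are rewritten between selections.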
When exploring data in a large table, it is often useful to present them in order (ascending or descending) of one of the columns. It is therefore possible to sort the current subset (or the entire table) on any column. The save command allows a sorted table (or indeed any subset) to be turned into a new permanent FITS file.
A number of other commands are provided. One can list the contents of the table (or any subset), producing an external text file, or project any selection of columns (real or virtual) to a new FITS table. One can also compute statistics of the columns in the current subset. The update command allows all the values of a column in the subset to be recomputed. In addition it is possible to join two tables on any column (either on exact equality or, for numerical columns, on equality within a given tolerance). IDA also supports a cjoin command to join two tables of celestial positions using a great-circle distance function. Both commands have an option to allow an outer-join so that the new table also includes the unmatched objects.
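A positional join of the kind performed by cjoin can be sketched as follows. The haversine formula is used here as one common great-circle distance function; the text does not say which formula IDA uses. The brute-force pairing is also an assumption made for brevity, and the outer option in this sketch keeps unmatched rows of the first table only. The function names and sample coordinates are hypothetical.

```python
import math


def great_circle_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula, inputs in degrees)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cjoin(left, right, radius_deg, outer=False):
    """Pair row numbers of two position lists lying within radius_deg of each other."""
    pairs = []
    for i, (ra1, dec1) in enumerate(left):
        matches = [j for j, (ra2, dec2) in enumerate(right)
                   if great_circle_deg(ra1, dec1, ra2, dec2) <= radius_deg]
        if matches:
            pairs.extend((i, j) for j in matches)
        elif outer:
            pairs.append((i, None))   # unmatched left row, kept by the outer join
    return pairs


left = [(10.0, 20.0), (200.0, -5.0)]
right = [(10.0005, 20.0), (50.0, 50.0)]
inner = cjoin(left, right, radius_deg=0.001)
outer = cjoin(left, right, radius_deg=0.001, outer=True)
```

A production version would restrict the candidates for each row by means of an index rather than scanning the whole second table.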
Null values (i.e., ones missing or unknown) are fully supported in all operations by means of three-valued logic (true/false/unknown). Null values are shown as "?" in text listings, and can be entered interactively in the same way when browsing spread-sheet style. This can be a good way of removing invalid points from a data-set.
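Three-valued logic can be illustrated with the standard Kleene truth tables, using Python's None for the unknown value. This is a sketch of the general technique, not of IDA's code. The function names are hypothetical, and the rule that a row is selected only when the expression is definitely true is an assumption borrowed from SQL practice.

```python
def and3(a, b):
    """Kleene AND: false dominates, otherwise unknown propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """Kleene OR: true dominates, otherwise unknown propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a):
    """Kleene NOT: the negation of unknown is unknown."""
    return None if a is None else not a


def ge3(x, y):
    """Comparison that yields unknown when either operand is null."""
    if x is None or y is None:
        return None
    return x >= y


rows = [
    {"vmag": 12.7, "flag": True},
    {"vmag": None, "flag": True},
    {"vmag": None, "flag": False},
]
results = [and3(ge3(r["vmag"], 12.5), r["flag"]) for r in rows]
# Assumption: only rows whose expression is definitely true are selected.
selected = [i for i, value in enumerate(results) if value is True]
```

Note that the third row evaluates to false rather than unknown, because one false operand settles an AND whatever the null value might have been.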
Angles can be displayed in sexagesimal formats, i.e., hours-minutes-seconds (or degrees-arcminutes-arcseconds), provided the units are given as degrees or radians. It is unfortunate that the FITS Standard does not provide a specific format descriptor for sexagesimal notation, as it continues to be popular with astronomers.
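The conversion to sexagesimal notation can be sketched as below. The function name, its parameters, and the colon-separated output layout are assumptions for illustration; IDA's actual display format is not specified in the text. The angle is rounded to an integer count of the smallest displayed unit first, so that a seconds field of 60 can never appear.

```python
import math


def to_sexagesimal(angle, units="degrees", hours=False, decimals=1):
    """Format an angle as dd:mm:ss.s, or as hh:mm:ss.s when hours is True."""
    degrees = math.degrees(angle) if units == "radians" else angle
    value = degrees / 15.0 if hours else degrees   # 15 degrees per hour
    sign = "-" if value < 0 else ""
    scale = 10 ** decimals
    # Round once, as an integer number of 1/scale seconds.
    total = round(abs(value) * 3600 * scale)
    whole_seconds, fraction = divmod(total, scale)
    minutes, seconds = divmod(whole_seconds, 60)
    major, minutes = divmod(minutes, 60)
    fraction_text = f".{fraction:0{decimals}d}" if decimals > 0 else ""
    return f"{sign}{major:02d}:{minutes:02d}:{seconds:02d}{fraction_text}"


declination = to_sexagesimal(-0.5)
right_ascension = to_sexagesimal(187.5, hours=True)
```

Here an angle of 187.5 degrees is shown in hours as 12:30:00.0, and -0.5 degrees as -00:30:00.0.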
In the time of a typical disk seek (about 10 ms) a modern processor can execute a million instructions, so a suitable disk cache can produce valuable savings. The I/O efficiency of IDA is assisted by maintaining in memory a cache of FITS records (each 2880 bytes long) so that repeated retrievals of the same record, as when reprocessing a subset, avoid physical I/O as much as possible.
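A record cache along these lines might look like the following sketch. The least-recently-used replacement policy, the class name RecordCache, and the capacity are assumptions, since the text does not say how IDA chooses which record to discard. An in-memory byte stream stands in for a FITS file.

```python
import io
from collections import OrderedDict

RECORD_SIZE = 2880   # length of one FITS record in bytes


class RecordCache:
    """Keep recently used FITS records in memory (LRU policy is an assumption)."""

    def __init__(self, fileobj, capacity=64):
        self.fileobj = fileobj
        self.capacity = capacity
        self.records = OrderedDict()   # record number -> bytes, oldest first
        self.physical_reads = 0

    def get(self, number):
        if number in self.records:
            self.records.move_to_end(number)      # mark as most recently used
            return self.records[number]
        self.fileobj.seek(number * RECORD_SIZE)
        data = self.fileobj.read(RECORD_SIZE)
        self.physical_reads += 1
        self.records[number] = data
        if len(self.records) > self.capacity:
            self.records.popitem(last=False)      # discard least recently used
        return data


# Three dummy records filled with the byte values 1, 2 and 3.
stream = io.BytesIO(b"".join(bytes([n]) * RECORD_SIZE for n in (1, 2, 3)))
cache = RecordCache(stream, capacity=2)
for number in (0, 1, 0):
    cache.get(number)
reads_after_repeat = cache.physical_reads
```

The repeated request for record 0 is served from memory, so only two physical reads are counted for the three retrievals.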
Help text (retrieved by the help command) comes from a LaTeX file, so that the same text can be used both for the on-line help and the printed reference manual. The LaTeX source-text is converted at run-time into a tree-structure and held in a direct-access scratch file.
The operation of the select command is fast enough on a small table or subset, but can be slow when selecting from a table of more than a few megabytes in size, since it is necessary to evaluate the selection expression on each row in the table. For the larger FITS tables some form of keyed access is essential. IDA supports this in two alternative ways. Firstly a complete table may be sorted (in ascending or descending order) on the column of interest. This is useful in the simpler cases, but if more than one column is of interest, or the data are not completely static, it is better to create an external index to each column of interest. IDA supports the creation and use of indexes based on the commonly-used B+ tree. These B-trees are fully dynamic, and are updated automatically whenever the corresponding value in the table is modified. The user only needs to switch between one index and another to see the data instantly displayed in a different order, which can be very helpful in data exploration.
The selection of subsets by keyed access is supported by a special find command. The join and cjoin operations also rely on indexes, which are created automatically if they do not already exist.
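The behavior of a column index (keyed range lookup, display in key order, and automatic update when a value changes) can be illustrated with a much simpler structure than a B+ tree. The sketch below deliberately swaps in a sorted in-memory list searched by bisection; it is a stand-in for the technique, not an implementation of IDA's disk-based B-trees, and the class and method names are hypothetical.

```python
import bisect


class ColumnIndex:
    """Sorted (key, row) pairs; a simplified stand-in for a B+ tree index."""

    def __init__(self, values):
        self.entries = sorted((value, row) for row, value in enumerate(values))

    def find(self, low, high):
        """Row numbers whose key lies in the inclusive range low..high."""
        start = bisect.bisect_left(self.entries, (low, -1))
        rows = []
        for key, row in self.entries[start:]:
            if key > high:
                break
            rows.append(row)
        return rows

    def update(self, row, old, new):
        """Keep the index in step when a table value is modified."""
        position = bisect.bisect_left(self.entries, (old, row))
        if position < len(self.entries) and self.entries[position] == (old, row):
            del self.entries[position]
        bisect.insort(self.entries, (new, row))

    def ordered_rows(self):
        """Row numbers in ascending key order, for display without sorting the table."""
        return [row for _, row in self.entries]


vmag = [12.7, 11.9, 13.1, 12.5]
index = ColumnIndex(vmag)
before = index.find(12.5, 13.0)
index.update(row=1, old=11.9, new=12.6)
after = index.find(12.5, 13.0)
```

A B+ tree offers the same operations while keeping the index on disk in fixed-size blocks, which is what makes it suitable for tables too large to hold in memory.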
A separate cache of B-tree index blocks is maintained. In general the root node and other nodes high in the tree will be repeatedly accessed and will tend to remain in memory, so that the retrieval of a randomly-selected item will often only involve two disk accesses, one to the leaf node of the B-tree, and one to the required record in the FITS file.
IDA only needs a VT-100 display or equivalent (such as a PC), and requires no external libraries. As soon as the current testing phase is complete, it is intended to make the IDA source-code available by anonymous ftp. Some further developments are planned, e.g., to support additional database operations such as grouping, and facilities for graphical output.
The code is written almost wholly in Fortran 77 with minimal extensions. It is already working on Sun-Solaris, Alpha-Digital UNIX, SGI-IRIX, PC-Microsoft Fortran, and PC-Linux-f2c. VAX-VMS and PC-Linux-g77 versions are almost complete. FITS formats are big-endian: the UNIX version of IDA achieves endian-independence by detecting when it is run on a little-endian platform and swapping the bytes automatically where necessary.
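The byte-swapping approach can be sketched as follows: detect the byte order of the host once, then reverse the bytes of each element only when the host is little-endian. This is an illustration in Python with hypothetical function names, restricted to 32-bit integers; IDA itself does this in Fortran 77.

```python
import struct
import sys


def needs_swap():
    """True on a little-endian host, where big-endian FITS data must be swapped."""
    return sys.byteorder == "little"


def swap_bytes(raw, width):
    """Reverse the byte order of every width-byte element in raw."""
    if len(raw) % width:
        raise ValueError("length is not a multiple of the element width")
    return b"".join(raw[i:i + width][::-1] for i in range(0, len(raw), width))


def decode_int32(raw):
    """Decode big-endian 32-bit integers using native-order unpacking."""
    native = swap_bytes(raw, 4) if needs_swap() else raw
    return list(struct.unpack(f"={len(raw) // 4}i", native))


fits_bytes = struct.pack(">3i", 1, -2, 70000)   # big-endian, as stored in FITS
values = decode_int32(fits_bytes)
```

In Python the same result is available directly from the ">" format prefix of the struct module; the explicit swap is shown only because it mirrors the approach described above.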
Potential users should be aware that an alternative FITS table browser with a GUI-style interface is now available in the form of the CURSA package written by A. C. Davenhall of the UK's Starlink project. It uses Tcl/Tk and does, of course, require an X-terminal or workstation. IDA and CURSA share many design aims and have some code in common, but at present CURSA has no editing facilities and requires sorted tables for efficient access.
[1] Ida is Minor Planet 243, the first one known to have a satellite of its own.