

Astronomical Data Analysis Software and Systems V
ASP Conference Series, Vol. 101, 1996
George H. Jacoby and Jeannette Barnes, eds.

A FITS Image Extension Kernel for IRAF

Nelson Zarate, Perry Greenfield

Space Telescope Science Institute, 3700 San Martin Dr., Baltimore, MD 21218

Abstract:

STScI has adopted FITS with Image Extensions as the native disk and archive format for the two new instruments being installed into HST during the planned servicing mission in 1997. In order to use these data conveniently within IRAF, STScI has implemented an IRAF image kernel that can access FITS with image extensions as a native image format; eventually this prototype kernel will be adapted for inclusion into IRAF 2.11 by NOAO. This paper describes the advantages of this format and the design issues faced during its implementation.

1. What are FITS Image Extensions?

The FITS standard (see NASA 1995) allows any number of optional Standard Extensions to be appended to a simple FITS file, which consists of the usual Header and Data Unit (HDU). Each Standard Extension likewise consists of a header and a data unit. Extensions may be of different types; the type is specified by the value of the XTENSION keyword in the extension header (e.g., IMAGE or TABLE). When a FITS file contains such extensions, the first HDU is referred to as the Primary HDU and the following HDUs as Extension HDUs.
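The layout described above can be illustrated with a short Python sketch. It is not the IRAF kernel code; it only assumes the general FITS conventions of 2880-byte blocks and 80-character header cards. The helper names (card, make_hdu, read_header, data_bytes, list_hdus) are hypothetical, and the header parser is deliberately simplified.

```python
import io

BLOCK = 2880  # FITS files are organised in 2880-byte blocks
CARD = 80     # each header card is 80 ASCII characters


def card(key, value):
    """Format one fixed-format header card (hypothetical helper)."""
    if isinstance(value, bool):
        field = f"{'T' if value else 'F':>20}"
    elif isinstance(value, int):
        field = f"{value:>20}"
    else:
        field = f"'{value:<8}'"
    return f"{key:<8}= {field}".ljust(CARD)


def make_hdu(cards, data=b""):
    """Build one HDU: header cards plus END, then data, each padded to a block."""
    header = "".join(card(k, v) for k, v in cards) + "END".ljust(CARD)
    header = header.ljust(-(-len(header) // BLOCK) * BLOCK)
    data = data.ljust(-(-len(data) // BLOCK) * BLOCK, b"\0")
    return header.encode("ascii") + data


def read_header(stream):
    """Return {keyword: value string} for the header at the current
    position, or None at end of file."""
    cards = {}
    while True:
        block = stream.read(BLOCK)
        if len(block) < BLOCK:
            return None
        for i in range(0, BLOCK, CARD):
            text = block[i:i + CARD].decode("ascii")
            key = text[:8].strip()
            if key == "END":
                return cards
            if text[8:10] == "= ":
                # Simplification: the first "/" is taken as the start of the
                # comment, which is wrong for string values containing "/".
                cards[key] = text[10:].split("/")[0].strip().strip("'").strip()


def data_bytes(hdr):
    """Size of the data unit in bytes, rounded up to whole blocks."""
    naxis = int(hdr["NAXIS"])
    npix = 1 if naxis > 0 else 0
    for n in range(1, naxis + 1):
        npix *= int(hdr[f"NAXIS{n}"])
    gcount = int(hdr.get("GCOUNT", 1))
    pcount = int(hdr.get("PCOUNT", 0))
    size = abs(int(hdr["BITPIX"])) // 8 * gcount * (pcount + npix)
    return -(-size // BLOCK) * BLOCK


def list_hdus(stream):
    """Scan sequentially; return [(hdu_number, type, header_offset)]."""
    found = []
    while True:
        offset = stream.tell()
        hdr = read_header(stream)
        if hdr is None:
            return found
        found.append((len(found), hdr.get("XTENSION", "PRIMARY"), offset))
        stream.seek(data_bytes(hdr), 1)  # skip over the data unit


# A null primary HDU followed by two IMAGE extensions of different shape/type.
primary = make_hdu([("SIMPLE", True), ("BITPIX", 8), ("NAXIS", 0), ("EXTEND", True)])
image2d = make_hdu([("XTENSION", "IMAGE"), ("BITPIX", 16), ("NAXIS", 2),
                    ("NAXIS1", 10), ("NAXIS2", 10), ("PCOUNT", 0), ("GCOUNT", 1)],
                   bytes(200))
image1d = make_hdu([("XTENSION", "IMAGE"), ("BITPIX", -32), ("NAXIS", 1),
                    ("NAXIS1", 1000), ("PCOUNT", 0), ("GCOUNT", 1)],
                   bytes(4000))
hdus = list_hdus(io.BytesIO(primary + image2d + image1d))
print(hdus)
```

Note that locating the third HDU requires reading the two headers before it, since each header determines the size of the data unit that follows.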

By using extensions of type IMAGE, it is possible to combine multiple images in one file (along with table extensions for that matter). Different extension headers can have different keyword sets and the different extension data arrays may be of different dimensionalities, sizes, and data types. In this respect the format is much more general than the ``group'' type format employed by current HST data products. These GEIS (Generic Edited Information Set) files allow multiple images, but these images share the same keyword values found in the header and must all be of the same dimensionality, size and data type. Group Parameters are used to specify keywords whose values change between images; however, these Group Parameters must exist for all the images and cannot be dynamically added to a given image without changing the structure of the whole file.

2. STScI Motivations for Using FITS with Image Extensions as a Disk Format

2.1. Portability of Data

The currently used GEIS files have embedded machine dependencies, which complicate their use on different machines because conversion programs are needed to move the files from one type of computer to another (e.g., VAX and Sun). FITS files, on the other hand, have machine-independent data formats. Because FITS is widely used, adopting it also improves interoperability between data analysis systems.

2.2. Used as an Archival Format

The acceptance of FITS as an archival format means that no data format conversion is necessary between HST archives and HST data processing systems.

2.3. Data Grouping

The ground-based software being developed for STIS and NICMOS, the two new instruments to be installed in HST during the 1997 servicing mission, will generate images and spectra with much associated data. Rather than deluge the observer with a huge number of files, it was agreed that packaging related data products together in one file would simplify the management of the data products for the observer.

3. Design Considerations for the Image Kernel

3.1. Efficiency

The FITS file structure is inherently sequential in design. Successive accesses to a FITS Image Extensions file that contains a large number of extensions would incur a significant performance penalty because of the repeated scans through all the headers preceding the referenced extension. To avoid this, the kernel constructs an internal directory of the extensions with pointers to the offsets of the headers and data units. The current version of the kernel fills in the index entries for all headers examined, so that subsequent accesses to any of those headers can determine their offset immediately, and any access to a following header not yet indexed can begin at the last header indexed. The only performance penalty incurred is when files are accessed out of order, and in the worst case the total penalty is the time it takes to scan all the headers once, regardless of how many extensions are accessed.
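The incremental directory can be sketched as follows. This is an illustration under stated assumptions, not the kernel's actual data structure: the class name ExtensionIndex is hypothetical, and the file is replaced by a list giving the total size (header plus data) of each HDU, so that "reading a header" is modelled as looking up one entry of that list.

```python
class ExtensionIndex:
    """Lazily built table of header offsets (hypothetical sketch)."""

    def __init__(self, hdu_sizes):
        self._hdu_sizes = hdu_sizes  # stand-in for the file on disk
        self.offsets = []            # offsets[i] = byte offset of header i
        self.headers_scanned = 0     # how many headers had to be read

    def offset_of(self, n):
        """Return the offset of header n, scanning only what is not yet indexed."""
        if n >= len(self._hdu_sizes):
            raise IndexError(f"no HDU number {n}")
        while len(self.offsets) <= n:
            k = len(self.offsets)
            # Header k starts where HDU k-1 ends; resume from the last entry.
            start = 0 if k == 0 else self.offsets[-1] + self._hdu_sizes[k - 1]
            self.offsets.append(start)
            self.headers_scanned += 1
        return self.offsets[n]


index = ExtensionIndex([2880, 5760, 8640, 5760])
print(index.offset_of(1), index.headers_scanned)  # scans headers 0 and 1
print(index.offset_of(3), index.headers_scanned)  # continues from header 1
print(index.offset_of(2), index.headers_scanned)  # already indexed, no scan
```

Whatever the order of access, each header is scanned at most once, which matches the worst-case bound stated above.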

The current implementation caches the index tables of previously opened FITS files, so that different IRAF image opens (even from different tasks in the same executable) can use the index table created by earlier image open calls to different extensions (each of which IRAF logically treats as a separate file). The index tables are identified by the full IRAF filename. It is possible that a previously cached index table may be rendered invalid by a user replacing a file with a different one outside of the IRAF environment (or otherwise modifying the file). For that reason, the kernel always verifies that the modification time of the current file is consistent with the corresponding cache entry.
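A minimal sketch of such a cache, assuming the index is keyed by absolute path and validated against the file's modification time, might look like this. The names get_index and build_index are hypothetical; build_index stands for whatever routine scans the file and constructs its index table.

```python
import os

# full path -> (modification time in ns, index table)
_index_cache = {}


def get_index(path, build_index):
    """Return the cached index for path, rebuilding it if the file changed."""
    path = os.path.abspath(path)
    mtime = os.stat(path).st_mtime_ns
    cached = _index_cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]          # cache entry is still consistent
    index = build_index(path)     # stale or missing: rescan the file
    _index_cache[path] = (mtime, index)
    return index
```

Comparing modification times is cheap but not foolproof: a file rewritten without a change in its recorded modification time would go undetected by this sketch.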

3.2. Convenience

Asking a user to access extensions in a FITS file solely by extension number is asking for frustration and trouble, particularly in the case where there are repeated sets of extensions. A mechanism for identifying extensions by name is crucial. Two standard FITS keywords for extensions serve this purpose nicely: EXTNAME and EXTVER, used in combination, identify an extension. EXTNAME takes a string value whereas EXTVER takes an integer value. One can give all the image extensions different values of EXTNAME and thus not rely on the value of EXTVER, or give some image extensions a common name to represent an image type (e.g., SCI) and use EXTVER as an effective array index (e.g., SCI,3 for the third SCI image). Even though EXTVER can be used as an array index, the values of EXTVER need not reflect the order in the file or form a complete sequence of integers. EXTNAME and EXTVER apply to extensions only and cannot be used to identify the Primary HDU. The current FITS kernel can use either extension names or numbers to access extensions.
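Name-based lookup can be sketched as below. The function find_extension is hypothetical, and headers are modelled as plain dictionaries with entry 0 standing for the Primary HDU. Two choices here are assumptions rather than documented kernel behaviour: a missing EXTVER is treated as 1, and when no EXTVER is requested the first extension with a matching name is returned.

```python
def find_extension(headers, extname, extver=None):
    """Return the HDU number of the extension matching EXTNAME (and EXTVER)."""
    for number, hdr in enumerate(headers):
        if number == 0:
            continue  # EXTNAME/EXTVER cannot identify the Primary HDU
        if hdr.get("EXTNAME", "").strip() != extname:
            continue
        # Assumption: an absent EXTVER keyword counts as version 1.
        if extver is None or hdr.get("EXTVER", 1) == extver:
            return number
    raise KeyError((extname, extver))


headers = [
    {"NAXIS": 0},                       # primary header
    {"EXTNAME": "SCI", "EXTVER": 1},
    {"EXTNAME": "ERR", "EXTVER": 1},
    {"EXTNAME": "SCI", "EXTVER": 3},    # versions need not be consecutive
]
print(find_extension(headers, "SCI", 3))
print(find_extension(headers, "ERR"))
```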

To avoid confusion, the kernel will by default try to prevent the user from creating duplicate extension names in the same FITS file, but it will not assume that all extension names are necessarily different.

Always requiring users to specify extensions as part of the filename specification, especially when there is an array or image of primary interest in the file, is bound to make the data format an unpopular one. For this reason, when the user specifies a filename only, without an extension specification, the kernel will presume the first non-null image in the file is being referred to (the Primary Data Unit if it exists, otherwise, the first non-null image extension).
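The default selection rule can be sketched as follows. The function names are hypothetical, headers are again modelled as dictionaries with entry 0 as the primary header, and "non-null" is taken here to mean NAXIS greater than zero with every axis length greater than zero, which is an assumption about the kernel's exact test.

```python
def has_image_data(hdr):
    """True if the header describes a non-empty data array."""
    naxis = hdr.get("NAXIS", 0)
    return naxis > 0 and all(
        hdr.get(f"NAXIS{i}", 0) > 0 for i in range(1, naxis + 1)
    )


def default_hdu(headers):
    """Pick the HDU used when the filename carries no extension specification."""
    if has_image_data(headers[0]):
        return 0  # the Primary Data Unit exists
    for number, hdr in enumerate(headers[1:], start=1):
        if hdr.get("XTENSION") == "IMAGE" and has_image_data(hdr):
            return number
    raise LookupError("file contains no non-null image")


headers = [
    {"NAXIS": 0},                                            # null PDU
    {"XTENSION": "TABLE", "NAXIS": 2, "NAXIS1": 8, "NAXIS2": 3},
    {"XTENSION": "IMAGE", "NAXIS": 0},                       # null image
    {"XTENSION": "IMAGE", "NAXIS": 2, "NAXIS1": 4, "NAXIS2": 4},
]
print(default_hdu(headers))
```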

3.3. Inheritance of PHU keywords

The relationship of the keywords in the Primary Header Unit (PHU) to the Extension Header Units (EHU) is nowhere defined in the FITS standard. Many treat the headers as independent (and most FITS readers adopt this interpretation), whereas some have presumed that the PHU keywords apply to all the extensions. It is natural to assume that the extensions grouped together in the same FITS file should have a ``global'' set of keywords that apply to all, and hence that the PHU keywords could serve that role. But allowing such a mechanism introduces many potential complexities. This single aspect of the kernel produced by far the most discussion and controversy of any aspect of the kernel design.

After much discussion (within STScI, with NOAO, and with the FITS community in general via sci.astro.fits) it was decided to adopt an ``inheritance'' mechanism for keywords using the following rules:

  1. By default, inheritance of the PHU keywords can only occur when the PDU is null. (This alleviates the potential for confusion of data-specific header keywords such as BSCALE, since they should not appear in a PHU associated with a null PDU.)
  2. An extension inherits the PHU by default only when its header contains the keyword/value pair INHERIT=T.
  3. FITS-required keywords are not inherited.
  4. Keywords in the PHU that appear as well in the EHU will take their value from the EHU when inheritance applies.
  5. Commentary keywords (COMMENT, HISTORY, ``blank'') will never be inherited.
  6. Changes made to an inherited keyword only appear in the output EHU, not PHU.
  7. After the inheritance operation is performed, by default INHERIT=F is set when an inherited image header is written (i.e., inheritance turns itself off on propagation).

Rules 1, 2, and 7 can be overridden by use of an option in the image name specification.
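The rules can be summarised in a short sketch. It is an illustration only: headers are modelled as dictionaries (which cannot represent repeated commentary cards), the function effective_header and its parameters force and keep_flag are hypothetical stand-ins for the image name option, and the set of FITS-required keywords used here is an assumption, since the text does not enumerate it.

```python
# Assumed set of FITS-required keywords; NAXISn is handled separately.
REQUIRED = {"SIMPLE", "BITPIX", "NAXIS", "EXTEND", "END"}
COMMENTARY = {"COMMENT", "HISTORY", ""}


def is_required(key):
    return key in REQUIRED or (key.startswith("NAXIS") and key[5:].isdigit())


def effective_header(phu, ehu, force=None, keep_flag=False):
    """Header a task would see for an extension.

    force:     None applies rules 1 and 2; True/False overrides them.
    keep_flag: True overrides rule 7 (leave INHERIT as it was).
    """
    if force is None:
        pdu_is_null = phu.get("NAXIS", 0) == 0                 # rule 1
        inherit = pdu_is_null and ehu.get("INHERIT") is True   # rule 2
    else:
        inherit = force
    merged = {}
    if inherit:
        for key, value in phu.items():
            if not is_required(key) and key not in COMMENTARY:  # rules 3, 5
                merged[key] = value
    merged.update(ehu)            # rule 4: EHU values take precedence
    if inherit and not keep_flag:
        merged["INHERIT"] = False  # rule 7: turn inheritance off on output
    return merged


phu = {"SIMPLE": True, "BITPIX": 16, "NAXIS": 0,
       "TELESCOP": "HST", "EXPTIME": 10.0, "HISTORY": "created"}
ehu = {"XTENSION": "IMAGE", "BITPIX": -32, "NAXIS": 2, "NAXIS1": 4,
       "NAXIS2": 4, "EXPTIME": 20.0, "INHERIT": True}
merged = effective_header(phu, ehu)
print(merged["TELESCOP"], merged["EXPTIME"], merged["INHERIT"])
```

Because the function returns a new dictionary, later changes to an inherited keyword affect only the output extension header and never the PHU, in the spirit of rule 6.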

Further limitations of the inheritance feature should be noted. If the purpose of using inheritance is to prevent needless repetition of keywords in EHUs, it will effectively be negated by the propagation of the extension HDU through a single use of a standard IRAF program or utility that generates a new image file from the input image file. The ``global'' header and extension header distinction can only be preserved by software that takes special care to disable the automatic inheritance and propagate the headers separately.

3.4. IRAF Applications Compatibility

Most of the IRAF applications that handle images through the IMIO interface can access and create FITS files and FITS extensions, with the important caveat that each IRAF task treats an input file specification as a connection to only one image. It is not possible for an existing general image application to manipulate all of the IMAGE extensions with one file specification; there needs to be one input file specification per image extension. All FITS image data types can be read; all but byte format can be written.

3.5. Future Enhancements

The IRAF image model is basically a single-image model. Some aspects of dealing with FITS Image Extensions will therefore either be quite tedious, because an operation that could in principle be done all at once must be repeated for each extension of a file (e.g., imcopy), or will not behave as expected (e.g., imdelete). It certainly will be necessary to develop utilities or modify existing utilities to make using FITS Image Extensions more convenient.

References:

NASA 1995, Definition of the Flexible Image Transport System (FITS) v1.1, NASA/Science Office of Standards and Technology, Goddard Space Flight Center

