NOAO Data Products Program

Incorporating Spectra in the Next Phase of the Virtual Observatory

Francisco Valdes
April 28, 2003


For the most part astronomical images and spectra are both projections of a four dimensional observational parameter space. The four parameters are two celestial coordinates, photon energy, and time. The suggestion presented here is that accessing these common forms of astronomical observational data should be based on this four dimensional parameter space. The prototype for this would be a fairly direct extension of the current "Simple Image Access Prototype" (SIAP) specification from the current two dimensional parameter model.


The two most common types of astronomical observations are images and spectra. Because spectroscopic instrumentation is generally different from imaging instrumentation, though many spectrometers/spectrographs include imaging modes, an artificial distinction is made between images and spectra. The issue is further confused by the way spectral information is sometimes obtained, namely by multiplexing photon energies into spatial positions on a detector. These factors often lead to separate treatment for the two.

Conceptually, a majority of astronomical observational data consist of measurements of photons arriving from a particular direction on the sky, with a particular energy, and at a particular time. Sometimes this information is recorded directly in so-called event lists. Other times the events are binned to produce an array or raster which is implicitly or explicitly four dimensional.

This picture of astronomical observations leads us to consider this class of data as defined by four parameters: two celestial coordinates, photon energy, and time. Note that we refer to the spectral information in terms of photon energy, though wavelength or frequency could also be used as appropriate. The discussion which follows does not expand on the time aspect of the parameter space, so the approach described here could also be defined as a three dimensional space without the time element. Time was included, however, because it is a clearly identifiable aspect of the observation model.
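As an illustration of how energy, wavelength, and frequency describe the same axis, the following Python sketch converts between them using E = hc/wavelength = h*frequency. The function names and the choice of electron volts and meters as units are assumptions made for this example only; nothing here is drawn from the SIAP specification.

```python
# Physical constants in SI units (exact values in the 2019 SI definition).
PLANCK_J_S = 6.62607015e-34       # Planck constant, J s
LIGHT_M_S = 299_792_458.0         # speed of light, m/s
JOULE_PER_EV = 1.602176634e-19    # joules per electron volt


def wavelength_m_to_energy_ev(wavelength_m: float) -> float:
    """Photon energy in eV for a wavelength in meters (E = h c / wavelength)."""
    return PLANCK_J_S * LIGHT_M_S / wavelength_m / JOULE_PER_EV


def frequency_hz_to_energy_ev(frequency_hz: float) -> float:
    """Photon energy in eV for a frequency in hertz (E = h * frequency)."""
    return PLANCK_J_S * frequency_hz / JOULE_PER_EV


def energy_ev_to_wavelength_m(energy_ev: float) -> float:
    """Wavelength in meters for a photon energy in eV."""
    return PLANCK_J_S * LIGHT_M_S / (energy_ev * JOULE_PER_EV)


if __name__ == "__main__":
    # A 500 nm optical photon carries roughly 2.48 eV.
    print(wavelength_m_to_energy_ev(500e-9))
```

A client or provider could apply conversions of this kind so that a single energy axis serves radio, optical, and high energy holdings alike.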

My vision for the VO access layer to observational astronomical data is that the instrumental signatures and characteristics are removed by the provider apart from the resolution or binning. This is an important requirement for dealing with spectra since the raw instrumental data can be in quite complex formats with spatial and spectral information multiplexed onto a detector.

The question addressed here is whether spectral data can be easily incorporated in the current VO developmental framework and, in particular, whether the "Simple Image Access Prototype Specification" (SIAP) might be extended or whether another prototype is needed for spectra. In order to consider a modest extension of SIAP, which is about raster data, the observational data is also required to be binned in the four dimensional parameter space. Event data can be accessed through such a model by requiring that the data provider bin the data for VO access through such a raster protocol.
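To make the binning requirement concrete, here is a minimal Python sketch of how a provider might turn an event list into a sparse four dimensional raster. It assumes uniform bins for simplicity, although a raster need not be uniformly sampled; the function name, the tuple layout (ra, dec, energy, time), and the units are all hypothetical.

```python
from collections import Counter
from math import floor


def bin_events(events, origin, step):
    """Count events per 4D bin.

    events: iterable of (ra, dec, energy, time) tuples
    origin: 4-tuple giving the lower edge of bin 0 on each axis
    step:   4-tuple giving the (uniform) bin width on each axis
    Returns a Counter keyed by the 4-tuple logical index of each bin.
    """
    raster = Counter()
    for event in events:
        index = tuple(
            floor((value - start) / width)
            for value, start, width in zip(event, origin, step)
        )
        raster[index] += 1
    return raster


if __name__ == "__main__":
    events = [
        (10.01, 20.01, 1.2, 5.0),
        (10.02, 20.02, 1.3, 6.0),
        (10.15, 20.01, 2.5, 5.0),
    ]
    raster = bin_events(events, origin=(10.0, 20.0, 1.0, 0.0),
                        step=(0.1, 0.1, 1.0, 10.0))
    for index, count in sorted(raster.items()):
        print(index, count)
```

Only occupied bins are stored, which keeps the sketch small; a real service would presumably write the result to a FITS array or table.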

While we talk about a raster, this does not mean the sampling is uniform in any physical units. What is meant by a raster is that a set of photon values (such as flux or counts) is provided with a logical index. Conversion from the logical index to a physical four dimensional parameter space coordinate is the province of the world coordinate system (WCS) and of the discovery metadata.

Considerable thought has gone into expressing the relationship between logical indices and world coordinates in the FITS WCS methods. An important recent development is the proposal to allow lookup tables as part of the relationship. This is significant for spectra, particularly 1D projections, because the energy coordinates are sometimes provided in a lookup table. In other words, a common form of spectral data held by data providers is a table of photon fluxes with associated energy.
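The idea of a lookup table relating logical index to energy can be sketched in a few lines of Python. This is only an illustration of the concept, not the FITS WCS algorithm: it assumes a 0-based index (FITS itself counts pixels from 1), linear interpolation between table entries, and a hypothetical function name.

```python
def index_to_energy(index: float, table: list) -> float:
    """Map a (possibly fractional) 0-based logical index to an energy
    by linear interpolation in a lookup table of energies."""
    if len(table) < 2:
        raise ValueError("lookup table needs at least two entries")
    if not 0 <= index <= len(table) - 1:
        raise ValueError("index lies outside the lookup table")
    # Clamp so that the last table entry is reached with fraction == 1.
    lower = min(int(index), len(table) - 2)
    fraction = index - lower
    return table[lower] + fraction * (table[lower + 1] - table[lower])


if __name__ == "__main__":
    energies = [1.0, 2.0, 4.0, 8.0]   # non-uniform sampling, arbitrary units
    print(index_to_energy(1.5, energies))   # halfway between 2.0 and 4.0
```

Because the table carries the coordinates explicitly, the sampling may be as irregular as the data require.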

While a lookup table was conceived of for spectra, the concept can be applied more broadly to the four parameter raster. What this allows is sparse sampling from the raster. This might be relevant to some types of spectral data where the spatial sampling is sparse and somewhat random. An example of this is multiobject spectroscopy where sources are targeted with fibers. Whether the instrumentally extracted spectra should be considered separate rasters for the purposes of VO access is an interesting point of discussion.

Data Access, Data Models, and Data Formats

A key distinction that needs to be reiterated in discussing data within the virtual observatory context is between data access/requests, data models, and data formats. We raise this here with regard to spectra because it is easy to end up mixing all three. The discussion here is focused on data query and access for spectra. Discovering and requesting data is largely independent of the data format which is ultimately provided as the result of a request.

The SIAP specification mandates certain types of data formats for retrieval. The primary science type is FITS, so considering an extension of SIAP for spectra implies a FITS data format. There are a variety of ways spectra can be included in FITS. This is the subject of the discussion by Busko, and I can provide a similar proposal for general spectral formats. A discussion of the best few formats for spectra might be diverse at first, but I believe it would not be hard to converge on a few that are FITS based and general enough, keeping in mind that the vision is access to instrument independent science spectra and not complex multiplexed data acquisition formats.

A key factor for the science formats is that they include a WCS. The FITS WCS, including lookup tables, has been developed to the point that it provides fairly complete descriptions for images and spectra. Note that the non-linear distortion feature is still a proposal by Valdes and others.

Projections of 4D Parameter Space -- Images and Spectra

This section identifies the obvious projections that constitute observational images and spectra. The first assertion is that for such observational data there is one time value corresponding to a representative instant in the observation (the start or midpoint). The second assertion is that an image is a spectrum with one energy point. Both the time and energy points have metadata to define the point such as exposure time and filter bandpass.

Spectra come in several flavors. First, by definition, these have more than one sample in energy. The most complete spectral type is the so-called data cube. Data cubes include multiple raster elements in both celestial coordinates and in energy. These are generated by radio spectrometers as well as Fabry-Perot and Integral Field Units at higher energies.

Slit spectroscopy has been the mainstay of optical astronomy. These are 2D rasters with one celestial and one energy dimension. The celestial dimension requires a higher dimensional WCS in the metadata to convert the spatial logical index to a curve in two dimensional celestial space. There are FITS WCS proposals for how this can be done in a general fashion.
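As a simple case of converting the spatial logical index to a curve on the sky, consider a straight slit. The Python sketch below is not one of the FITS WCS proposals; it assumes a flat-sky (small angle) approximation that is adequate only for short slits away from the celestial poles, a position angle measured from north through east, and hypothetical parameter names.

```python
from math import cos, radians, sin


def slit_index_to_sky(index, center_index, ra0_deg, dec0_deg,
                      pa_deg, scale_deg):
    """Approximate (ra, dec) in degrees of a pixel along a straight slit.

    index, center_index: logical index of the pixel and of the slit center
    ra0_deg, dec0_deg:   sky position of the slit center
    pa_deg:              slit position angle, north through east
    scale_deg:           angular size of one pixel along the slit
    """
    offset = (index - center_index) * scale_deg
    pa = radians(pa_deg)
    dec = dec0_deg + offset * cos(pa)
    # Divide by cos(dec) because RA circles shrink away from the equator.
    ra = ra0_deg + offset * sin(pa) / cos(radians(dec0_deg))
    return ra, dec


if __name__ == "__main__":
    # A north-south slit: pixels step in declination only.
    print(slit_index_to_sky(110, 100, 150.0, 0.0, 0.0, 0.001))
```

A curved or distorted slit would need the more general treatment mentioned above.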

Finally, fiber or spatially integrated spectra have just one point in the spatial parameters.

Extensions of the SIAP Specification

The conclusion of the discussion presented here is that (raster indexed) spectral data should be incorporated into the developing VO infrastructure by avoiding any artificial distinction between images and spectra. Therefore, one should extend the SIAP specification rather than invent a new mechanism for spectra. The concern about having the word "image" in SIAP is recognized but not discussed here.

This discussion is not intended as a proposal but simply to explore how spectra might be incorporated within the SIAP specification. Many details would have to be worked out.

In a broad review of the SIAP specification it appears that the main changes required are to restate the purpose to include a more general concept of "image" as potentially one to four dimensional data formats and to extend the query and metadata fields appropriately. In particular, the discussion of queries would be expanded to a search for data in a given region of the sky, over a region of photon energies, and over a period of time. Then the region of interest (ROI) specified by the POS and SIZE fields would be extended to include four values rather than two. As a matter of detail, there might be special values or definitions governing the interpretation of missing fields.
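The extended region of interest might be pictured as a four dimensional box built from four POS values and four SIZE values. The Python sketch below is one possible reading, not a proposed syntax: the class name, the axis order (ra, dec, energy, time), and the units are hypothetical, and right ascension wraparound at 0/360 degrees is deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region4D:
    """Axis-aligned box in (ra, dec, energy, time)."""
    lower: tuple   # minimum corner, one value per axis
    upper: tuple   # maximum corner, one value per axis

    @classmethod
    def from_pos_size(cls, pos, size):
        """Build the box from a center (POS) and full widths (SIZE)."""
        lower = tuple(p - s / 2 for p, s in zip(pos, size))
        upper = tuple(p + s / 2 for p, s in zip(pos, size))
        return cls(lower, upper)

    def overlaps(self, other: "Region4D") -> bool:
        """True if the two boxes intersect on every axis."""
        return all(
            a_lo <= b_hi and b_lo <= a_hi
            for a_lo, a_hi, b_lo, b_hi in zip(
                self.lower, self.upper, other.lower, other.upper
            )
        )


if __name__ == "__main__":
    roi = Region4D.from_pos_size(pos=(180.0, 0.0, 2.0, 1000.0),
                                 size=(1.0, 1.0, 1.0, 100.0))
    holding = Region4D(lower=(180.2, 0.1, 2.4, 900.0),
                       upper=(180.4, 0.3, 3.0, 1000.0))
    print(roi.overlaps(holding))
```

A service could apply such an overlap test to the coverage metadata of each holding to decide which ones to report.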

The main question to be resolved is whether and how the query syntax can select only images or only spectra in the usually understood sense. One way might be specifying the number of elements along the energy axis; i.e. a value of 1 is an image. But probably a better way to make this common distinction would be a parameter similar to INTERSECT. A parameter such as TYPE with values "IMAGE", "SPECTRUM", or "ANY" would place certain requirements on the requested data content. An IMAGE would have more than one element along each spatial dimension and only one element of energy and time. A SPECTRUM would be data with multiple samples in energy. Other choices might restrict the request to 1D, 2D, or 3D spectral data.
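The content requirements attached to such a TYPE parameter can be written down directly in terms of the number of elements along each axis. The following Python sketch encodes the three values described above; the function name and the shape tuple (n_ra, n_dec, n_energy, n_time) are hypothetical conveniences for this example.

```python
def matches_type(shape, requested: str) -> bool:
    """Decide whether a raster of the given shape satisfies a TYPE value.

    shape: (n_ra, n_dec, n_energy, n_time) element counts per axis
    requested: "IMAGE", "SPECTRUM", or "ANY" (case-insensitive)
    """
    n_ra, n_dec, n_energy, n_time = shape
    kind = requested.upper()
    if kind == "ANY":
        return True
    if kind == "IMAGE":
        # More than one element on each spatial axis, one in energy and time.
        return n_ra > 1 and n_dec > 1 and n_energy == 1 and n_time == 1
    if kind == "SPECTRUM":
        # Any data with multiple samples in energy.
        return n_energy > 1
    raise ValueError(f"unknown TYPE value: {requested!r}")


if __name__ == "__main__":
    print(matches_type((2048, 2048, 1, 1), "IMAGE"))      # direct image
    print(matches_type((1, 1, 4096, 1), "SPECTRUM"))      # fiber spectrum
    print(matches_type((64, 64, 512, 1), "IMAGE"))        # data cube
```

Under these rules a data cube counts as a spectrum and not as an image, which matches the usual usage; finer choices such as 1D, 2D, or 3D spectra would add further tests on the spatial counts.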

Relationship with the SAO/CfA SIAP Proposal

The ideas presented here are similar in many respects to "SIAP Extension RFC and Draft Specification, Part 1: Quantities and Coordinates" by Steve Lowe. The main difference is that the Lowe paper attempts to be more general. The approach suggested in this 4D discussion is intermediate. Images and spectra are treated as projections of an extended parameter space which can be handled by an extension of the SIAP methodology. However, the 4D proposal is to not make the extension too general and, hence, complex. Instead, it simply adds two parameters to cover the vast majority of the observational parameter space of interest to astronomers. Furthermore, it would adopt something like the restriction of celestial coordinates to degrees in order to limit the units allowed in the energy and time specifications. Transformation between units would be done by the client interface and by the data provider.


This discussion suggests that observational images and spectra be treated as aspects of a four dimensional observation model. This assumes the raw instrumental observations have been converted to raster sampling along 1 to 4 axes so that spatial multiplexing or other quirks of raw spectral data are eliminated. Such data would be accessed by a fairly simple extension of SIAP to a four dimensional parameter space. There might be a new parameter to allow restricting requests to "images" or "spectra" separately as well as getting information and data about holdings which include both images and spectra in some (4D) region of interest.