

Astronomical Data Analysis Software and Systems V
ASP Conference Series, Vol. 101, 1996
George H. Jacoby and Jeannette Barnes, eds.

Transforming Images into Icons to Remotely Retrieve Information from Astronomical Archives

André Csillaghy [1]

Institute of Astronomy ETH-Zentrum, CH-8092 Zurich, Switzerland

[1] http://mimas.ethz.ch/people/csillaghy.html

Abstract:

As astronomical image archives grow, the information they hold becomes harder to access. To compensate, software methods that query an archive by image content must be developed. Advances in content-based information retrieval should eventually speed up and simplify the search for relevant information. To this end, the method presented here extracts features by considering data density in a three-dimensional attribute space. Relevant features are stored in symbolic representations called icons. Icons not only support queries by browsing or comparing, but also allow the information contained in images to be transferred efficiently over the network.

1. Content Based Queries Require Feature Extraction

Because image repositories keep growing, automatic ways to recognize and retrieve features in astronomical images have been investigated with increasing interest (see, e.g., Burda & Feitzinger 1992; Murtagh et al. 1995). This article presents current work on a method that locates relevant information in images, then extracts and stores it in compressed representations called icons. Queries by browsing or comparing can run efficiently on icons without accessing the original observational data.

Information (or feature) extraction must automatically identify the most relevant regions in an image. This operation, although straightforward for humans, is still a difficult task for a computer (Flickner et al. 1995). In many disciplines other than astronomy, ranging from text recognition to medical imaging, algorithms based on segmentation and thresholding (Russ 1994) have been applied. However, these algorithms are highly sensitive to noise and therefore generally cannot be used on astronomical images. Hence, alternative astronomy-specific methods must be developed that (1) handle noise, (2) compress the original data, (3) are useful for comparing images and (4) run unsupervised.

2. Data Management Between Instrument and Observer

The scheme in Figure 1 depicts a management system supporting operations required for content-based information retrieval. It is being implemented for the management of an archive of solar radio spectrograms, recorded by two spectrometers of the radio astronomy group at ETH (Perrenoud 1982; Benz et al. 1991). An example of a spectrogram is shown on the left of Figure 2. It represents the flux of solar radio emission in the frequency/time plane and displays structures which may be related to high energy release in the solar corona.

 
Figure 1: The design of a system supporting content-based queries. Images are used only for event analysis (the corresponding path is shown by thick arrows) while browsing and comparing operations use image icons (thin arrows). Ovals symbolize software parts.

The two main software parts of this system reduce the dimensionality of the data: the feature extractor and the classifier.

2.1. Feature Extraction

In the following, algorithms for feature extraction and selection are summarized. They have been described in more detail in other articles (Csillaghy 1995; Csillaghy 1996). An image is assumed to be an array of $n$ elements, with $n_x$ columns and $n_y$ rows.

  1. Pixels are parameterized, i.e., considered as independent points $p_i = (x_i, y_i, z_i)$, where $i$ is the pixel number, $x_i$, $y_i$ are its coordinates and $z_i$ is its gray level. The set $\{p_i\}$ spans a three-dimensional attribute space;

  2. the attribute space is partitioned into regions, each of which can hold only a fixed maximum number of points; the shape of each region is modeled as a function of the three-dimensional distribution of the points it contains;

  3. high-density regions are assumed to be more relevant than low-density regions. Because every region holds at most the same number of points, a smaller region is a denser one; regions are therefore sorted by their volume (small volumes are more relevant than large ones) and by their basal surface (small surfaces are more relevant);

  4. optionally, application-dependent specifications on the regions' shape can further restrict the set of regions containing relevant features.

Regions that pass the selection make up the actual icon. The icon is visualized by drawing a rectangle for each region, where (1) the mean value of the points in $(x, y)$ sets the center of the rectangle; (2) the mean $z$-value sets the color and (3) the standard deviation in $(x, y)$ sets the extent of the rectangle.
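
The steps above can be sketched as follows. This is only a minimal illustration, not the implementation described in Csillaghy (1995; 1996): the partitioning is a simple recursive median split, used here as a stand-in for the region modeling of step 2, and step 4 is omitted. All function names (`parameterize`, `partition`, `describe`, `make_icon`) and parameters (`max_items`, `keep`) are hypothetical, and the small test image is made up.

```python
from statistics import mean, pstdev


def parameterize(image):
    """Step 1: turn a 2-D list of gray levels into points (x, y, z)."""
    return [(x, y, z) for y, row in enumerate(image) for x, z in enumerate(row)]


def partition(points, max_items):
    """Step 2 (stand-in): split at the median of the widest axis until
    every region holds at most max_items points (max_items >= 1)."""
    if len(points) <= max_items:
        return [points]
    extents = [max(p[a] for p in points) - min(p[a] for p in points)
               for a in range(3)]
    axis = extents.index(max(extents))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return partition(ordered[:mid], max_items) + partition(ordered[mid:], max_items)


def describe(region):
    """Summarize one region: rectangle center, color, extent, and size."""
    xs, ys, zs = zip(*region)
    # Each pixel is treated as a unit cell, hence the +1 on every side.
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    depth = max(zs) - min(zs) + 1
    return {
        "center": (mean(xs), mean(ys)),      # mean position in (x, y)
        "color": mean(zs),                   # mean gray level
        "extent": (pstdev(xs), pstdev(ys)),  # standard deviation in (x, y)
        "area": width * height,              # basal surface
        "volume": width * height * depth,
    }


def make_icon(image, max_items=4, keep=3):
    """Step 3: sort regions by volume, then basal surface, and keep the
    `keep` smallest (assumed here to be the most relevant)."""
    regions = [describe(r) for r in partition(parameterize(image), max_items)]
    regions.sort(key=lambda r: (r["volume"], r["area"]))
    return regions[:keep]


# Made-up 4x4 image with a bright 2x2 patch in the middle.
image = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
for rectangle in make_icon(image):
    print(rectangle)
```

The resulting list of rectangles is far smaller than the pixel array, which is what makes an icon cheap to store, compare and transfer.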

2.2. Searching Similar Images

Many automatic classification methods have been studied in the context of astronomy. Among them, artificial neural networks (Hertz, Krogh, & Palmer 1991) have attracted interest because of their ability to cope with noisy phenomena. Current work with self-organizing maps, or SOMs (Murtagh & Hernández-Pajares 1995; Kohonen 1995), is reported here; more detailed investigations will be the topic of another article. The basic idea is to use image icons, and more precisely their regions, as comparison objects instead of images.

The SOM is first trained to recognize the shapes of regions. A learning set containing all significant kinds of regions occurring in icons is built and used for this training. Specific locations on the SOM then correspond to distinct classes of region shapes. After the SOM has been trained, each region belonging to a given icon is "presented" to the SOM, which produces a reaction at a specific location. Reactions are subsequently summed, producing a characteristic map as shown on the right of Figure 2. Patterns in the map correspond to the most frequent reaction sites. They therefore indicate the kinds of features occurring in the original image.
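
The training and summing procedure described above might look like the sketch below. It is a bare-bones self-organizing map under stated assumptions, not the configuration used for the spectrogram archive: each region is assumed to be encoded as a short numeric vector (here a hypothetical width, height and gray level, each scaled to [0, 1]), the decay schedules for learning rate and neighborhood radius are arbitrary choices, and the names `best_matching_unit`, `train_som` and `characteristic_map` are illustrative.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return (row, col) of the node whose weights are closest to `vector`."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((a - b) ** 2 for a, b in zip(w, vector))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(samples, rows, cols, epochs=50, seed=0):
    """Train a rows x cols map on the learning set `samples`."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac                          # learning rate shrinks
        radius = max(rows, cols) / 2 * frac + 0.5  # neighborhood shrinks
        for vector in samples:
            br, bc = best_matching_unit(weights, vector)
            for r in range(rows):
                for c in range(cols):
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    pull = rate * math.exp(-d2 / (2 * radius ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += pull * (vector[k] - w[k])
    return weights


def characteristic_map(weights, icon_regions):
    """Present every region of one icon to the SOM and sum the reactions."""
    counts = [[0] * len(weights[0]) for _ in weights]
    for vector in icon_regions:
        r, c = best_matching_unit(weights, vector)
        counts[r][c] += 1
    return counts


# Made-up learning set: (width, height, gray level), all scaled to [0, 1].
learning_set = [(0.1, 0.1, 0.9), (0.2, 0.1, 0.8), (0.9, 0.1, 0.3),
                (0.8, 0.2, 0.2), (0.1, 0.9, 0.5), (0.2, 0.8, 0.6)]
som = train_som(learning_set, rows=4, cols=4)
icon_regions = [(0.1, 0.1, 0.9), (0.15, 0.1, 0.85), (0.85, 0.15, 0.25)]
for row in characteristic_map(som, icon_regions):
    print(row)
```

Two icons can then be compared through their characteristic maps rather than through the original images.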

  
Figure 2: The original image (left), compared to its corresponding icon (center). On the right, a map of the SOM's reacting nodes, showing clearly defined patterns.

3. Steps Towards Efficient Information Access

A system designed to support content-based retrieval queries and two of its software parts, a feature extractor and a classifier, have been briefly presented. At present, a first version of an icon browser has been implemented to query the solar radio spectrogram archive. In this system, icons are inserted into hypertext pages so that browsing can be done from any remote location. Today's queries are limited to specifying the observation epoch, the frequency range and some technical parameters. Queries based on the image content, which take more data-specific terms into account, are being implemented.
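
As an illustration of the kind of query currently supported, the sketch below filters icon records by observation epoch and frequency range. The record layout, field names (`icon_file`, `start`, `end`, `freq_min_mhz`, `freq_max_mhz`), file names and values are all hypothetical; the source does not describe the browser's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class IconRecord:
    """Hypothetical metadata stored alongside one icon."""
    icon_file: str
    start: datetime
    end: datetime
    freq_min_mhz: float
    freq_max_mhz: float


def overlaps(a_lo, a_hi, b_lo, b_hi):
    """True if the closed intervals [a_lo, a_hi] and [b_lo, b_hi] intersect."""
    return a_lo <= b_hi and b_lo <= a_hi


def query(records, start, end, freq_min_mhz, freq_max_mhz):
    """Return records overlapping both the epoch and the frequency range."""
    return [
        r for r in records
        if overlaps(r.start, r.end, start, end)
        and overlaps(r.freq_min_mhz, r.freq_max_mhz, freq_min_mhz, freq_max_mhz)
    ]


# Made-up records, for illustration only.
records = [
    IconRecord("a.gif", datetime(1995, 1, 1, 10, 0), datetime(1995, 1, 1, 10, 15),
               100.0, 1000.0),
    IconRecord("b.gif", datetime(1995, 2, 1, 9, 0), datetime(1995, 2, 1, 9, 30),
               1000.0, 3000.0),
]
hits = query(records, datetime(1995, 1, 1, 10, 10), datetime(1995, 1, 1, 10, 20),
             500.0, 600.0)
print([r.icon_file for r in hits])
```

A content-based query would add a further condition on the icon itself, for instance on the regions it contains.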

The next step in the development of the system will be to associate a map (created following the method of section 2.2) with each icon so that similar images can be compared. To this end, self-organizing maps seem to be a robust method, but more experience is needed to evaluate their practical efficiency.

In the future, other methods to partition the attribute space should be investigated to provide a better evaluation of the overall utility of this approach in content-based image retrieval. Other ways to compare icons, such as minimum spanning trees or PCA, may also be considered. Moreover, tests are being done on other kinds of data in order to generalize the method and make images more accessible to users.

Acknowledgments:

I acknowledge valuable discussions with A. O. Benz and H. Hinterberger. I also thank the Conference Organizing Committee for its financial support. This project is partly supported by the Swiss National Science Foundation, Grant No. 20-040336.94.

References:

Benz, A. O., Güdel, M., Isliker, H., Miszkowicz, S., & Stehling, W. 1991, Solar Phys., 133, 385,
http://mimas.ethz.ch/papers/benz/phoenix/firstres.la/firstres.la.html

Burda, P., & Feitzinger, J. V. 1992, A&A, 161, 697

Csillaghy, A. 1995, Vistas In Astronomy, 39/1, 37,
http://mimas.ethz.ch/papers/csillaghy/strasb/paper/paper.html

Csillaghy, A. 1996, Vistas In Astronomy, submitted

Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D., & Yanker, P. 1995, Computer, September, 23

Hertz, J., Krogh, A., & Palmer, R. G. 1991, Introduction to the theory of neural computation, (Addison-Wesley)

Kohonen, T. 1995, Self-organizing maps, (Springer)

Murtagh, F., & Hernández-Pajares, M. 1995, Journal of Classification, 12, in press,
http://http.hq.eso.org/~fmurtagh/clustering.html

Murtagh, F., Zeilinger, W., Starck, J.-L., & Bijaoui, A. 1995, in Astronomical Data Analysis Software and Systems IV, ASP Conf. Ser., Vol. 77, eds. R. A. Shaw, H. E. Payne, & J. J. E. Hayes (San Francisco, ASP), p. 260

Perrenoud, M. 1982, Solar Phys., 81, 197,
http://mimas.ethz.ch/papers/others/ikarus/ikarus/ikarus.html

Russ, J. C. 1994, The image processing handbook, (CRC Press), Chapter 6


Wed Jul 3 07:35:08 MST 1996