

Astronomical Data Analysis Software and Systems V
ASP Conference Series, Vol. 101, 1996
George H. Jacoby and Jeannette Barnes, eds.

Object-Oriented Modeling and Design for Sloan Digital Sky Survey Retained Data

Chih-Hao Huang, Jeff Munn, Brian Yanny, Stephen Kent, Don Petravick, Ruth Pordes

Fermi National Accelerator Laboratory, PO Box 500, Batavia, IL 60510

Alex Szalay, Robert Brunner

Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218

Abstract:

The SDSS project will produce tens of terabytes of non-trivially related data with uncertain complexity in usage. The survey is being conducted by an international collaboration consisting of eight institutions scattered throughout the US and Japan as well as numerous individuals at other sites. The data archive must provide adequate access to all collaborating partners during the five-year survey lifetime to support: development and testing of software algorithms; quality analysis on both the raw and processed data; selection of spectroscopic targets from the photometric catalogs; and scientific analysis. Additionally, the archive will serve as the foundation for the public distribution of the final calibrated data on a timely basis.

In this paper, we document how we applied Object-Oriented modeling and design to the development of the data archives. We close by putting Object-Orientation into perspective in light of that experience.

1. Introduction

The Sloan Digital Sky Survey (SDSS) project will produce tens of terabytes of data, consisting primarily of: a digital photometric map in five bands of half the northern sky to about 23rd magnitude; a catalog of about 100,000,000 galaxies and a similar number of stars detected in the photometric map; photometric and astrometric calibrations for the object catalogs; and spectra and redshift determinations for the brightest 1,000,000 galaxies and 100,000 Quasi-Stellar-Object candidates. These data will be stored in two major archives: the operational database, which supports the operational activities throughout the survey, and the science archive, which is an end product made available to the science community.

2. Challenges

The quantity of data alone poses a great challenge to system design, and the non-trivial complexity of the data adds a second dimension to that challenge. The largest challenge, however, lies in the complexity of the data usage. These data are to be operated upon and used by scientists with creative minds, and it is almost impossible to predict beforehand, exactly and completely, how they will be used. It is therefore impractical to expect a complete, detailed design before implementation begins. Instead, we have to start with a design that is flexible and extensible enough to accommodate changes and new requirements that cannot be foreseen. This simply cannot be accomplished by arranging the data into hundreds of thousands of flat files with multi-volume naming conventions.

3. Formal Design Methodology

A good methodology enables one to model a system without reference to any specific implementation. Designers can then concentrate on what the system should be rather than being distracted by premature implementation details. Such models are abstract and serve well for communication with designers who lack a deep computer science background. These very high level models may also serve as a road map for the end users, from which they can understand the concepts in the design.

There are many formal methodologies in software engineering. Object-Orientation is certainly not the only solution to our problem, but it was a maturing methodology when we were looking for one, and we took advantage of it.

4. Tool and Tool Selection

One of the keys to putting a methodology to work on the design of a non-trivial system is to have a tool that supports the methodology. Without a tool, the rules and constraints of the methodology cannot be enforced, and the correctness of the design would be very difficult to verify.

Selecting a tool is an important issue. It can hardly be done by applying a simple matrix to all features of all products, and the reason is simple: if a tool lacks an essential function and there is no way to patch it in, the work cannot be accomplished, no matter how good the tool is in other areas. Therefore, a set of essential criteria should be used as discriminators in the tool selection. Only tools that satisfy these criteria are to be considered.

Within the scope of SDSS, what we asked of the tool was (a) modeling capability, (b) code generation and (c) scripting capability. Basically, all design tools do (a) and most do (b). The perfect tool for our particular needs does not exist and probably never will. Therefore, (c) became the deciding factor. A tool driven by programmable scripts can be considered a meta-tool, which we may customize to fit our needs. As long as the scripting language is computationally complete, we can do virtually everything we want. This by no means implies that such customization would be easy, but the possibility is there.
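The selection rule described above (discriminate on the essential criteria first, compare the remaining features only afterwards) can be sketched in a few lines of Python. This is only an illustration: the tool names, feature labels and scores below are hypothetical and do not describe any product we evaluated.

```python
from dataclasses import dataclass, field

# The three essential criteria named in the text: (a), (b) and (c).
ESSENTIAL = {"modeling", "code_generation", "scripting"}


@dataclass
class Tool:
    name: str
    features: set = field(default_factory=set)
    scores: dict = field(default_factory=dict)  # hypothetical secondary ratings


def shortlist(tools, essential=ESSENTIAL):
    """Keep only the tools that provide every essential feature."""
    return [t for t in tools if essential <= t.features]


def rank(tools):
    """Order tools by total secondary score, best first."""
    return sorted(tools, key=lambda t: sum(t.scores.values()), reverse=True)


# Hypothetical candidates. ToolA has the best scores but lacks scripting.
candidates = [
    Tool("ToolA", {"modeling", "code_generation"}, {"diagrams": 10, "price": 9}),
    Tool("ToolB", {"modeling", "code_generation", "scripting"}, {"diagrams": 6, "price": 5}),
    Tool("ToolC", {"modeling", "code_generation", "scripting"}, {"diagrams": 7, "price": 7}),
]

ranked = rank(shortlist(candidates))
print([t.name for t in ranked])
```

A plain feature matrix would have placed ToolA first on total score. Filtering on the essential criteria removes it before any scoring takes place.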

The tool that we use is Paradigm Plus. It does diagramming, keeps track of all defined objects in its queryable internal database, supports multiple methodologies, generates code for different target environments, and is driven by programmable scripts. This is not an endorsement of this particular product. We used it because it met our criteria, and it turned out to work very well for our purposes.

5. Group Dynamics

We have both astrophysicists and a computer scientist on board. An interesting question is whether to put the astrophysicists or the computer scientist in charge of the system modeling. The question is essentially the following: which is easier, to have the computer scientist learn astrophysics or to have the astrophysicists learn computer science? The astrophysicists are the domain experts, with a better sense of what the system should be, so they should be put in charge of the system modeling. A formal and sound methodology makes this possible.

One of the astrophysicists was in charge of maintaining the models. After discussions, official changes were made only through that person. That is one way to maintain the consistency of the high level models.

6. Configuration

The configuration of the design environment is illustrated in Figure 1.

  
Figure 1: Configuration of the design environment

Central to this configuration is an internal database in Paradigm Plus which tracks every element defined in the models. Paradigm Plus comes with several maintenance screens that allow one to manipulate the elements at their finest granularity. However, the elements may also be defined and browsed through diagrams. Diagrams may be considered "sub-views" of the official and complete models inside the internal database. The internal database is fully accessible from a scripting language. Code generation in this sense is no different from generating a report by running scripts against the database. Using proper scripts and templates, we are able to generate reports, C++ headers, database descriptions in Data Definition Language (DDL), FITS headers and HTML documentation.
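The idea that code generation is just a report run against the model database can be illustrated with a minimal Python sketch. It is not the Paradigm Plus scripting language, and everything in it is an assumption made for illustration: the `MODEL` dictionary stands in for the internal database, the class and attribute names are hypothetical, and the DDL shown is SQL-style, which may differ from the DDL dialect of the actual target database.

```python
from string import Template

# Hypothetical stand-in for one class held in the tool's internal database.
MODEL = {
    "class": "ImagingRun",
    "attributes": [
        {"name": "run", "type": "int"},
        {"name": "mjd", "type": "double"},
    ],
}

# Per-target type mappings (assumed, for illustration only).
CPP_TYPES = {"int": "int", "double": "double"}
DDL_TYPES = {"int": "INTEGER", "double": "DOUBLE PRECISION"}

# One template per target; the same model feeds both.
CPP_TEMPLATE = Template("class $cls {\npublic:\n$members\n};\n")
DDL_TEMPLATE = Template("CREATE TABLE $cls (\n$columns\n);\n")


def generate_cpp(model):
    """Render a C++ class declaration from the model."""
    members = "\n".join(
        f"    {CPP_TYPES[a['type']]} {a['name']};" for a in model["attributes"]
    )
    return CPP_TEMPLATE.substitute(cls=model["class"], members=members)


def generate_ddl(model):
    """Render a SQL-style table definition from the same model."""
    columns = ",\n".join(
        f"    {a['name']} {DDL_TYPES[a['type']]}" for a in model["attributes"]
    )
    return DDL_TEMPLATE.substitute(cls=model["class"], columns=columns)


print(generate_cpp(MODEL))
print(generate_ddl(MODEL))
```

Because both outputs are derived from the same model entry, adding an attribute to `MODEL` changes the C++ declaration and the table definition together, which is the property the configuration relies on.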

In this configuration, the models, scripts and templates are controlled entities. No one should modify anything that is generated, even if the result is not exactly what one expects, because doing so makes the generated code inconsistent with the models. Instead, one should always fix the problem at the highest level that influences the result, which might be the model or the templates, and then regenerate the code.
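The rule against editing generated output is a matter of policy, but it can also be checked mechanically. The sketch below shows one possible approach, which is an assumption on our part and not a feature of the environment described here: the generator stamps each file with a checksum of its body, and a later check reports any file whose body no longer matches the stamp. The marker text and function names are hypothetical.

```python
import hashlib

# Hypothetical first-line marker written by the generator.
MARKER = "// GENERATED FILE, DO NOT EDIT. sha256="


def stamp(body: str) -> str:
    """Prefix generated text with a marker line carrying its SHA-256 digest."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return f"{MARKER}{digest}\n{body}"


def is_pristine(text: str) -> bool:
    """Return True if the body still matches the digest in the marker line."""
    first, _, body = text.partition("\n")
    if not first.startswith(MARKER):
        return False
    recorded = first[len(MARKER):]
    return hashlib.sha256(body.encode("utf-8")).hexdigest() == recorded


generated = stamp("class ImagingRun {\npublic:\n    int run;\n};\n")
hand_edited = generated.replace("int run;", "long run;")

print(is_pristine(generated))
print(is_pristine(hand_edited))
```

A check of this kind only detects the inconsistency. The remedy remains the one given above: change the model or the template, then regenerate.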

At the time of writing, we have successfully used this environment to complete the data models of several pilot sub-systems in the SDSS operational database, including information for the imaging run, astrometry, photo output and spectroscopic sub-systems. From these models, we are able to generate C++ headers, FITS headers, DDL files for the database and HTML documents.
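As an illustration of what generating a FITS header from a model might involve, the following Python sketch formats 80-character header cards that describe the columns of a binary table. It assumes fixed-format cards and the standard `TFIELDS`, `TTYPEn` and `TFORMn` keywords. The attribute list is hypothetical, and the sketch does not reproduce our actual scripts or templates.

```python
# Binary-table format codes: 'J' is a 32-bit integer, 'D' a 64-bit float.
TFORM = {"int": "J", "double": "D"}


def card(keyword, value, comment=""):
    """Format one 80-character FITS header card (strings and integers only)."""
    if isinstance(value, str):
        # Strings are quoted and padded to at least 8 characters.
        value_field = ("'" + value.ljust(8) + "'").ljust(20)
    else:
        # Numbers are right-justified in the 20-column value field.
        value_field = str(value).rjust(20)
    text = f"{keyword:<8}= {value_field}"
    if comment:
        text += f" / {comment}"
    return text[:80].ljust(80)


def table_cards(attributes):
    """Emit TFIELDS plus one TTYPEn/TFORMn pair per model attribute."""
    cards = [card("TFIELDS", len(attributes), "number of columns")]
    for i, attr in enumerate(attributes, start=1):
        cards.append(card(f"TTYPE{i}", attr["name"]))
        cards.append(card(f"TFORM{i}", TFORM[attr["type"]]))
    return cards


# Hypothetical attributes of one modeled class.
attributes = [
    {"name": "run", "type": "int"},
    {"name": "mjd", "type": "double"},
]

cards = table_cards(attributes)
for c in cards:
    print(c)
```

A complete binary-table header needs further mandatory keywords (for example `XTENSION`, `BITPIX` and the `NAXIS` family), which are omitted here for brevity.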

7. Concluding Remarks

It is true that one can hardly identify any single virtue in Object-Orientation that did not already exist as a good concept or practice in software engineering over the past few decades. However, Object-Orientation puts all such advantages nicely into one package. Just as good tools do not necessarily make good workers, a good methodology does not guarantee the success of a design. What a good methodology does is give good developers a greater chance to do a better job. So far, Object-Orientation has proved to be a good methodology within the Sloan Digital Sky Survey project.



Wed Jul 3 07:47:11 MST 1996