NOAO Data Products Pipelines
Frank Valdes and Chris Smith
The Vision
We envision all NOAO-operated instruments having pipelines to
automatically produce and archive scientifically useful data products
from the raw observations. The pipelines run at the telescope in near
real-time, process taped raw data, perform on-the-fly recalibration
in archives, and are exported to the astronomical community. At the
telescope the data products are produced for the observer and automatically
entered into an NOAO Data Products Archive with suitable proprietary
protections. The data products include data quality information.
At the telescope this is displayed in real-time for instrument and
exposure evaluation. In an archive it allows evaluation of the data
by potential users.
The Mosaic Data Products Pipeline
The Mosaic Data Products Pipeline is the first step towards achieving
this vision. It targets a single major NOAO instrument, the Mosaic
Imagers. The objectives of this pipeline are to:
- produce NOAO Mosaic data products
- archive NOAO Mosaic data products
- develop a general pipeline infrastructure
- develop data parallel methods for mosaics
The data products to be produced and archived include:
- instrumentally calibrated mosaic camera exposures
- deep, gap-free images from dither sequences
- data quality measurements
- catalogs of objects
- catalogs of variable objects
- alerts of objects with unusual colors
- detections of moving objects
- detections of supernovae
- alerts of moving objects, supernovae, variables, unusual colors
The moving, variable, and unusual color products depend either on observing
programs that take multiple exposures of the same part of the sky or
on having an archive of reference images.
The Development Plan
The major elements that must be developed for a Mosaic Data Products
Pipeline are:
- pipeline infrastructure
- archive interactions
- data parallel infrastructure and methods
- mosaic calibration methods
- data quality methods
- catalog generation
- transient detections
- alerts
- reference image database for difference imaging
This list is long and each element is fairly complex. Assuming
modest manpower (~2 FTE divided over 3-4 people), it is not
possible to attack all the elements simultaneously and still achieve
measurable progress. Instead, we propose to divide the list in two
as described below. Timescales are conservative ballpark guesses.
Note that some initial work on elements of the second stage may overlap
completion of the first stage. Also, each stage can be broken down into
smaller steps with specific measurable objectives.
Stage 1: Mosaic Calibration Pipeline
The first stage targets two primary data products, calibrated
mosaic images and data quality information. The methods needed to
produce these data products require some additional development but
most of the basic concepts are well understood and implementations in
IRAF are available. Limiting the data products allows concentration on
the elements of:
- pipeline infrastructure
- basic image archive interactions
- data parallel infrastructure and methods
- mosaic calibration methods
- data quality methods
While there is considerable effort on infrastructure in this stage,
a mosaic calibration pipeline producing the two basic data products
is a valuable result with significant impact.
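
As a rough illustration of what "data parallel" means for a mosaic, the sketch below calibrates each CCD of an exposure independently and attaches one simple data quality number to each. It is only a minimal sketch under stated assumptions, not the planned implementation: the function names, the dictionary layout, the scalar bias level, and the use of the median as a sky estimate are all hypothetical, and a thread pool stands in for whatever process or node distribution the real infrastructure would use.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def calibrate_ccd(raw, bias, flat):
    """Subtract a scalar bias level and divide by the flat field, pixel by pixel."""
    return [(r - bias) / f for r, f in zip(raw, flat)]


def sky_level(pixels):
    """Hypothetical data quality measurement: median pixel value as a sky estimate."""
    return median(pixels)


def process_ccd(ccd):
    """Process one CCD; needs no information from the other CCDs in the mosaic."""
    pixels = calibrate_ccd(ccd["raw"], ccd["bias"], ccd["flat"])
    return {"name": ccd["name"], "pixels": pixels, "sky": sky_level(pixels)}


def process_exposure(ccds, max_workers=None):
    """Apply process_ccd to every CCD of one exposure in parallel.

    pool.map returns results in input order, so the output list lines up
    with the input list of CCDs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_ccd, ccds))


# Toy two-CCD "exposure" with three pixels per CCD (illustrative values only).
exposure = [
    {"name": "im1", "raw": [110.0, 120.0, 130.0], "bias": 100.0,
     "flat": [1.0, 2.0, 1.0]},
    {"name": "im2", "raw": [205.0, 210.0, 215.0], "bias": 200.0,
     "flat": [0.5, 1.0, 0.5]},
]
results = process_exposure(exposure)
for r in results:
    print(r["name"], r["pixels"], r["sky"])
```

The point of the design is that the per-CCD function carries no shared state, so the same function could be handed to a different executor without changing the calibration code.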
The target for a useful first stage near real-time telescope pipeline,
including testing, is August 2003. Components of the pipeline,
especially the data quality measurements, can be made available earlier
as tasks in the MSCRED package. A prototype pipeline, one which is
restricted to specific programs such as the SuperMacho/SuperNovae (SMSN)
surveys, may be developed sooner.
Stage 2: Mosaic Catalog and Transient Event Pipeline
The second stage builds on the pipeline and archive infrastructure
and calibrated Mosaic data products from the first stage. In this stage
we can concentrate on the methods for producing the less well understood
data products and algorithms. These include:
- advanced archive interactions with non-image data products
- catalog generation
- transient detections
- alerts
- reference image database for difference imaging
The target for a general catalog and transient event pipeline is
June 2004. Standalone modules, such as one for difference imaging,
and a limited set of data products may appear sooner.
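
To make the difference imaging idea concrete, the sketch below subtracts a reference image from a new image and flags pixels that deviate strongly from the typical difference. It is a minimal sketch only, and it assumes the two images are already aligned and PSF matched, which are the hard parts of the real problem. The function name, the threshold, and the use of the median absolute deviation as a noise estimate are illustrative assumptions, not choices made in this plan.

```python
from statistics import median


def detect_transients(science, reference, nsigma=5.0):
    """Return indices of pixels whose difference is an nsigma outlier.

    Assumes `science` and `reference` are aligned, PSF-matched images
    flattened to equal-length lists of pixel values.
    """
    diff = [s - r for s, r in zip(science, reference)]
    center = median(diff)
    # Robust noise estimate: the median absolute deviation, scaled so that
    # it matches the standard deviation for Gaussian noise.
    mad = median(abs(d - center) for d in diff)
    sigma = 1.4826 * mad
    if sigma == 0:
        return []  # no noise estimate available; report nothing
    return [i for i, d in enumerate(diff) if abs(d - center) > nsigma * sigma]


# Toy data: one pixel brightens sharply between reference and new image.
reference = [100, 101, 99, 100, 102, 98, 100, 101]
science = [101, 100, 100, 99, 103, 97, 160, 100]
candidates = detect_transients(science, reference)
print(candidates)
```

A robust noise estimate is used here so that the transient itself does not inflate the threshold; a production module would also need to handle masks, variable backgrounds, and subtraction artifacts.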
The Impact
WHO AND HOW DO WE IMPACT OUTSIDE SDS?
- By providing automatically reduced Mosaic data to:
WHO:
- survey program teams
- general users
- archive users
HOW:
- less burden on PIs/survey teams to produce uniform datasets
- decreased time to archive availability
- simple archivability in the case of general users
- more uniform data for archive and documented processing
- By providing near-real-time data quality assessment
WHO:
- observers
- those responsible for reduction
- archive users
HOW:
- providing some DQA in real time can enhance the data by
providing feedback to observers to improve their observing, e.g., on
telescope problems (pointing, guiding), optical problems (focus,
mirror support), and instrument problems (bias jumps, noise).
- DQA definitions and algorithms give those involved in
reduction a well-defined context for deciding what data is "good"
as well as how to present it to the community.
- DQA is absolutely necessary in the archive so that those
retrieving the data have some idea about the quality.
DQA includes not only derived quantities but, if possible,
observer comments (automated logging) and ancillary information
(weather, temperature, etc.)
- By providing transient detection systems
WHO and HOW:
- enable general users to do transient science (NEW PROJECTS!)
- optimization of processes (alignment, PSF match, subtraction,
detections)
- makes current Survey programs more efficient
- a TODO item called out by LSST Data Mgmt workgroup
- generalized alert system
- enable science based on multiple cuts on transients, including
color outliers
HOW DO WE IMPACT SDS AND WHAT DO WE LEARN?
- Address the SDS Strategic Priorities
- Basic pipeline development - gotta walk before you run
- exploration of available new technologies (OPUS, Condor, etc.)
- automation
- robustness
- operations
- Real-time analysis experience
- software optimization (including data parallel for mosaics/LSST)
- hardware experience (infrastructure issues)
- operational issues (in addition to above)
- infrastructure for alert system
- mail systems
- automated finding charts
- methods of feedback from community (e.g., "I took a spectrum...")
- DQA
- identification of metrics involved
- automated derivation
- how to provide meaningful feedback to observer and archive users
- Potential Mosaic STB processing
- A faster growing and more credible archive and NVO node.