Next: An Enriched Meta-Information Schema for Astronomical Databases
Previous: The ADS Article Service Data Holdings and Access Methods
Table of Contents --- Search --- PS reprint


Astronomical Data Analysis Software and Systems V
ASP Conference Series, Vol. 101, 1996
George H. Jacoby and Jeannette Barnes, eds.

The Evolving Resource Metadata Infrastructure

Chris Biemesderfer

ferberts associates, Oracle, AZ

Abstract:

The search and discovery mechanisms that will facilitate and simplify systematic research on the Internet depend on systematic classifications of resources, as well as on standardized access to such metadata. The principles and technologies that will make this possible are evolving in the work of the Internet Engineering Task Force and the digital library initiatives, among others. The desired outcome is a set of standards, tools, and practices that permits both cataloging and retrieval to be comprehensive and efficient.

1. Introduction

Astronomers are used to the Internet as a part of their professional environment. Researchers have recently been treated to on-line access to The Astrophysical Journal (AAS 1995), and we can be sure that those engaged in research will depend increasingly on information of all kinds being available on the Internet. Like all other denizens of the network, astronomers want this material in conventional environments and formats, all of which will become more global over time.

Librarians have historically had the job of collecting and cataloging diverse information resources, and they are responsible for providing a normalized framework in which that information can be found. In the astronomical community, the developers and stewards of software and systems for analysis, archiving, and distribution of data recognized the need for consistent storage and transmission of digital data early (Wells, Greisen, and Harten 1981). Both groups have been engaged in largely the same task, that of organizing information, for largely the same reason, namely to offer a resource to consumers.

Despite these similarities in process and purpose, information scientists and technologists face comparable challenges in carrying out their responsibilities on the Internet. Librarians must become familiar with the intricacies and arcana of networking, while systems analysts must learn how to manage potentially staggering amounts of data.

2. Resource Metadata Terminology and Desiderata

In the context of this article, I am using the term ``resource'' to mean network resources, referring to any information or data that is accessible on the Internet, regardless of format or content. Metadata is descriptive information about a resource, functioning conceptually the same way as the header information of an image. Resource metadata may be:

The ultimate purpose of a metadata infrastructure is to make resources published on the Internet easy to find, and reliable to access. It should be possible for individuals as well as organizations to publish anything they want. All network resources, not just the astronomical literature or data archives, have to be described by the same rules. The implementation needs to support resource registration by producers as well as resource discovery by consumers, and should be distributed.

2.1 Metadata vs. Full Text Indexing

One response to the problem of resource discovery is the appearance of ``resource location'' services such as Yahoo and Lycos. These services ask for some supporting information when a resource is being registered, although the principal added value comes from the full text indexing that the repository performs on the remote resources. The electronic ApJ also includes a full-text search among its features (Dalterio et al. 1995).

Full text indexes presuppose the existence of text. Explicitly prepared metadata is crucial for resources that aren't textual (video, audio, etc.). Furthermore, discovery often depends on classifications that are implied by the text, but are not mentioned explicitly in it.
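
The contrast can be sketched in a few lines of Python. This is a minimal illustration, not a description of how any existing service works: the resource identifiers, the sample text, and the subject terms are all invented for the example. One resource is a paper whose text never states its own classification, and the other is an image with no text at all.

```python
def build_index(entries):
    """Map each lowercase word to the set of resource ids that contain it."""
    index = {}
    for resource_id, text in entries.items():
        for word in text.lower().split():
            index.setdefault(word.strip(".,;:()"), set()).add(resource_id)
    return index


# Hypothetical resources: the image has no text to index.
full_text = {
    "paper-1": "We present photometry of the Crab Nebula.",
    "image-1": "",
}

# Hypothetical, explicitly prepared subject metadata for the same resources.
subjects = {
    "paper-1": "supernova remnants",
    "image-1": "supernova remnants",
}

text_index = build_index(full_text)
metadata_index = build_index(subjects)

# The classification is implied by the paper's text but never stated in it.
text_hits = text_index.get("remnants", set())
metadata_hits = metadata_index.get("remnants", set())
```

A query for "remnants" finds nothing in the full text index, while the metadata index returns both resources, including the one that has no text.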

3. Metadata Work in Progress

I have already alluded to the prospect that organizing network resources involves a combination of library science skills and a practical understanding of network technology. Projects and discussions are being carried out by library organizations, and several groups within the IETF are pursuing issues of resource discovery and name resolution schemes.

3.1 Digital Library Initiatives

The information management problems we face on the Internet are well known among library scientists, as are the solutions. It is instructive to consider resource management on the Internet from the non-discipline-specific perspectives taken by library professionals (Fox et al. 1995).

Digital libraries in many forms are being created in many quarters: publishing, information science, library science, etc. ARPA, NSF, and NASA have combined forces to sponsor six notable digital library initiatives (DLIs) (ARPA 1994). These projects are developing digital collections and investigating means of sorting, searching, and distributing electronic material. The content of the collections varies, with some groups focussed on scholarly research material and others on multimedia and spatial datasets. The electronic ApJ project is participating jointly with the University of Illinois DLI to ensure that the distinct repositories are mutually searchable.

3.2 Uniform Resource Identifiers

The URI Working Group of the IETF, through a series of Internet drafts and RFCs (see Masinter 1995), has proposed an information architecture that uses URNs (Uniform Resource Names) to identify resources. URNs map to network locations via a name resolution service (a URN resolver). Information about resources (the metadata) is contained in data structures called URCs (Uniform Resource Characteristics).

Several independent implementations for URN spaces and resolution services are being discussed in the Internet community. The structure of URNs will contain a descriptive string chosen by the publisher of the resource, and a specification of the resolver that will give the location of the resource. URN resolvers are likely to be organized in a distributed, hierarchical system, modelled after the Domain Name Service (DNS).
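
The two-step lookup described above (find the resolver named in the URN, then ask it for the location and the URC) might be sketched as follows. Since the URN structure is still under discussion, the `urn:<resolver>:<descriptive string>` form used here is only an assumption made for illustration, as are the `Resolver` class, the registry dictionary, and the example names and location.

```python
from dataclasses import dataclass, field


@dataclass
class Resolver:
    """Hypothetical URN resolver holding locations and URCs by name."""
    locations: dict = field(default_factory=dict)
    characteristics: dict = field(default_factory=dict)

    def register(self, name, location, urc=None):
        """Record where a named resource lives, plus its metadata (URC)."""
        self.locations[name] = location
        self.characteristics[name] = dict(urc or {})


def parse_urn(urn):
    """Split an assumed 'urn:<resolver>:<name>' string into its two parts."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not all(parts[1:]):
        raise ValueError(f"not a URN in the assumed form: {urn!r}")
    return parts[1], parts[2]


def resolve(urn, registry):
    """Return (location, urc) for a URN, using a registry of resolvers."""
    resolver_id, name = parse_urn(urn)
    resolver = registry[resolver_id]  # step 1: locate the resolver
    return resolver.locations[name], resolver.characteristics[name]  # step 2


# Hypothetical usage: one resolver, one registered resource.
registry = {"example-archive": Resolver()}
registry["example-archive"].register(
    "survey/plate-42",
    "http://archive.example.org/plates/42.fits",
    {"Title": "Plate 42"},
)
location, urc = resolve("urn:example-archive:survey/plate-42", registry)
```

The flat `registry` dictionary stands in for the resolver hierarchy; in a DNS-like system, the first step would itself be a delegated lookup rather than a single table.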

3.3 Metadata Definition

Ultimately, resources need to be described by people who are familiar with their content, if we expect systematic searches made by automated discovery tools to work. Authors or publishers of resources should be encouraged to provide such descriptions in a standardized way with a minimal level of effort.

An effort to define a set of data elements that comprises the metadata description of resources has been undertaken by a group of librarians, archivists, and representatives of crucial technical standards, sponsored by the Online Computer Library Center (OCLC) and the National Center for Supercomputing Applications (NCSA). A thirteen-element set (Weibel et al. 1995) was devised for ``document-like'' objects in March 1995. Metadata descriptions for resources of other types remain in development.
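
To give a feel for how light such a description can be, here is a minimal Python sketch of a record built from a thirteen-element set. The element names follow the workshop report as best understood and should be verified against it (Weibel et al. 1995). The helper functions, the assumption that every element is optional and repeatable, and the "Name: value" text layout are illustrative choices, not part of any specification.

```python
# Element names as understood from the workshop report; verify before use.
ELEMENTS = (
    "Subject", "Title", "Author", "Publisher", "OtherAgent", "Date",
    "ObjectType", "Form", "Identifier", "Relation", "Source",
    "Language", "Coverage",
)


def make_record(**fields):
    """Build a record, rejecting any element outside the defined set."""
    unknown = sorted(set(fields) - set(ELEMENTS))
    if unknown:
        raise ValueError(f"unknown metadata elements: {unknown}")
    record = {}
    for name, value in fields.items():
        # Assumption: every element is optional and may repeat.
        record[name] = [value] if isinstance(value, str) else list(value)
    return record


def format_record(record):
    """Render a record as 'Element: value' lines, in element-set order."""
    lines = []
    for name in ELEMENTS:
        for value in record.get(name, []):
            lines.append(f"{name}: {value}")
    return "\n".join(lines)


# A record for this article; the subject terms are illustrative choices.
record = make_record(
    Title="The Evolving Resource Metadata Infrastructure",
    Author="Biemesderfer, Chris",
    Date="1996",
    Subject=["metadata", "resource discovery"],
)
text = format_record(record)
```

An author supplies only the elements that apply, which keeps the effort of describing a resource small.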

4. Conclusions and Suggestions

There is widespread agreement that all the information on the Internet is, paradoxically, both a tremendous asset and a dreadful burden. A system, or infrastructure, for gathering and managing metadata offers a solution to the problems associated with the profusion of digital information.

Metadata definitions and systems must be suitable for a wide variety of resources, and scalable to enormous proportions. Mechanisms for registering resources and recording the associated metadata need to be automated to simplify administration and to make it easy for publishers to describe their material and to submit the descriptions.

Therefore, the mechanisms need to be straightforward, widely accepted, and broadly abstracted. Consistency of usage and practices is important. Achieving this kind of consistency always takes time, as consensus is built and conventions are determined.

In the meantime, what can we do about this?

References:

AAS 1995, The Electronic Astrophysical Journal, http://www.aas.org/ApJ/

ARPA 1994, Digital Library Initiative, http://www.grainger.uiuc.edu/dli/national.htm

Dalterio, H., et al. 1995, Vistas in Astronomy, 39, 7

Fox, E., et al., eds. 1995, Digital Libraries, Communications of the ACM, vol. 38, nr. 4

Masinter, L. 1995, Uniform Resource Identifiers (uri) Charter, http://www.ietf.cnri.reston.va.us/html.charters/uri-charter.html

Weibel, S., et al. 1995, OCLC/NCSA Metadata Workshop Report, http://www.oclc.org:5046/conferences/metadata/dublin_core_report.html

Wells, D., Greisen, E., & Harten, R. 1981, A&AS, 44, 363


Wed Jul 3 07:28:57 MST 1996