

Astronomical Data Analysis Software and Systems V
ASP Conference Series, Vol. 101, 1996
George H. Jacoby and Jeannette Barnes, eds.

The Electronic Astrophysical Journal: Resource Location and Archive Management

A. Warnock III

A/WWW Enterprises, 6652 Hawkeye Run, Columbia, MD 21044

J. M. Fullton

Clearinghouse for Networked Information Discovery and Retrieval/MCNC, 3021 Cornwallis Road, Research Triangle Park, NC 27709

Abstract:

Robust access to articles in the Electronic Astrophysical Journal requires a mechanism which allows easy retrieval of articles by end users coupled with the ability to manage the large collection of files by the server. This process is accomplished by assigning permanent names to the articles, rather than providing links directly to the articles using URLs. A CGI script resolves the article name into the current URL and redirects the user's browser to that location. Users may determine the permanent names by submitting queries to a citation server, which provides the link between an article's bibliographic information (title, author, etc.) and the permanent name.

1. Introduction

The on-line version of The Astrophysical Journal requires management of a large number of files corresponding to over 20,000 printed pages each year, an equivalent number of graphics files of line art and half-tones, and a large number of in-line graphics to enhance the native display fonts for on-line presentation on the World Wide Web (WWW). In general, individual Uniform Resource Locators (URLs) that directly reference files inside the server's local file system are too fragile, and may change too frequently, to be reliably referenced by individual users.

Electronic access to journal articles requires that the links to the on-line files be stable and long-lived. Users should be able to save links, return to them at any time in the future and successfully retrieve the item of interest.

At the same time, archive management on the host machine requires flexibility in file locations. Hardware changes, system upgrades and archive organization all can require that files be moved. Because URLs on the World Wide Web incorporate path information from the host file system, moving files can render URLs invalid.

These issues have been recognized in the network community, and a number of proposals for implementing a permanent naming system on the Internet have been submitted as Internet Drafts. To date, none of these naming schemes have been implemented on a large scale in the popular Web browsers, so a different approach was required for the Electronic Astrophysical Journal.

No standard has yet been adopted by the Internet community, so the implementations described here are, at best, approximations to what will ultimately be approved by the IETF. The requirements of building a production system do not allow the luxury of waiting for a final standard to emerge. We expect, however, that by following some basic guidelines derived from Internet Drafts and discussions, the current implementation will prove useful and will be close to the one eventually deployed widely.

2. URNs---The Permanent Name System

To isolate the local file system from requests by outside users, the electronic version of each article is assigned a variant of a Uniform Resource Name (URN) (Hoffman & Daniel 1995a, 1995b; Shafer et al. 1995), a unique and permanent handle for use within the electronic archive. Electronic requests for articles are then processed through a resolution resource, implemented as a CGI script written in Perl, which translates the URN into the current URL for the article and redirects the client appropriately.

The current implementation of the URN resolver assigns permanent names based on the Bibcode notation developed by CDS and NED, and used by the ADS for their abstract service (Accomazzi et al. 1996; Eichhorn et al. 1996). Bibcodes are adequate for published serials. A Bibcode can reference only a single item or format, but this is sufficient for many of the materials offered through the electronic journal. Extensions can be developed to handle multiple formats.
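To make the naming scheme concrete, the following is a minimal sketch of how a Bibcode might be split into its parts. It assumes the usual 19-character layout (4-character year, 5-character journal abbreviation, 4-character volume, 1-character qualifier, 4-character page, 1-character first-author initial, with dots as padding). The paper does not spell out this layout or the exact URN variant built on top of it, and the sample Bibcode string is invented for illustration.

```python
from typing import NamedTuple


class Bibcode(NamedTuple):
    year: int
    journal: str
    volume: str
    qualifier: str
    page: str
    author_initial: str


def parse_bibcode(code):
    """Split a 19-character Bibcode into fields, stripping the dot padding."""
    if len(code) != 19:
        raise ValueError("expected 19 characters, got %d" % len(code))
    return Bibcode(
        year=int(code[0:4]),
        journal=code[4:9].rstrip("."),    # journal is left-justified
        volume=code[9:13].lstrip("."),    # volume is right-justified
        qualifier=code[13].replace(".", ""),
        page=code[14:18].lstrip("."),     # page is right-justified
        author_initial=code[18],
    )


# Invented example string, used only to exercise the parser.
parsed = parse_bibcode("1996ApJ...456..123W")
print(parsed)
```

Because every field has a fixed position, a name of this kind can be checked for well-formedness before any database lookup is attempted.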

Operationally, the HTTP server passes the URN to the CGI script, which looks up the URN in a DBM database, finds the associated URL, and uses the HTTP "Location" response header to send the correct URL back to the client. Virtually all clients at this time respond correctly to the "Location" header and retrieve the document named therein.
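The lookup-and-redirect step can be sketched as follows. This is not the production Perl script. It is a small Python illustration that assumes a DBM file mapping URN strings to URLs, and the file name, the URN and the example.org URL are all hypothetical.

```python
import dbm
import os
import tempfile


def resolve(urn, db):
    """Return CGI response text: a redirect if the URN is known, else a 404."""
    value = db.get(urn.encode("utf-8"))
    if value is None:
        return ("Status: 404 Not Found\r\n"
                "Content-Type: text/plain\r\n\r\n"
                "Unknown URN\r\n")
    url = value.decode("utf-8")
    # A CGI script requests a redirect by emitting a Location header.
    return "Status: 302 Found\r\nLocation: %s\r\n\r\n" % url


# Build a throwaway database holding one hypothetical URN-to-URL mapping.
db_path = os.path.join(tempfile.mkdtemp(), "urn_map")
with dbm.open(db_path, "c") as db:
    db["1996ApJ...456..123W"] = "http://example.org/apj/v456/p123.html"

with dbm.open(db_path, "r") as db:
    response = resolve("1996ApJ...456..123W", db)
    missing = resolve("no-such-urn", db)
print(response)
```

In a real CGI deployment the URN would come from the request (for example, the query string) and the response text would be written to standard output for the HTTP server to relay.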

The additional overhead of a single lookup into the database appears to be small: there is one additional network transaction between client and server, and a small amount of text (the URL of the requested document) is passed to the client. The full text of the requested document is retrieved in any event.

An obvious enhancement would run the name resolver as a daemon, thereby avoiding the overhead of starting the script for each incoming request. Current Internet Drafts propose similar systems, running on dedicated ports, with clients written to know how to establish contact with these servers in order to resolve URNs into URLs.
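A long-running resolver of this kind might look like the sketch below, which uses Python's standard http.server module. It is an assumption-laden illustration, not one of the schemes proposed in the Internet Drafts: the /urn/ path prefix, the port number, the in-memory table and its single entry are all hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote

# Hypothetical in-memory table; a real daemon would load it from the URN database.
URN_TABLE = {"1996ApJ...456..123W": "http://example.org/apj/v456/p123.html"}


def lookup(path, table):
    """Map a request path of the form /urn/<name> to a URL, or return None."""
    prefix = "/urn/"
    if not path.startswith(prefix):
        return None
    return table.get(unquote(path[len(prefix):]))


class ResolverHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = lookup(self.path, URN_TABLE)
        if url is None:
            self.send_error(404, "Unknown URN")
            return
        # Redirect the client to the current location of the document.
        self.send_response(302)
        self.send_header("Location", url)
        self.end_headers()


def serve(port=8080):
    """Run the resolver until interrupted; the process stays resident between requests."""
    HTTPServer(("", port), ResolverHandler).serve_forever()
```

Calling serve() starts the daemon. Since the table is loaded once and kept in memory, each request costs only a dictionary lookup rather than a script start-up.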

Changes in the locations of files are implemented by updating the URL in the DBM database while keeping the URN unchanged. In this way, users continue to request documents using the URNs, but are redirected to the correct URL. This mechanism has allowed the entire corpus of the Electronic Astrophysical Journal to be moved to different machines without requiring any changes to links in the articles. A global search and replace on the URLs in the URN database changed the locations for all of the files simultaneously.
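The global search and replace described above can be sketched as a prefix rewrite over every value in the database. This is a minimal Python illustration under assumed names: the host names, keys and file name are hypothetical, and the actual tool used for the move is not described in the paper.

```python
import dbm
import os
import tempfile


def relocate(db, old_prefix, new_prefix):
    """Rewrite every stored URL that starts with old_prefix; return the number changed."""
    old = old_prefix.encode("utf-8")
    new = new_prefix.encode("utf-8")
    changed = 0
    for key in list(db.keys()):  # snapshot the keys before modifying entries
        url = db[key]
        if url.startswith(old):
            db[key] = new + url[len(old):]
            changed += 1
    return changed


# Throwaway database with two hypothetical entries on different hosts.
db_path = os.path.join(tempfile.mkdtemp(), "urn_map")
with dbm.open(db_path, "c") as db:
    db["urn-a"] = "http://old.example.org/apj/a.html"
    db["urn-b"] = "http://other.example.org/apj/b.html"
    moved = relocate(db, "http://old.example.org/", "http://new.example.org/")
    url_a = db["urn-a"].decode("utf-8")
    url_b = db["urn-b"].decode("utf-8")
print(moved, url_a, url_b)
```

The keys (the URNs) are never touched, so every saved link keeps working after the move.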

3. URCs---The Card Catalog System

The development of a URN server provides a simple mechanism to assure permanent access to articles, independent of their location on the server. This allows the server administrator to provide search and browse access to the articles. However, users have no discovery mechanism by which they can determine the URNs which have been assigned to documents in the system. In the absence of such a discovery mechanism, articles can only be found by manually browsing through documents, hoping to uncover a relevant link.

In order to provide such a discovery mechanism, citation records are created for articles, which contain bibliographic information such as title and author, as well as the system's URN.

As articles are ingested into the on-line collection, a bibliographic citation record is also created for each one. The collection of citation records will be indexed using CNIDR's Isearch package, making the collection searchable using a full-text query. Field-based searching on title, author, abstract, keywords and bibliographic reference is also supported. The returned records contain relevant hyperlinks for the resulting articles, including the URN of the primary on-line resource.
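The role of the citation record as the bridge between bibliographic data and the URN can be illustrated with a small in-memory stand-in. The sketch below does not use or imitate the Isearch API. It is a plain substring search over assumed field names, and the sample records are invented.

```python
from typing import NamedTuple


class CitationRecord(NamedTuple):
    urn: str
    title: str
    authors: str
    abstract: str = ""
    keywords: str = ""
    reference: str = ""


SEARCHABLE = ("title", "authors", "abstract", "keywords", "reference")


def search(records, term, field=None):
    """Return URNs of records matching term in one field, or in any field if field is None."""
    if field is not None and field not in SEARCHABLE:
        raise ValueError("unknown field: %s" % field)
    needle = term.lower()
    fields = SEARCHABLE if field is None else (field,)
    hits = []
    for rec in records:
        haystack = " ".join(getattr(rec, name) for name in fields).lower()
        if needle in haystack:
            hits.append(rec.urn)
    return hits


# Invented records, used only to exercise the search.
records = [
    CitationRecord("urn-a", "Dust in Spiral Galaxies", "Smith, J.", keywords="dust, galaxies"),
    CitationRecord("urn-b", "Pulsar Timing Noise", "Jones, K.; Smith, J.", keywords="pulsars"),
]
by_author = search(records, "smith", field="authors")
by_title = search(records, "pulsar", field="title")
anywhere = search(records, "galaxies")
print(by_author, by_title, anywhere)
```

Each hit is a URN rather than a URL, so the search results remain valid when the underlying files are moved.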

4. Conclusions

A simple and robust mechanism has been constructed for assigning permanent names to documents in the on-line version of the Astrophysical Journal. It allows the full flexibility required for managing the archive of journal articles at the AAS server, while simultaneously providing reliable and permanent access for end users. This URN resolver eliminates the need to modify URLs within the HTML versions of journal articles when files are moved. The URNs can be discovered by searching an on-line set of bibliographic citation records.

The URN and URC services are being incorporated into the production version of the Electronic Astrophysical Journal.

Acknowledgments:

The AAS Electronic Astrophysical Journal Letters Project is funded by the National Science Foundation and by the American Astronomical Society.

References:

Accomazzi, A., Eichhorn, G., Grant, C. S., Kurtz, M. J., & Murray, S. S. 1996, this volume

Eichhorn, G., Accomazzi, A., Grant, C. S., Kurtz, M. J., & Murray, S. S. 1996, this volume

Hoffman, P. E., & Daniel, R. 1995a, Internet Draft "x-dns-2 URN Scheme". The name of the draft at the time of this writing is "draft-ietf-uri-urn-x-dns-2-00.txt".

Hoffman, P. E., & Daniel, R. 1995b, Internet Draft "Generic URN Syntax". The name of the draft at the time of this writing is "draft-ietf-uri-urn-syntax-00.txt".

Shafer, K. E., Miller, E. J., Tkac, V. M., & Weibel, S. L. 1995, Internet Draft "URN Services". The name of the draft at the time of this writing is "draft-ietf-uri-urn-resolution-01.txt".


Wed Jul 3 08:15:19 MST 1996