Resolving Formal Public Identifiers on the WWW: A Proposal For Delegating SGML Open Catalogues


James K. Tauber, University CWIS Coordination Officer, The University of Western Australia, Nedlands, WA 6907, Australia. Email: jtauber@library.uwa.edu.au
Keywords: WorldWideWeb, SGML Open Catalog, Formal Public Identifiers, ISO/IEC 9070, Uniform Resource Naming

A Tale of Two Cities

Imagine, if you will, a city where all the buildings are built out of the same materials and designed by the same architect. Each building may differ from others in its contents, but regardless of whether the building is a house or a school or a factory, it has pretty much the same structure. This is a web based on HTML.

Now consider a nearby city where there are numerous architects. People can get architects to design their buildings or they can design their own. The structure of a building can be tailored to its use, so a house may be built using different materials and a different design to that of a school or a factory. This is a web based on SGML.

A First Step in SGML on the Web

Recently I've been working on the problem of using generic SGML on the World Wide Web. Many of the features that people are crying out for in HTML and the Web have existed in SGML for years. One example is the ability to identify documents in a system- and location-independent way.

On the Web, there have been efforts to develop a Uniform Resource Naming scheme that gives documents a name apart from (but resolvable to) their location (expressed as a Uniform Resource Locator).

Such a naming scheme has always existed in SGML in the form of public identifiers. Many Web authors will be familiar with "-//IETF//DTD HTML 2.0//EN", which is the public identifier for the HTML 2.0 DTD. If generic SGML is to be widely used on the Web, there needs to be a scalable solution to the problem of resolving public identifiers into URLs. Such a solution would also make possible the use of public identifiers in a Uniform Resource Naming scheme.

Background

The SGML Open Technical Resolution 9401:1995 defines an entity catalog that can be used by SGML systems to map an external entity's public identifier to a system-dependent storage object identifier (SOI). For example, the catalog entry
PUBLIC	"-//W3C//DTD HTML 3.2//EN"	"/usr/local/html/dtd/html-3.2.dtd"

maps the formal public identifier for the HTML 3.2 DTD to a local filename.
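
As an illustration of the flat mapping such a catalog provides, the following Python sketch reads PUBLIC entries and looks up a public identifier. It assumes a simplified catalog syntax with one entry per line and double-quoted arguments; the function names parse_catalog and lookup_public are hypothetical and are not part of the Technical Resolution or of any parser.

```python
import shlex

def parse_catalog(text):
    """Return (keyword, first argument, second argument) tuples.

    Simplifying assumption: one entry per line, each with exactly
    two arguments. Lines of any other shape are ignored.
    """
    entries = []
    for line in text.splitlines():
        tokens = shlex.split(line)  # splits on whitespace, honours quotes
        if len(tokens) == 3:
            entries.append((tokens[0].upper(), tokens[1], tokens[2]))
    return entries

def lookup_public(entries, fpi):
    """Return the SOI mapped to fpi by a PUBLIC entry, or None."""
    for keyword, public_id, soi in entries:
        if keyword == "PUBLIC" and public_id == fpi:
            return soi
    return None

catalog = parse_catalog(
    'PUBLIC\t"-//W3C//DTD HTML 3.2//EN"\t"/usr/local/html/dtd/html-3.2.dtd"\n'
)
print(lookup_public(catalog, "-//W3C//DTD HTML 3.2//EN"))
```

Every identifier a system may meet has to appear in such a table, which is the scaling problem discussed next.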

Entity managers like that included in James Clark's SP parser suite demonstrate an effective use of this catalog format and, in the case of SP, the ability to use URLs as SOIs makes it possible to map public identifiers to entities stored anywhere on the World Wide Web. This valuable extension is being formalized in the EXCH Internet-Draft of the IETF MIMESGML Working Group along with related extensions to facilitate the exchange of SGML documents via the Web.

The SGML Open catalog, though, provides only a flat mapping. The situation is somewhat akin to hostname resolution pre-DNS where each host had to contain a (large) table resolving names to IP addresses. With DNS, the name space became hierarchical and the role of name to IP address resolution was delegated to different levels of the hierarchy.

A Proposal

In this note I propose a simple extension to the SGML Open catalog format that allows for "delegating" catalogs. By allowing URLs as SOIs as per the EXCH proposal, this extension should provide a powerful solution to the problem of public identifier resolution over the Internet. The heart of this proposal lies in the hierarchical structure introduced to the owner ID of a public identifier by ISO/IEC 9070.

I propose to add the single keyword DELEGATE (or something similar), taking two arguments: an owner identifier prefix and an SOI referring to another catalog. By "owner identifier prefix" I mean all or some initial part of an owner identifier.

Owner Identifier Prefix

ISO 8879

ISO 8879 defines a formal public identifier by the production:
formal public identifier =
   owner identifier, "//", text identifier

where owner identifier is minimum data prefixed by +// for registered owner identifiers and -// for unregistered owner identifiers.

The character sequence // is what I shall call a separating token. An owner identifier prefix, then, is an initial part of an owner identifier up to (and optionally including) a separating token. The // token separating the owner identifier from the text identifier is considered an allowable part of the owner identifier prefix.

As an example, the owner identifier prefixes for "-//IETF//DTD HTML 2.0//EN" are:

"-"
"-//"
"-//IETF"
"-//IETF//"

Each prefix x defines a set of possible FPIs that have x as an owner identifier prefix. I shall call this set the matching set of x.

Following on from ISO/IEC 9070, this proposal also allows for another separating token, "::". So the owner identifier prefixes for "-//IETF::HTML-WG//DTD HTML 2.0//EN" are:

"-"
"-//"
"-//IETF"
"-//IETF::"
"-//IETF::HTML-WG"
"-//IETF::HTML-WG//"

Some of these may be deemed equivalent because their matching sets are congruent ("-" and "-//", for example). Others have subtle differences: "-//IETF//DTD RFC//EN" is in the matching set of "-//IETF" but not that of "-//IETF::".
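
The definitions above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the owner identifier is taken to end at the first "//" that follows an optional leading "+//" or "-//".

```python
import re

def owner_identifier(fpi):
    """Return the owner identifier part of a formal public identifier."""
    # Skip the "+//" or "-//" that marks a registered or unregistered owner.
    start = 3 if fpi.startswith(("+//", "-//")) else 0
    end = fpi.find("//", start)
    if end == -1:
        raise ValueError("no // between owner and text identifier: %r" % fpi)
    return fpi[:end]

def owner_prefixes(fpi):
    """List every owner identifier prefix of fpi, shortest first."""
    # The "//" that separates owner and text identifier is allowed
    # as the final token of a prefix, so it is appended here.
    whole = owner_identifier(fpi) + "//"
    prefixes = []
    for token in re.finditer(r"//|::", whole):
        prefixes.append(whole[:token.start()])  # up to the token
        prefixes.append(whole[:token.end()])    # including the token
    return prefixes

def in_matching_set(prefix, fpi):
    """True if fpi belongs to the matching set of prefix."""
    return prefix in owner_prefixes(fpi)

print(owner_prefixes("-//IETF::HTML-WG//DTD HTML 2.0//EN"))
print(in_matching_set("-//IETF", "-//IETF//DTD RFC//EN"))
print(in_matching_set("-//IETF::", "-//IETF//DTD RFC//EN"))
```

Testing membership against the list of prefixes, rather than with a plain string comparison, keeps a prefix such as "-//IETF" from matching an unrelated owner whose name merely begins with the same characters.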

ISO/IEC 9070

The definitions of separating token, owner identifier prefix and matching set hold true for public identifiers in the canonical form defined in ISO/IEC 9070. Hence this proposal also provides a solution to ISO/IEC 9070 public identifier resolution over the Internet, should this become important.

ISO/IEC 9070 defines the canonical form of a public identifier by the production:

public identifier =
   owner name, "//", object name

In the case of ISO/IEC 9070 public identifiers, the owner identifier prefix is considered an initial part of the owner name up to (and optionally including) a separating token.

A Resolution Example

As an example of how an entity manager would use the delegating catalogs proposed here, consider a catalog on a local system that contains the line

DELEGATE "-//IETF" "http://www.ietf.org/catalog.txt"

An entity manager coming across "-//IETF::HTML-WG//DTD HTML 2.0//EN" could retrieve the catalog given in the URL SOI above. This second catalog may have the entry

DELEGATE "-//IETF::HTML-WG" "http://www.ietf.org/html-wg/catalog.txt"

and the catalog mentioned in the line above may have the entry

PUBLIC "-//IETF::HTML-WG//DTD HTML 2.0//EN" "http://www.ietf.org/dtd/html-2.0.dtd" 

This third catalog could be retrieved and the original public identifier resolved to a URL for, in this case, a DTD.
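
The walk through the three catalogs can be sketched as follows. This is a minimal illustration, not the behaviour of any existing entity manager. The network is replaced by a dictionary, the key "local" stands in for the local catalog, and the names CATALOGS, matches and resolve are hypothetical. Two further assumptions are made: a PUBLIC entry is preferred over a DELEGATE entry in the same catalog, and the first matching DELEGATE entry is followed.

```python
import re

# Stand-in for catalogs fetched over the network: SOI -> list of entries.
CATALOGS = {
    "local": [
        ("DELEGATE", "-//IETF", "http://www.ietf.org/catalog.txt"),
    ],
    "http://www.ietf.org/catalog.txt": [
        ("DELEGATE", "-//IETF::HTML-WG",
         "http://www.ietf.org/html-wg/catalog.txt"),
    ],
    "http://www.ietf.org/html-wg/catalog.txt": [
        ("PUBLIC", "-//IETF::HTML-WG//DTD HTML 2.0//EN",
         "http://www.ietf.org/dtd/html-2.0.dtd"),
    ],
}

def matches(prefix, fpi):
    """True if prefix is an owner identifier prefix of fpi."""
    start = 3 if fpi.startswith(("+//", "-//")) else 0
    end = fpi.find("//", start)
    if end == -1:
        return False
    whole = fpi[:end + 2]  # owner identifier plus the closing "//"
    cuts = set()
    for token in re.finditer(r"//|::", whole):
        cuts.update((token.start(), token.end()))
    # A prefix must stop just before or just after a separating token.
    return whole.startswith(prefix) and len(prefix) in cuts

def resolve(fpi, catalog_soi, fetch=CATALOGS.get, max_hops=8):
    """Follow DELEGATE entries until a PUBLIC entry maps fpi to an SOI."""
    for _ in range(max_hops):  # guards against delegation loops
        entries = fetch(catalog_soi) or []
        for keyword, key, soi in entries:
            if keyword == "PUBLIC" and key == fpi:
                return soi
        for keyword, key, soi in entries:
            if keyword == "DELEGATE" and matches(key, fpi):
                catalog_soi = soi  # continue in the delegated catalog
                break
        else:
            return None  # nothing in this catalog applies
    return None

print(resolve("-//IETF::HTML-WG//DTD HTML 2.0//EN", "local"))
```

The hop limit is a precaution added for the sketch; the proposal itself does not specify how delegation loops should be handled.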

Suggested Infrastructure

Although delegating catalogs could, at least in theory, work without it, effective public identifier resolution requires one or more central authorities to provide a root for the resolution hierarchy. This means that entity managers need only be told about the root server(s) initially. It also means that public text owners need only register in one place to make their text "visible" to everyone. Note that registration of this type still leaves the public text unregistered in ISO terms. This could be overcome by the root server(s) themselves registering with ISO.

Conflict between Entries in Catalogs

Within a single catalog file:

Where more than one catalog file makes up the catalog, an entry in an earlier file should always win against one in a later file, regardless of the rules above.

Concluding Remarks

Delegating catalogs are now supported by James Clark's SP parser and will soon find their way into other SGML software. While the formal public identifiers of ISO 8879 are designed primarily for SGML, the public identifiers of ISO/IEC 9070 are designed for identifying any object. They thus provide the perfect Uniform Resource Name (URN). The ISO/IEC 9070 identifiers could be used not only for DTDs and SGML documents, but also for Java classes and even e-mail addresses. The scheme proposed here could then be used to resolve them into URLs.

The details of the delegating catalog proposal are still open to revision. One change likely to occur is the recommendation that entity managers send the full public identifier as part of a network request for a catalog. This would add extra intelligence to the resolution by allowing catalogs to in fact be CGI (or similar) programs that could take the public identifier and look it up in a database without having to return a huge catalog for the entity manager to wade through. Discussion on the proposal takes place on the mailing list fpi-urn@entmp.org. To subscribe, send a message to fpi-urn-request@entmp.org with the subject "subscribe".
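
One way an entity manager might send the full public identifier with a catalog request is as a URL query parameter. The sketch below is purely illustrative: the proposal does not yet define how the identifier would be sent, and both the function name and the parameter name "fpi" are assumptions.

```python
from urllib.parse import urlencode, urlsplit

def catalog_request_url(catalog_url, fpi, param="fpi"):
    """Append the public identifier to a catalog URL as a query parameter."""
    # Use "&" if the catalog URL already carries a query string.
    separator = "&" if urlsplit(catalog_url).query else "?"
    return catalog_url + separator + urlencode({param: fpi})

print(catalog_request_url("http://www.ietf.org/catalog.txt",
                          "-//IETF//DTD HTML 2.0//EN"))
```

A program behind that URL could then return only the entries relevant to the identifier it was given.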

Acknowledgements

I'd like to thank all of the subscribers to my FPI-URN mailing lists and in particular Martin Bryan and Murray Altheim. A special thanks to Paul Grosso both for suggesting I talk to James Clark and for inviting me to the SGML Open technical meeting in March 1996 to present my proposal.

This proposal owes an incredible amount to James Clark who not only implemented the proposal in his SP parser but pointed out the many flaws in its earlier versions. I have no doubt that the proposal would never have gotten off the ground had it not been for James.


References

ISO 8879:1986 Information processing -- Text and office systems -- Standard Generalized Markup Language (SGML)

ISO/IEC 9070:1991 Information technology -- SGML support facilities -- Registration procedures for public text owner identifiers

Grosso, P. SGML Open Technical Resolution 9401:1995 - Entity Management http://www.sgmlopen.org/sgml/docs/library/9401.htm

Stinchfield, D. Internet Draft: Using SGML Open Catalogs and MIME to Exchange SGML Documents http://ds.internic.net/internet-drafts/draft-ietf-mimesgml-exch-02.txt

Clark, J. SP - a new SGML parser http://www.jclark.com/sp.html

Tauber, J. A Proposal for Delegating SGML Open Catalogs http://www.entmp.org/fpi-urn/delegate.html


Copyright

James K. Tauber ©, 1996. The author assigns to Southern Cross University and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The author also grants a non-exclusive licence to Southern Cross University to publish this document in full on the World Wide Web and on CD-ROM, and for the document to be published on mirrors on the World Wide Web. Any other usage is prohibited without the express permission of the author.

AusWeb96 The Second Australian WorldWideWeb Conference "ausweb96@scu.edu.au"