Now consider a nearby city where there are numerous architects. People can get architects to design their buildings or they can design their own. The structure of a building can be tailored to its use, so a house may be built using different materials and a different design to that of a school or a factory. This is a web based on SGML.
Recently I've been working on the problem of using generic SGML on the World Wide Web. Many of the features that people are crying out for in HTML and the Web have existed in SGML for years. One example is the ability to identify documents in a system- and location-independent way.
On the Web, there have been efforts to develop a Uniform Resource Naming scheme that gives documents a name apart from (but resolvable to) their location (expressed as a Uniform Resource Locator).
Such a naming scheme has always existed in SGML in the form of public
identifiers. Many Web authors will be familiar with "-//IETF//DTD
HTML 2.0//EN", which is the public identifier for the HTML 2.0 DTD.
If generic SGML is to be widely used on the Web, there needs to be a
scalable solution to the problem of resolving public identifiers into
URLs. Such a solution would also make possible the use of public
identifiers in a Uniform Resource Naming scheme.
PUBLIC "-//W3C//DTD HTML 3.2//EN" "/usr/local/html/dtd/html-3.2.dtd"
maps the formal public identifier for the HTML 3.2 DTD to a local filename.
Entity managers like that included in James Clark's SP parser suite demonstrate an effective use of this catalog format and, in the case of SP, the ability to use URLs as SOIs makes it possible to map public identifiers to entities stored anywhere on the World Wide Web. This valuable extension is being formalized in the EXCH Internet-Draft of the IETF MIMESGML Working Group along with related extensions to facilitate the exchange of SGML documents via the Web.
The SGML Open catalog, though, provides only a flat mapping. The situation is somewhat akin to hostname resolution pre-DNS where each host had to contain a (large) table resolving names to IP addresses. With DNS, the name space became hierarchical and the role of name to IP address resolution was delegated to different levels of the hierarchy.
In this note I propose a simple extension to the SGML Open catalog format that allows for "delegating" catalogs. By allowing URLs as SOIs as per the EXCH proposal, this extension should provide a powerful solution to the problem of public identifier resolution over the Internet. The heart of this proposal lies in the hierarchical structure introduced to the owner ID of a public identifier by ISO/IEC 9070.
I propose to add the single keyword DELEGATE (or something similar), taking two arguments: an owner identifier prefix and an SOI referring to another catalog. By "owner identifier prefix" I mean all or some initial part of an owner identifier.
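To make the shape of such entries concrete, here is a minimal sketch of reading a catalog line into a (keyword, first argument, SOI) triple. It assumes one entry per line with quoted literals, and it ignores comments and every other SGML Open entry type. The name `parse_entry` is hypothetical and is not taken from SP or any other entity manager.

```python
import re

# A token is a double-quoted literal, a single-quoted literal, or a bare word.
TOKEN = re.compile(r'"([^"]*)"|\'([^\']*)\'|(\S+)')


def parse_entry(line):
    """Parse one PUBLIC or DELEGATE entry into (keyword, argument, soi).

    Assumption: each entry sits on a single line and takes exactly two
    arguments. Comments and other entry types are not handled here.
    """
    tokens = [
        next(group for group in match.groups() if group is not None)
        for match in TOKEN.finditer(line)
    ]
    if not tokens:
        raise ValueError("empty catalog line")
    keyword, args = tokens[0].upper(), tokens[1:]
    if keyword not in ("PUBLIC", "DELEGATE") or len(args) != 2:
        raise ValueError(f"unsupported catalog entry: {line!r}")
    return (keyword, args[0], args[1])


if __name__ == "__main__":
    print(parse_entry('DELEGATE "-//IETF" "http://www.ietf.org/catalog.txt"'))
```

For a PUBLIC entry the first argument is a full public identifier; for a DELEGATE entry it is an owner identifier prefix.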
formal public identifier = owner identifier, "//", text identifier
where the owner identifier is minimum data prefixed by +// for
registered owner identifiers and -// for unregistered owner
identifiers.
The character sequence // is what I shall call a separating
token.
An owner identifier prefix, then, is an initial part of an owner
identifier up to (and optionally including) a separating token. The
// token separating the owner identifier from the text
identifier is considered an allowable part of the owner identifier prefix.
As an example, the owner identifier prefixes for "-//IETF//DTD HTML
2.0//EN" are:
"-"
"-//"
"-//IETF"
"-//IETF//"
Each prefix x defines a set of possible FPIs that have x as an owner identifier prefix. I shall call this set the matching set of x.
Following on from ISO/IEC 9070, this proposal also allows for another
separating token, "::". So the owner identifier prefixes for
"-//IETF::HTML-WG//DTD HTML 2.0//EN" are:
"-"
"-//"
"-//IETF"
"-//IETF::"
"-//IETF::HTML-WG"
"-//IETF::HTML-WG//"
Some prefixes have the same matching set: "-" and "-//", for
example. Others have subtle differences. "-//IETF//DTD RFC//EN"
is in the matching set of "-//IETF" but not that of
"-//IETF::".
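The prefix and matching-set definitions above can be sketched in a few lines. This is an illustration under stated assumptions, not part of the proposal: the function names are hypothetical, and an identifier that does not begin with +// or -// is assumed to have its owner identifier end at the first "//".

```python
import re

# The two separating tokens discussed in the text.
SEPARATING_TOKENS = re.compile(r"(//|::)")


def owner_identifier(fpi: str) -> str:
    """Return the owner identifier, including any +// or -// prefix."""
    start = 3 if fpi.startswith(("+//", "-//")) else 0
    end = fpi.find("//", start)
    if end == -1:
        raise ValueError(f"no text identifier found in {fpi!r}")
    return fpi[:end]


def owner_identifier_prefixes(fpi: str) -> list[str]:
    """List every owner identifier prefix of an FPI, shortest first."""
    # The // that separates owner and text identifier is an allowable
    # part of the prefix, so it is appended before splitting.
    pieces = SEPARATING_TOKENS.split(owner_identifier(fpi) + "//")
    prefixes, current = [], ""
    for piece in pieces:
        if piece:  # re.split leaves an empty string after the final token
            current += piece
            prefixes.append(current)
    return prefixes


def in_matching_set(fpi: str, prefix: str) -> bool:
    """True if the FPI belongs to the matching set of the given prefix."""
    return prefix in owner_identifier_prefixes(fpi)


if __name__ == "__main__":
    print(owner_identifier_prefixes("-//IETF::HTML-WG//DTD HTML 2.0//EN"))
    print(in_matching_set("-//IETF//DTD RFC//EN", "-//IETF"))    # True
    print(in_matching_set("-//IETF//DTD RFC//EN", "-//IETF::"))  # False
```

Note that matching is done against the enumerated prefixes rather than with a plain string `startswith`, so that a prefix such as "-//IE" does not match "-//IETF//DTD RFC//EN".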
ISO/IEC 9070 defines the canonical form of a public identifier by the production:
public identifier = owner name, "//", object name
In the case of ISO/IEC 9070 public identifiers, the owner identifier prefix is considered an initial part of the owner name up to (and optionally including) a separating token.
As an example of how an entity manager would use the delegating catalogs proposed here, consider a catalog on a local system that contains the line
DELEGATE "-//IETF" "http://www.ietf.org/catalog.txt"
An entity manager coming across "-//IETF::HTML-WG//DTD HTML
2.0//EN" could retrieve the catalog given in the URL SOI above.
This second catalog may have the entry
DELEGATE "-//IETF::HTML-WG" "http://www.ietf.org/html-wg/catalog.txt"
and the catalog mentioned in the line above may have the entry
PUBLIC "-//IETF::HTML-WG//DTD HTML 2.0//EN" "http://www.ietf.org/dtd/html-2.0.dtd"
This third catalog could be retrieved and the original public identifier resolved to a URL for, in this case, a DTD.
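The walk through the three catalogs above can be sketched as a loop. In this illustration the catalogs live in an in-memory dictionary that stands in for network retrieval. The key "local", the function names, and the hop limit are assumptions made for the example and are not part of the proposal.

```python
import re


def owner_prefixes(fpi):
    """Enumerate the owner identifier prefixes of an FPI."""
    start = 3 if fpi.startswith(("+//", "-//")) else 0
    end = fpi.find("//", start)
    if end == -1:
        raise ValueError(f"no text identifier found in {fpi!r}")
    prefixes, current = [], ""
    for piece in re.split(r"(//|::)", fpi[:end] + "//"):
        if piece:
            current += piece
            prefixes.append(current)
    return prefixes


# Stand-in for fetching catalogs by SOI; "local" is a made-up key for
# the catalog on the local system.
CATALOGS = {
    "local": [
        ("DELEGATE", "-//IETF", "http://www.ietf.org/catalog.txt"),
    ],
    "http://www.ietf.org/catalog.txt": [
        ("DELEGATE", "-//IETF::HTML-WG",
         "http://www.ietf.org/html-wg/catalog.txt"),
    ],
    "http://www.ietf.org/html-wg/catalog.txt": [
        ("PUBLIC", "-//IETF::HTML-WG//DTD HTML 2.0//EN",
         "http://www.ietf.org/dtd/html-2.0.dtd"),
    ],
}


def resolve(fpi, catalog_soi="local", max_hops=8):
    """Follow DELEGATE entries until a PUBLIC entry matches the FPI.

    max_hops guards against delegation cycles; it is an assumption of
    this sketch, not something the proposal specifies.
    """
    prefixes = set(owner_prefixes(fpi))
    for _ in range(max_hops):
        entries = CATALOGS.get(catalog_soi, [])
        for kind, key, soi in entries:
            if kind == "PUBLIC" and key == fpi:
                return soi
        delegates = [(key, soi) for kind, key, soi in entries
                     if kind == "DELEGATE" and key in prefixes]
        if not delegates:
            return None
        # Longest prefix wins; max() keeps the earliest entry on ties.
        catalog_soi = max(delegates, key=lambda d: len(d[0]))[1]
    return None


if __name__ == "__main__":
    print(resolve("-//IETF::HTML-WG//DTD HTML 2.0//EN"))
```

A real entity manager would replace the dictionary lookup with retrieval and parsing of the catalog at each SOI.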
Within a single catalog file:
A DELEGATE entry should lose against a PUBLIC
entry but win against other entry types.
Where more than one DELEGATE entry could be used to
resolve a public identifier, a more specific (i.e. longer)
DELEGATE entry should win over a more general (i.e. shorter)
one.
A DELEGATE entry occurring earlier in a file should win
against an equivalent DELEGATE entry occurring later
in the same file.
Where more than one catalog file makes up the catalog, an entry in an earlier file should always win against one in a later file, regardless of the rules above.
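The precedence rules above can be sketched as a single selection function. The sketch models only PUBLIC and DELEGATE entries, leaves other entry types out, and takes the first PUBLIC entry when several match, which is an assumption. The names `winning_entry` and `owner_prefixes` are hypothetical.

```python
import re


def owner_prefixes(fpi):
    """Enumerate the owner identifier prefixes of an FPI."""
    start = 3 if fpi.startswith(("+//", "-//")) else 0
    end = fpi.find("//", start)
    if end == -1:
        raise ValueError(f"no text identifier found in {fpi!r}")
    prefixes, current = [], ""
    for piece in re.split(r"(//|::)", fpi[:end] + "//"):
        if piece:
            current += piece
            prefixes.append(current)
    return prefixes


def winning_entry(fpi, catalog_files):
    """Pick the entry that wins for an FPI.

    catalog_files is a list of files in catalog order; each file is a
    list of (keyword, argument, soi) triples in file order.
    """
    prefixes = owner_prefixes(fpi)
    for entries in catalog_files:  # an earlier file always wins
        best_delegate = None
        for entry in entries:
            kind, key, _soi = entry
            if kind == "PUBLIC" and key == fpi:
                return entry  # PUBLIC beats any DELEGATE in the same file
            if kind == "DELEGATE" and key in prefixes:
                # Strictly longer wins, so an earlier equal entry is kept.
                if best_delegate is None or len(key) > len(best_delegate[1]):
                    best_delegate = entry
        if best_delegate is not None:
            return best_delegate
    return None


if __name__ == "__main__":
    fpi = "-//IETF::HTML-WG//DTD HTML 2.0//EN"
    files = [
        [("DELEGATE", "-//IETF", "http://www.ietf.org/catalog.txt")],
        [("PUBLIC", fpi, "http://www.ietf.org/dtd/html-2.0.dtd")],
    ]
    # The DELEGATE entry wins because its file comes first.
    print(winning_entry(fpi, files))
```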
The details of the delegating catalog proposal are still open to revision. One change likely to occur is the recommendation that entity managers send the full public identifier as part of a network request for a catalog. This would add extra intelligence to the resolution by allowing catalogs to in fact be CGI (or similar) programs that could take the public identifier and look it up in a database, without having to return a huge catalog for the entity manager to wade through. Discussion on the proposal takes place on the mailing list fpi-urn@entmp.org. To subscribe, send a message to fpi-urn-request@entmp.org with the subject "subscribe".
This proposal owes an incredible amount to James Clark who not only implemented the proposal in his SP parser but pointed out the many flaws in its earlier versions. I have no doubt that the proposal would never have gotten off the ground had it not been for James.
ISO 8879:1986 Information processing -- Text and office systems -- Standard Generalized Markup Language (SGML)
ISO/IEC 9070:1991 Information technology -- SGML support facilities -- Registration procedures for public text owner identifiers
Grosso, P. SGML Open Technical Resolution 9401:1995 - Entity Management http://www.sgmlopen.org/sgml/docs/library/9401.htm
Stinchfield, D. Internet Draft: Using SGML Open Catalogs and MIME to Exchange SGML Documents http://ds.internic.net/internet-drafts/draft-ietf-mimesgml-exch-02.txt
Clark, J. SP - a new SGML parser http://www.jclark.com/sp.html
Tauber, J. A Proposal for Delegating SGML Open Catalogs http://www.entmp.org/fpi-urn/delegate.html
AusWeb96 The Second Australian WorldWideWeb Conference "ausweb96@scu.edu.au"