The paper will not deal with the available tools for Web publishing, as this is a topic in its own right, dealt with elsewhere at this conference [HREF 2]. Nor will it consider electronic publishing in general, or scholarly electronic publishing over other media (such as CD-ROM).
Such publishing has traditionally taken place using the technology of print. This is still the primary technology for all disciplines, and is also the technology that provides the official archival record for almost all publications. However, print publication suffers from a number of disadvantages:
All of the above technologies needed to provide either equivalent functionality to print, or if this was not possible, enough additional functionality to compensate. In practice, for any scholarly publishing medium, three core sets of functions are needed:
At first glance, print publishing might seem to provide few restrictions - multiple fonts, sidebars and images are all possible. However, hyperlinks within the one publication are clumsy, and links (footnotes and citations) to other publications rely on the scholar having ready access to the publications linked to.
Listserv archives are usually restricted to documents in 7-bit ASCII. This is because of the need for such documents to pass through email gateways in transit and because no assumptions can be made about the display device at the other end.
Anonymous File Transfer Protocol (aftp) archives can be used to store any kind of file. In practice, most ejournals using this technology have tended to use 7-bit ASCII text documents. Some journals are experimenting with storing articles in richer formats like HyperText Markup Language (HTML [HREF 7]) or Adobe's Acrobat (PDF) format. An example of the latter is the Electronic Publishing Research Group's [HREF 8] Cajun Project [HREF 9].
Gopher servers can also provide a range of document types, but most ejournals mounted on gopher servers also store documents in 7-bit ASCII text. A wider range of Multipurpose Internet Mail Extensions (MIME [HREF 10]) types is now supported by available gopher clients and servers; that this facility has not been widely adopted for distributing documents in other formats is probably due to the general rise in popularity of the Web.
World Wide Web documents are written in HTML [HREF 7]. This provides for formatted text, inline graphics, hyperlinks within documents, links to other HTML [HREF 7] documents, and links to documents in other formats altogether. However, the scholar writing for the Web needs to be aware that a wide range of browsers will be used to access their work. Not all browsers format HTML [HREF 7] in the same way, and the available range of markup tags is restricted (see Deficiencies of HTML in this paper). Thus a lesser degree of control over the final appearance of the document is inevitable.
In the print world, this is often limited to the physical arrival of a new issue of a journal (often on a semi-regular, predictable schedule). If the journal comes to a library, the scholar has to check the shelves periodically, or rely on some sort of alerting service. Such a service might be provided by the library (in the form of photocopied contents pages) or a commercial information provider like DIALOG (via the results of an SDI search on a contents database). Alternatively, scholars can directly search online databases of abstracts and citations looking for relevant information, but this requires them to take the initiative and can easily get crowded out of a busy schedule.
In the domain of epublishing, the standard solution to the notification problem is to use one of a number of computer-mediated communication technologies. By far the most popular is electronic mail, with network news a distant second. Two distinct strategies can be employed. The first is to email the entire text of the latest issue of an ejournal direct to a scholar's mailbox. In this case, the notification is directly analogous to the arrival of a print journal. An alternative increasingly being adopted is to notify the scholar of the publication of a new issue, include author, title and abstract information, and provide advice on how to access either the entire issue or particular articles of interest. For aftp, Gopher and Web journals, this access information is usually in the form of a Uniform Resource Locator (URL [HREF 11]).
In the print world, if the journal is delivered directly to the user, the problem of journal location is limited to finding the journal within the context of the scholar's own personal information management system. If the journal is delivered to the library, it will be filed in some well-defined sequence. To assist with locating articles within journals, the publishing industry has developed a range of standard tools: contents pages at the front of issues, yearly cumulative printed indexes, and the like.
Listserv archives enable scholars to access information via email. All that is required is to email a GET command to the listserver address requesting that a specified file be sent by return email. As email is the lowest common denominator for users of the Internet, this provides the widest possible audience. As an example, consider the reference in this paper to Harnad (1991). This article in the refereed ejournal Public-Access Computer Systems Review (PACS-R) can be retrieved by sending the e-mail message get harnad prv2n1 f=mail to listserv@uhupvm1.uh.edu. Of course, before issuing a GET command, one needs to know that the file exists. Some journals, including PACS-R, handle this by sending the table of contents and abstracts to users subscribed to the PACS-L or PACS-P mailing lists. Alternatively, it is possible to email commands to some listservers instructing them to search a database and return a list of articles that match the search criteria. These articles can then be retrieved as above.
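As a rough illustration of the retrieval step just described, the following Python sketch composes (but does not send) the GET request for the Harnad (1991) article. The command text and listserver address are taken from the example above; the function name and the sender address are hypothetical, and actually sending the message would need a mail transport that is not shown.

```python
from email.message import EmailMessage


def build_listserv_get(filename, filetype, listserv_address, sender):
    """Compose (but do not send) a listserv GET request as a plain-text email."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = listserv_address
    # The command travels in the body of the message, as in the example in the text.
    message.set_content(f"GET {filename} {filetype} F=MAIL\n")
    return message


# The sender address is a placeholder, not a real account.
request = build_listserv_get(
    "HARNAD", "PRV2N1", "listserv@uhupvm1.uh.edu", "scholar@example.edu"
)
print(request["To"])
print(request.get_content().strip())
```

The point of the sketch is only that the whole request is a single line of text addressed to the listserver, which is why this method works from any system that can send email.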
Scholars can access articles in anonymous ftp archives either by using a dedicated ftp client, or by providing an ftp URL [HREF 11] to a Web browser like Lynx, Mosaic or Netscape. If the URL [HREF 11] formalism is not being used, then the ftp location of the article will need to specify host machine, directory path and filename. For example, the information encoded in the URL [HREF 11] ftp://cogsci.ecs.soton.ac.uk//pub/harnad/Harnad/harnad95.quo.vadis can also be expanded into (more or less) plain English as "Make an anonymous ftp connection to cogsci.ecs.soton.ac.uk, move into the directory /pub/harnad/Harnad/ and retrieve the file called harnad95.quo.vadis". The URL [HREF 11] formalism has the advantage of being more compact as well as parseable by both humans and machines. One example of a journal accessed by aftp is Psycoloquy [HREF 12], edited by Stevan Harnad [HREF 13].
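To make the machine-parseable side of the URL formalism concrete, here is a minimal Python sketch that splits the ftp URL from the example above into host machine, directory path and filename. The function name is hypothetical, and collapsing the doubled slash at the start of the path is an assumption made so that the output matches the plain English reading given in the text.

```python
from urllib.parse import urlsplit


def split_ftp_url(url):
    """Return (host, directory, filename) for an ftp URL."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an ftp URL: {url}")
    # Everything after the last slash is the filename; the rest is the directory.
    directory, _, filename = parts.path.rpartition("/")
    # Assumption: leading and trailing slashes are trimmed, then one of each restored.
    trimmed = directory.strip("/")
    directory = f"/{trimmed}/" if trimmed else "/"
    return parts.hostname, directory, filename


host, directory, filename = split_ftp_url(
    "ftp://cogsci.ecs.soton.ac.uk//pub/harnad/Harnad/harnad95.quo.vadis"
)
print(
    f"Make an anonymous ftp connection to {host}, move into the directory "
    f"{directory} and retrieve the file called {filename}"
)
```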
Gopher [HREF 14] (Wiggins 1993) was initially developed to provide a basis for mounting Campus Wide Information Systems (CWISs). It is based around the idea of hierarchical menus, and allows the server administrators a lot of flexibility in how they structure their information space. One fairly standard way to mount ejournals on a gopher server is to have a menu of possible journals. Each journal points to a menu of issues for that journal. Each issue points to the individual articles. Given unambiguous information about the path to be followed, scholars can navigate through the menus until they locate the files they want. It is also possible to provide Gopher URLs [HREF 11] for direct access using a Web browser. An example of a journal available through Gopher is the Mathematical Physics Electronic Journal [HREF 15].
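The journal, issue and article hierarchy described above can be pictured as a nested menu structure. The Python sketch below is an illustration only: the journal, issue and file names are invented, and a real Gopher server stores and serves its menus quite differently.

```python
# Hypothetical menu tree: journal -> issue -> article -> file name.
GOPHER_MENU = {
    "Example Journal": {
        "Volume 1, Issue 1": {
            "First article": "v1n1-article1.txt",
            "Second article": "v1n1-article2.txt",
        },
        "Volume 1, Issue 2": {
            "Third article": "v1n2-article1.txt",
        },
    },
}


def follow_menu_path(menu, selections):
    """Walk the menu tree one selection at a time and return what is found there."""
    node = menu
    for choice in selections:
        if not isinstance(node, dict) or choice not in node:
            raise KeyError(f"no menu item named {choice!r}")
        node = node[choice]
    return node


article = follow_menu_path(
    GOPHER_MENU, ["Example Journal", "Volume 1, Issue 2", "Third article"]
)
print(article)
```

The list of selections plays the role of the "unambiguous information about the path to be followed": without it, the scholar has to browse each menu level in turn.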
The Web, with its non-hierarchical, document-based, networked hypermedia architecture, provides a much richer environment for electronic publishing. Documents can either be reached by following an existing link, or can be accessed directly by entering a valid URL [HREF 11]. Documents can in turn refer to other documents and provide direct links to them (something that is not possible with documents accessed using a Gopher client). Examples of a range of scholarly journals on the Web [HREF 16] will be discussed below.
The current solution is to ensure that when directory hierarchies are re-organised on servers, links are placed from old locations to new locations. Under Unix this can be done with link files. On the Macintosh, aliases perform a similar function. On Web servers, a small document that
A far preferable solution is to adapt the method that has been used for centuries for scholarly links to other documents - the scholarly citation. As an illustration, consider this paper. At the end of it, just before the Hyper References section, is another entitled References. This provides links to other documents in the form of standardised citations. These citations do not make reference to the location of the document; they only specify its name in some unambiguous form. The Web equivalent, of course, is the distinction between Uniform Resource Locators (URLs [HREF 11]) and Uniform Resource Names (URNs [HREF 18]). As in the print world, what scholars want to be able to link to is the contents of other documents - the locations of those documents should be irrelevant. URLs [HREF 11], with their dependence on a particular machine and directory path, are a transitional kludge. URNs [HREF 18], with their intended ability to refer to a known resource and have the system take care of locating it and accessing it, are the long-term solution.
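The difference between naming a document and locating it can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of how URN resolution is or will be implemented: the class, the urn:example: name and both locations are invented for illustration.

```python
class NameResolver:
    """Toy registry mapping a location-independent name to its current location."""

    def __init__(self):
        self._locations = {}

    def register(self, name, url):
        # Registering an existing name again simply records its new location.
        self._locations[name] = url

    def resolve(self, name):
        return self._locations[name]


resolver = NameResolver()
# Hypothetical name and locations, for illustration only.
resolver.register("urn:example:harnad-1995b", "ftp://old.example.edu/pub/harnad95.txt")
citation = "urn:example:harnad-1995b"

# The archive is reorganised: only the registry entry changes, not the citation.
resolver.register("urn:example:harnad-1995b", "http://new.example.edu/papers/harnad95.html")
print(resolver.resolve(citation))
```

A citation that records only the name keeps working after the move, whereas a citation that had recorded the first location would now be broken.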
Tony Barry [HREF 19] from the ANU has suggested that we need to start viewing documents as continuously updated (Barry 1995), and that scholars should get recognition for the currency of their documents rather than the number (Barry 1994). I am profoundly sceptical that universities, which are only just starting to grapple with recognising the validity of electronic publications, are ready for this visionary proposal.
The High Energy Physics community has already moved to a model of electronic publishing which allows for ongoing corrections and addenda. The hep-th e-print archive [HREF 20] which provides this facility "serves over 20,000 users from more than 60 countries, and processes over 30,000 messages per day" (Ginsparg 1994 [HREF 21]).
If documents are continuously changing and evolving over time, which version should be cited? Which version is the 'publication of record', and does this mean anything any more? Two solutions to the problem of permanence are currently in use on the Web.
In one, every time the document changes, its name changes also. If the older version is replaced by the newer, then all URLs [HREF 11] pointing to the older version break. Moving to URNs [HREF 18] will not help in this case.
The alternative solution is to keep the name the same and update the content. Existing URLs [HREF 11] will still work, although the target of the URL [HREF 11] may have changed its content significantly. In this case, what if one scholar cites a section in a document that disappears in the next revision?
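The two naming strategies fail in different ways, which a small Python sketch can make explicit. Everything here is hypothetical: the file names, the section headings and the two in-memory "stores" stand in for a server holding the second revision of a paper whose first revision contained a section that was later removed.

```python
# The second revision of a paper, as a mapping from section heading to text.
# The first revision (no longer on the server) also had a "Limitations" section.
revision_2 = {"Introduction": "revised text", "Results": "new text"}

# Strategy 1: a new name for each revision, with the old copy replaced.
renamed_store = {"paper-v2.html": revision_2}

# Strategy 2: one stable name whose content is overwritten.
stable_store = {"paper.html": revision_2}


def follow_citation(store, name, section):
    """Report what a reader finds when following a citation to a named section."""
    if name not in store:
        return "broken link"
    if section not in store[name]:
        return "document found, cited section missing"
    return "ok"


# A citation made against the first revision, followed after the update.
print(follow_citation(renamed_store, "paper-v1.html", "Limitations"))
print(follow_citation(stable_store, "paper.html", "Limitations"))
```

Under the first strategy the reader gets nothing at all; under the second the reader gets a document, but not necessarily the passage the citing scholar had in mind.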
Perhaps the only solution is to distinguish somehow between fixed documents (print-like) and continuously updated documents (database-like), or at least to make it clear at the top of a document into which category it falls. This approach has been used by Bailey (1995). The HTML version [HREF 22] of the document is continuously updated - the ASCII version [HREF 23] is fixed and permanently archived.
In many ways, the digital nature of all electronic publishing can be both a strength and a weakness in the area of durability. A strength, because digital documents can easily be copied and replicated at multiple sites around the world. A weakness, because destroying a digital document is far easier than destroying a physical document. It is easy to assume that the document will exist elsewhere on the Net and that the fate of a single copy is irrelevant. Of course, there is no mechanism to prevent everyone making this assumption and causing the loss for ever of a piece of scholarship. In some ways, the modern analogue of the single manuscript forgotten on top of a cupboard in a monastery somewhere in the Dark Ages may well be a forgotten directory on a rarely used hard disk somewhere in a university. Unfortunately, it is all too easy to delete a directory; throwing away a manuscript without realising is somewhat harder. Given the lack of any mechanism to ensure the archiving of print publications, it seems unlikely (although relatively technologically simple) that anything will be done about the situation for digital documents.
PostModern Culture [HREF 26] routinely contains hypermedia articles alongside more traditional text-only material. As an example, McNeilly (1995) [HREF 27] contains links to a number of sound files which are used to illustrate particular points in the article.
I am not aware of any ejournals that use the gateway facility to provide access to data sourced from other systems. As an example of what is possible, consider ERIN, the Australian Environmental Resources Information Network [HREF 28]. While not a scholarly journal itself, this system does provide access to a wide range of scholarly information. Use of a Web gateway allows the user to generate distribution maps for nominated species and run simulation models [HREF 29] in real time. Imagine the possibilities if a journal article allowed the reader to run a simulation directly while varying the input data and monitoring the results.
JAIR, the Journal of Artificial Intelligence Research [HREF 30], is using the Web to deliver articles in PostScript or HTML [HREF 7] format. As an example, Schlimmer & Hermens (1993) is available in both a PostScript version [HREF 31] and an HTML version [HREF 32]. JAIR is also experimenting with delivering other forms of supporting information. The Schlimmer & Hermens (1993) article comes with an appendix [HREF 33] containing a 1.3MB QuickTime video which illustrates some of their research findings. At the moment at least three things are limiting the wider use of anything other than text in scholarly publishing:
While there will no doubt be an application for VT100 Web browsers like Lynx for a few years, the computing world is rapidly going graphical. Already the majority of Web browsers run under a GUI, and this trend will continue. Having to code for non-graphical browsers is probably another short-term difficulty.
Scholarly conservatism may prove a more long-term constraint, only susceptible to generational change. Many scholars will no doubt only use the Web (if at all) to publish what they publish already but faster and in electronic form. The habits of centuries of print publishing (in the case of scholars in general) and of decades of practice (in the case of individual scholars) will take a while to change.
The Web's ability to link to other information makes it possible to envisage a range of extensions to traditional scholarly publishing. These include:
Philip Greenspun, from MIT, has also written on the deficiencies of HTML [HREF 39]. His preferred solution is to make much wider use of the META tag included in HTML [HREF 7] level 2.
HTML [HREF 7] is certainly evolving towards full SGML [HREF 37] compliance, but betrays its origin as a formatting language rather than a structuring language at every turn. It may not be possible to migrate entirely seamlessly towards SGML [HREF 37]. Indeed it may not be necessary. Many types of publishing do not require the range of features listed by Price-Wilkin. SGML [HREF 37] to HTML [HREF 7] gateways may only be required for particular kinds of large complex documents.
Price-Wilkin (1994a) argues that "because the Web does not include structure awareness in its protocol and because HTML [HREF 7] markup provides so little support for structural representation of features, the author and the administrator are forced to fragment documents into a sets of reasonably sized components." This is no doubt true for large documents with complex internal structures, but is less of an issue for the shorter documents typical of scholarly publishing.
Tim Berners-Lee's preferred style [HREF 41] is for shortish (up to 5 pages) nodes linked together in some logical sequence, preferably based on a tree structure. On its own, this implies that the reader will have to navigate back up branches in order to access the next section. Documents designed using this model should provide the reader with a link labelled "next" at the end of each node to let them move through the document in a linear manner if desired. This style works well for things like online reference material but seems less appropriate for scholarly publishing. A scholarly article is more of a single entity and should be represented as such. If the article is a long one, it may be appropriate to split it into sections or place a table of contents with links to internal anchors at the beginning. The advantage of keeping the article as an entity is that the user can easily print it out (if required), without having to retrieve multiple segments and ensure that they are collated in the correct order. Until a majority of the intended audience is comfortable with reading entirely from the screen, and has the hardware to make this possible, the likelihood that material will be printed out has to be kept in mind when writing scholarly Web documents.
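The single-document approach with a table of contents linking to internal anchors is easy to generate mechanically. The Python sketch below is one possible way of doing so, not a recommended tool: the function name, anchor names and section titles are all invented, and the markup is limited to simple list, heading and anchor tags.

```python
from html import escape


def build_contents(sections):
    """Return (toc_html, heading_html_list) for a list of (anchor, title) pairs."""
    # One table-of-contents entry per section, linking to an anchor in the same file.
    toc_items = [
        f'<li><a href="#{escape(anchor)}">{escape(title)}</a></li>'
        for anchor, title in sections
    ]
    # The matching named anchors, to be placed at the start of each section.
    headings = [
        f'<h2><a name="{escape(anchor)}">{escape(title)}</a></h2>'
        for anchor, title in sections
    ]
    toc = "<ul>\n" + "\n".join(toc_items) + "\n</ul>"
    return toc, headings


# Hypothetical section list for a single-file article.
toc, headings = build_contents(
    [("intro", "Introduction"), ("methods", "Methods"), ("refs", "References")]
)
print(toc)
print(headings[0])
```

Because the links point within one file, the reader can jump between sections on screen and still print the whole article in a single operation.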
Most ejournals do not require the scholar to assign copyright to the journal, but only to certify that the material is being published for the first time. Reproduction or reuse is usually permitted provided that it is made clear that the material first appeared in the ejournal. In the case of print journals the situation is less clear. In the field of scholarly publication (what Harnad (1995b) [HREF 3] calls "esoteric" publication) the intention of the author is not to derive revenue through publication. The rewards are usually much more intangible. Therefore, restrictions on electronic publication are only a concern for the publisher. If the scholar wishes to make a reprint available electronically, some publishers will not permit this. Others will turn a blind eye, provided they are convinced it will not affect print sales.
One possible solution is for an author to assign the publisher rights over printed versions of a document only. As an example of this, W. H. Calvin [HREF 42] from the University of Washington has taken this approach with The Ascent of Mind [HREF 43], published in 1990 through Bantam (personal communication, 14/3/95). His retention of the electronic rights has now allowed him to mount the entire text, complete with illustrations, on the Web. Presumably, publishers will only be prepared to allow this if the book is out of print (and no reprints are planned), or if they do not believe that electronic access will affect print sales.
An additional aspect of publication quality that may well become relevant once Web publishing becomes widespread is the number and quality of the hyperlinks in the document. This would be in addition to the necessary references to other cited scholarly works. A richly linked document (both internally and externally, with the links updated as necessary) would be much more useful than one with only the bare minimum of links.
It is possible to imagine some sort of automatic Web-traversing robot which built up a picture of which links pointed to which documents for the purposes of citation analysis. Whether anyone will undertake this, and what the bandwidth implications would be, are another matter altogether.
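The core of such a robot, leaving aside the fetching of pages and the bandwidth question, is simply extracting links and inverting them. The following Python sketch shows that step on three invented pages held in memory; the class and function names and all URLs are hypothetical, and nothing is retrieved over the network.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in one document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def inbound_links(pages):
    """Map each target URL to the sorted list of pages that link to it."""
    cited_by = defaultdict(set)
    for url, html_text in pages.items():
        collector = LinkCollector()
        collector.feed(html_text)
        for target in collector.links:
            cited_by[target].add(url)
    return {target: sorted(sources) for target, sources in cited_by.items()}


# Hypothetical pages held in memory; a real robot would fetch them over the network.
PAGES = {
    "http://a.example/paper1.html": '<a href="http://b.example/paper2.html">P2</a>',
    "http://b.example/paper2.html": '<a href="http://c.example/paper3.html">P3</a>',
    "http://c.example/paper3.html": '<a href="http://b.example/paper2.html">P2</a>',
}
citations = inbound_links(PAGES)
for target, sources in sorted(citations.items()):
    print(target, "is linked from", sources)
```

The resulting table of inbound links is the raw material for citation analysis; deciding which links count as scholarly citations is a separate problem that this sketch does not attempt.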
In the longer term, the Web is probably not the future of scholarly publishing. It is both a part of the present, and a pointer to the future. Other technologies will no doubt surpass the Web in time. Hyper-G looms as a possibility, and Project Xanadu may move from virtuality to reality before the end of the millennium. The significance of the Web is the way in which it enables a far more significant break from print than has been achieved to date. It does this because it does all that print does and then more. For scholars, exploring the implications of that more for their publishing and communication is sufficient challenge for the near term.
For this document I decided to experiment with writing directly in HTML. I began by using HTML_Pro 1.07 [HREF 49]. This provides a nice way to switch between mostly WYSIWYG text and the raw HTML. Unfortunately it is limited to 32K documents. Once the file got too large, I switched to BBEdit Lite and Netscape Navigator 1.1 [HREF 50] as a document previewer. I found that going back to editing raw text slowed me down too much, so I finished the editing in HTML Editor [HREF 51], with occasional use of Netscape to test the URLs.
Quite possibly no one other than me [HREF 1] will be interested in this, but I don't particularly care!
T. Barry (1995), "Network Publishing on the Internet in Australia", in The Virtual Information Experience - Proceedings of Information Online and OnDisc '95, Information Science Section, Australian Library and Information Association, pp. 239-249.
P. Ginsparg (1994), "First Steps towards Electronic Research Communication", Computers in Physics, August.
S. Harnad (1990), "Scholarly Skywriting and the Prepublication Continuum of Scientific Inquiry", in Psychological Science, Vol. 1, pp. 342-343 (reprinted in Current Contents 45: 9-13, November 11 1991).
S. Harnad (1991), "Post-Gutenberg Galaxy: The Fourth Revolution in the Means of Production of Knowledge", in The Public-Access Computer Systems Review, Vol. 2, No. 1, pp. 39-53. To retrieve this file, send the e-mail message get harnad prv2n1 f=mail to listserv@uhupvm1.uh.edu.
S. Harnad (1995a), "Implementing Peer Review on the Net: Scientific Quality Control in Scholarly Electronic Journals", in Peek, R. & Newby, G. (Eds.), Electronic Publishing Confronts Academia: The Agenda for the Year 2000. Cambridge MA: MIT Press.
S. Harnad (1995b), "Electronic Scholarly Publication: Quo Vadis?", in Serials Review, Vol. 21, No. 1, pp. 70-72.
D. S. Kaufer & K. M. Carley (1993), Communication at a Distance - The Influence of Print on Sociocultural Organization and Change, Lawrence Erlbaum Associates.
K. McNeilly (1995), "Ugly Beauty: John Zorn and the Politics of Postmodern Music", in Postmodern Culture, Vol.5, No.2 (January).
A. Odlyzko (1995), "Tragic loss or good riddance? The impending demise of traditional scholarly journals" in Electronic Publishing Confronts Academia: The Agenda for the Year 2000, Robin P. Peek and Gregory B. Newby, eds., MIT Press/ASIS monograph, MIT Press.
J. Price-Wilkin (1994a), "Using the World-Wide Web to Deliver Complex Electronic Documents: Implications for Libraries", in The Public-Access Computer Systems Review, Vol. 5, No. 3, pp. 5-21. To retrieve this file, send the following e-mail message to listserv@uhupvm1.uh.edu: GET PRICEWIL PRV5N3 F=MAIL.
J. Price-Wilkin (1994b), "A Gateway Between the World-Wide Web and PAT: Exploiting SGML Through the Web.", in The Public-Access Computer Systems Review, Vol. 5, No. 7, pp. 5-27. To retrieve this file, send the following e-mail message to listserv@uhupvm1.uh.edu: GET PRICEWIL PRV5N7 F=MAIL.
D. Schauder (1994), Electronic Publishing of Professional Articles: Attitudes of Academics and Implications for the Scholarly Communication Industry, Unpublished Ph. D. Dissertation, University of Melbourne.
J. C. Schlimmer & L. A. Hermens (1993), "Software Agents: Completing Patterns and Constructing User Interfaces", Journal of Artificial Intelligence Research, Vol. 1, pp. 61-89.
R. Wiggins (1993), "The University of Minnesota's Internet Gopher System: A Tool for Accessing Network-Based Electronic Information", in The Public-Access Computer Systems Review, Vol. 4, No. 2, pp. 4-60. To retrieve this file, send the e-mail message get wiggins2 prv4n2 f=mail to listserv@uhupvm1.uh.edu.
AusWeb95 The First Australian World Wide Web Conference