Embeddable Components For Stand-Alone Web Applications


Steve Ball, Department of Computer Science, Australian National University, ACTON, ACT 0200, Australia. Phone +61 6 249 5146 Fax: +61 6 249 0010 Steve.Ball@tcltk.anu.edu.au Home Page [HREF 1]


Keywords

World Wide Web, Embed, Embeddable, Embedded, Components, Tcl, Tk, Tcl/Tk, SurfIt!, Plume, HTTP, HTML, CSS, XML


Abstract

Many application developers wish to give their programs the ability to access World Wide Web resources. This paper describes two general-purpose libraries that provide easy access to Web documents: a library for retrieving and handling Web documents, and a library for displaying HTML and XML documents.


Introduction

The typical architecture of applications deployed using the World Wide Web uses a general-purpose browser to implement the user interface (see Figure 1). Here, all interactions between the user and the application take place within the framework of the browser. Client-side programming technologies, such as Java, do not significantly alter this architecture.

Figure 1: Typical Architecture Of Web Application

SurfIt! [HREF 2] is a World Wide Web browser which has been available to Internet users and developers since 1995. It may be used as a stand-alone, general-purpose Web browser, in the same manner as Netscape Navigator or Microsoft Internet Explorer. While development of SurfIt! as a browser has continued since its first release, many application developers who have used it have expressed interest in using its components to add Web access to their own applications. These developers do not wish to implement the user interface of their Web applications using a standard Web browser. Instead, their applications present a customised user interface within which Web documents are displayed to the user in some fashion. Examples of applications which may wish to present a customised user interface include Internet-based electronic games, specialised Intranet applications and applications that use online HTML documents for their help subsystem. Developers of these applications require a library which will help them gain access to Web resources and display Web documents as necessary. A typical architecture for these types of applications is shown in Figure 2.


Figure 2: Typical Architecture Of A Stand-Alone Web Application

There is no reason why developers could not write their own code for accessing remote servers using HTTP and for displaying HTML documents. A minimal HTTP client package has been written in only about 100 lines of Tcl script code. Similarly, Uhler's HTML library [HREF 7] contains a 10-line HTML parser. However, the burden is then upon the application developer to ensure correct operation of the protocol handler and adherence to the protocol specification. Also, layered functionality, such as document caching, quickly leads to "code bloat": Uhler's complete HTML display library, incorporating document display code, is over 1000 lines of Tcl script code.

With this requirement in mind, the next release of the SurfIt! browser, which has been renamed "Plume", has been implemented with a view to releasing its component libraries separately. Developing a subroutine library that provides functions for performing elementary Web-related tasks is not new: CERN's libwww C library [HREF 3] was first developed circa 1990. However, it is not easy to incorporate this library into an application, and ease of incorporation is the major design goal of the component libraries provided by Plume.

Why use Tcl?

Plume has been implemented using the Tool Command Language, Tcl [HREF 4], and the component libraries are to be released as Tcl packages. Tcl has been chosen to implement the component libraries for the following reasons: The libraries described below are naturally suited to being used within Tcl applications. However, due to the embeddable properties of Tcl, they are by no means limited to such use. Tcl is available as a shared library on Unix and Apple Macintosh platforms and as a DLL on Microsoft Windows 3.1/95/NT platforms. There is a simple C API which allows a C or C++ programmer to invoke Tcl commands and to have the Tcl script return data to the program.

The Document Handling Package

The basic data object of the World Wide Web is a document. A document may be textual, image data, video or audio clips, and so on. The URL of a document uniquely identifies the atomic object. It is upon these objects that an application must perform some form of processing.

In order for a program to be able to manipulate World Wide Web documents it is first necessary to be able to retrieve a document's data into the program's address space. This is the basic purpose of the Document Handling Package (DHP). However, the task of manipulating a document is not as simple as presenting the data to the application.

Why is a DHP Necessary?

Document data may be retrieved from a variety of sources: the local filesystem, HTTP and FTP servers, and so on. Once a document has been fetched, the application may process it in a variety of ways; for example, it may be saved to disk or passed to a media-specific processor. The process has many complications: HTTP requests may first have to be sent to a proxy server, or the application may wish to cache document data in the local filesystem.

The Document Handling Package provides an extensible framework for applications to perform all of the necessary processing on a document. It hides the complexity of basic document handling, such as accessing proxy servers and document caching, and allows the application to process documents at a high level according to the document's media type.

What Should a DHP do?

A convenient DHP will allow documents to be retrieved by simply supplying the desired document's URL. A URL may be given as an absolute URI or as a relative URI; in the latter case both the base URI and the relative URI must be given. Once the document has been retrieved, media-specific processing must be performed transparently. The application must have an easy method of specifying which media types are accepted and how to process them. Of course, most applications require more than simple interactions with documents and document servers, so the interface to the DHP must also cater for advanced use. The general philosophy in designing the API is to make simple tasks as simple as possible, while still allowing more complex functions to be performed.
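To illustrate what combining a base URI with a relative URI involves, here is a minimal sketch in plain Tcl. The procedure name resolveURI is hypothetical and is not part of Plume's API. It covers only the common cases: references that carry their own scheme, host-relative paths beginning with "/", and document-relative paths containing "." and ".." segments. Queries and fragments on the base are simply dropped.

```tcl
# Hypothetical helper (not part of Plume): resolve a relative URI against a
# base URI. Simplified: query strings and fragments on the base are dropped,
# and fragment-only ("#name") or network-path ("//host") references are not
# handled specially.
proc resolveURI {base rel} {
    # An empty reference refers to the base document itself.
    if {$rel eq ""} {return $base}
    # A reference that carries its own scheme is already absolute.
    if {[regexp {^[A-Za-z][A-Za-z0-9+.-]*:} $rel]} {return $rel}
    # Split the base into "scheme:" plus optional "//authority", and a path.
    if {![regexp {^([A-Za-z][A-Za-z0-9+.-]*:(?://[^/]*)?)(.*)$} $base -> prefix path]} {
        error "cannot parse base URI \"$base\""
    }
    regsub {[?#].*$} $path {} path
    if {[string index $rel 0] eq "/"} {
        # Host-relative reference: replaces the whole base path.
        set merged $rel
    } else {
        # Document-relative reference: replaces the last base path segment.
        set dir [string range $path 0 [string last / $path]]
        if {$dir eq "" && [string match *//* $prefix]} {set dir /}
        set merged $dir$rel
    }
    # Remove "." segments and let each ".." cancel the preceding segment.
    set segs [split $merged /]
    set out {}
    foreach seg $segs {
        switch -- $seg {
            . {}
            .. {
                # Never remove the empty first element marking the root.
                if {[llength $out] > 1 || ([llength $out] == 1 && [lindex $out 0] ne "")} {
                    set out [lrange $out 0 end-1]
                }
            }
            default {lappend out $seg}
        }
    }
    # A trailing "." or ".." denotes a directory, so keep a trailing slash.
    if {[lindex $segs end] in {. ..}} {lappend out ""}
    return $prefix[join $out /]
}
```

For example, resolving ../Style/ against http://www.w3.org/pub/WWW/TR/index.html yields http://www.w3.org/pub/WWW/Style/. A production DHP would follow the full relative-URL specification rather than this reduced rule set.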

Accordingly, the DHP needs to provide the following functions:

Plume's Document Handling Package

A Document Handling Package has been developed as part of the implementation of the Plume Web browser, providing all of the features necessary for a DHP as described above. It interfaces with the application via a single command: document. Loading a document is as simple as issuing the Tcl command:

  document loaduri URL
The document command has been created to deal with WWW documents, but is not limited to use with the World Wide Web. Although all documents are referred to by their URL, a non-Web application can refer to local files using the file: scheme.

Scheme Handlers

When the loading of a document is requested, using the loaduri method shown above, a scheme handler is invoked to manage the transfer of data according to the protocol specified by the document's URL. The application may extend the package with new scheme handlers using the document scheme command. Handlers for the file: and http: schemes are built in, with the http: scheme handler supporting HTTP/1.1 [HREF 5] client access. Requests for the loading of documents are placed in one of three queues: high, normal and low priority. Document loads are scheduled in priority order as the necessary resources become available. Resources may be restricted in various ways; for example, there may be a limit on the number of open channels allowed, or loads from a particular document server may be pipelined over a persistent channel. Prioritising document fetches allows the application to favour certain categories of documents over others; for example, downloading an image map or an applet is more important than downloading the document background.
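The following plain-Tcl sketch shows one way three priority queues can be combined with a limit on open channels. The loadq namespace, its procedures and the limit of two channels are all hypothetical illustrations and do not reproduce Plume's implementation. Priority only matters when resources are exhausted: a request that finds a free channel starts immediately, and the queues decide which waiting request runs next.

```tcl
# Hypothetical sketch (names are not Plume's): three priority queues of
# pending load requests, and a scheduler that starts loads in priority
# order while fewer than maxChannels transfers are active.
namespace eval loadq {
    variable queue
    array set queue {high {} normal {} low {}}
    variable maxChannels 2   ;# assumed resource limit
    variable active {}       ;# URLs currently being transferred
    variable started {}      ;# order in which loads were started
}

proc loadq::enqueue {url {priority normal}} {
    variable queue
    if {![info exists queue($priority)]} {
        error "unknown priority \"$priority\""
    }
    lappend queue($priority) $url
    schedule
}

# Start as many queued loads as the resource limit allows, highest first.
proc loadq::schedule {} {
    variable queue
    variable maxChannels
    variable active
    variable started
    foreach priority {high normal low} {
        while {[llength $active] < $maxChannels && [llength $queue($priority)] > 0} {
            set url [lindex $queue($priority) 0]
            set queue($priority) [lrange $queue($priority) 1 end]
            lappend active $url
            lappend started $url
            # A real scheme handler would open a channel and begin the transfer here.
        }
    }
}

# Called when a transfer completes; frees its channel for the next request.
proc loadq::finished {url} {
    variable active
    set i [lsearch -exact $active $url]
    if {$i >= 0} {set active [lreplace $active $i $i]}
    schedule
}

# Both channels are busy by the time logo.gif and map.gif are queued, so when
# doc.html finishes, the high-priority image map is started first.
loadq::enqueue doc.html normal
loadq::enqueue bg.gif low
loadq::enqueue logo.gif low
loadq::enqueue map.gif high
loadq::finished doc.html
```

After these calls, loads have started in the order doc.html, bg.gif, map.gif, and logo.gif is still waiting in the low-priority queue.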

The system distinguishes between document load requests and the actual retrieval of document data. This allows a document to be "loaded" concurrently by more than one request, while its data is fetched only once. For example, an HTML document could be displayed to the user, who might then wish to save the document to a file before all of its data has been received. This request pattern is handled transparently by the Document Handling Package.
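The distinction between requests and transfers can be sketched as follows. This is a hypothetical plain-Tcl illustration, not Plume's code. Each URL has at most one transfer. Every request attaches a callback to that transfer, and a request that arrives late is first given the data already received. The procedures requestLoad, deliver, display and save and the example.com URL are invented for the example.

```tcl
# Hypothetical sketch (not Plume's code): several load requests share one
# transfer per URL. transfer($url) holds the data received so far and
# consumers($url) the callback of every request attached to it.
array set transfer {}
array set consumers {}
set fetchCount 0

proc requestLoad {url callback} {
    global transfer consumers fetchCount
    if {![info exists transfer($url)]} {
        # First request for this URL: start the (simulated) retrieval.
        set transfer($url) ""
        set consumers($url) {}
        incr fetchCount
    } elseif {$transfer($url) ne ""} {
        # A late request first receives the data that has already arrived.
        {*}$callback $transfer($url)
    }
    lappend consumers($url) $callback
}

# Called by the (simulated) scheme handler as each chunk of data arrives.
proc deliver {url chunk} {
    global transfer consumers
    append transfer($url) $chunk
    foreach callback $consumers($url) {
        {*}$callback $chunk
    }
}

# The document is being displayed when the user asks to save it as well.
set shown ""
set saved ""
proc display {chunk} {append ::shown $chunk}
proc save {chunk} {append ::saved $chunk}
requestLoad http://example.com/doc.html display
deliver http://example.com/doc.html "<H1>Title"
requestLoad http://example.com/doc.html save
deliver http://example.com/doc.html "</H1>"
```

Both consumers end up with the complete text <H1>Title</H1>, and the data is fetched only once.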

Figure 3: DHP Document Transfer

When the application makes a document request it must specify the purpose of the request: is the document to be saved to a file, or displayed to the user? This is accomplished by giving a -target option to the document loaduri command. The value for -target may be file, variable or auto (for automatic media handlers, see below). In addition, the -targetid option gives any necessary further information, such as the filename for a -target file argument, or the window name for a media handler which displays a document graphically. The following examples show how documents can be loaded for different purposes:


  document loaduri uri -target variable -targetid myDoc
  document loaduri uri -target file     -targetid /home/user/web/myDoc
  document loaduri uri -target auto     -targetid .app.www
The first command stores the document data in the Tcl variable myDoc. The second command stores the document data in the given file. The last command passes the data to an automatic media handler, and requests that the handler displays the document in the Tk window .app.www.

These options can be abbreviated. If the value for the -target option begins with "." then it is assumed to be a Tk window name and the document is passed to an automatic media handler. If the value for the -target option starts with a directory separator ("/" for Unix, "/" or "\" for Windows and ":" for Macintosh) then it is assumed to be a filename and the document data is copied into that file. Hence the examples above may be shortened to:


  document loaduri uri -target /home/user/web/myDoc
  document loaduri uri -target .app.www
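The abbreviation rules above amount to looking at the first character of the -target value. The plain-Tcl sketch below illustrates them. The procedure name classifyTarget and its optional platform argument are hypothetical. Plume's own option parsing is not shown in this paper.

```tcl
# Hypothetical sketch of the -target abbreviation rules described above.
# Returns a two-element list {target targetid}; values that are not
# abbreviations are returned as an explicit target with an empty targetid.
proc classifyTarget {value {platform ""}} {
    if {$platform eq ""} {set platform $::tcl_platform(platform)}
    # Characters that begin a filename on each platform, per the rules above.
    switch -- $platform {
        windows   {set separators {/ \\}}
        macintosh {set separators {:}}
        default   {set separators {/}}
    }
    set first [string index $value 0]
    if {$first eq "."} {
        # A Tk window name: hand the data to an automatic media handler.
        return [list auto $value]
    }
    if {$first in $separators} {
        # A filename: copy the document data into that file.
        return [list file $value]
    }
    return [list $value {}]
}
```

With these rules, classifyTarget .app.www returns {auto .app.www}, and classifyTarget /home/user/web/myDoc unix returns {file /home/user/web/myDoc}. A value such as ./myDoc would be taken as a window name, because only the first character is examined.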
In addition, the application may specify a -command option which gives a Tcl script to be evaluated in the same manner as an automatic media handler (see below). This option allows the application to perform customised processing of document data, or to "eavesdrop" on a data transfer. This may be especially useful when the media handler is supplied by a third party.

Automatic Media Handlers

An application may register handlers for different media types using the document type handler command. When a document is loaded with an auto target, the handler registered to accept that document's media type is invoked and given the document's data. Certain handlers may declare that the value given by the -targetid argument is the Tcl command to invoke to process the document data. This is usually how Tk (mega)widgets that display documents to the user are configured (see below).
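The following plain-Tcl sketch shows a handler table of the kind such a registration command might maintain. The procedures registerHandler and findHandler, the lookup order (exact type, then "major/*", then "*/*") and the handler names showHTML, showImage and saveToDisk are all hypothetical. The paper does not describe how Plume matches media types.

```tcl
# Hypothetical sketch of a media-type handler table (not Plume's API).
# Lookup tries the exact type, then "major/*", then "*/*".
array set mediaHandler {}

proc registerHandler {pattern command} {
    set ::mediaHandler($pattern) $command
}

proc findHandler {mediaType} {
    # Ignore parameters such as "; charset=..." and compare case-insensitively.
    set mediaType [string tolower [string trim [lindex [split $mediaType ";"] 0]]]
    set major [lindex [split $mediaType /] 0]
    foreach pattern [list $mediaType $major/* */*] {
        if {[info exists ::mediaHandler($pattern)]} {
            return $::mediaHandler($pattern)
        }
    }
    return ""   ;# no handler accepts this media type
}

# showHTML, showImage and saveToDisk are hypothetical handler commands.
registerHandler text/html showHTML
registerHandler image/* showImage
registerHandler */* saveToDisk
```

Here findHandler image/gif returns showImage, and any type without a more specific entry falls through to the catch-all */* handler.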

To make creating automatic media handlers easier and to provide a flexible interface, the Document Handling Package defines an interface to media handlers that uses a method familiar to Tcl programmers. The media handler is evaluated at "interesting" stages of the document load process: certain arguments are appended to the command before it is passed to the Tcl interpreter for evaluation. The scheme handler defines which stages of the load process are "interesting". The following arguments may be appended to the media handler command, along with arguments allowing access to document and load meta-data:

begin: data transfer has commenced
data: document data is available, and is supplied as an additional argument
end: data transfer has completed successfully
progress: a milestone has been reached in the data transfer process
error: an error has occurred, and data transfer has been terminated
This interface may be extended with more methods in the future. Loading a document thus causes a sequence of Tcl commands to be executed. For example, an application that wishes to load a document's data into a Tcl variable would issue these commands:

  document type handler */*    ;# accept all media types
  proc watchLoad {event args} {
    switch -- $event {
      end {
        # Variable "myVar" now has document data
      }
      default {
        # Could act upon other events too
      }
    }
  }
  document loaduri uri -target variable -targetid myVar -command watchLoad
The following commands will be executed as the data transfer occurs:

  watchLoad begin docstate loadstate
  watchLoad progress docstate loadstate {Connected to server}
  watchLoad data docstate loadstate <data>
  watchLoad data docstate loadstate <data>
  watchLoad end docstate loadstate
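The calling convention behind this sequence can be reproduced in a few lines of plain Tcl. In the simulation below, the scheme handler appends the event name and meta-data arguments to a command prefix and evaluates the result. The procedures notify and simulateLoad are hypothetical, and "docstate" and "loadstate" are placeholder values rather than Plume's real meta-data arguments. The handler accumulates the data itself, which approximates what -target variable does on the application's behalf.

```tcl
# Hypothetical simulation of a scheme handler driving a media handler; it is
# not Plume code. The event name and meta-data arguments are appended to the
# handler's command prefix, which is then evaluated at global level.
# "docstate" and "loadstate" are placeholder meta-data values.
proc notify {handler event args} {
    uplevel #0 [list {*}$handler $event docstate loadstate {*}$args]
}

proc simulateLoad {handler chunks} {
    notify $handler begin
    notify $handler progress {Connected to server}
    foreach chunk $chunks {
        notify $handler data $chunk
    }
    notify $handler end
}

# A handler that records the events it sees and accumulates the data.
set events {}
set myVar ""
proc watchLoad {event docstate loadstate args} {
    lappend ::events $event
    if {$event eq "data"} {
        append ::myVar [lindex $args 0]
    }
}

simulateLoad watchLoad {{<P>Hello, } {world</P>}}
```

Running this records the events begin, progress, data, data and end, and leaves <P>Hello, world</P> in myVar. Because the handler is treated as a command prefix, a prefix with extra words, such as a widget command plus a method name, works in the same way.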

WWW Megawidget

A necessity for any World Wide Web browser is the ability to display HTML documents to the user. For the Plume browser this functionality has been packaged into a megawidget (a composite widget which behaves like a built-in widget). The WWW megawidget provides a high-level interface for the display of WWW documents, and several megawidgets are included for lower-level functions: an HTML megawidget, a progress meter and a bookmark megawidget. Support for XML documents will be added in the near future.

A design goal of this library, as with the Document Handling Package, is to provide an easy-to-use system that is highly flexible and customisable. Another goal is to be able to seamlessly interface the WWW megawidget with the Document Handling Package.

Megawidget Philosophy

When developing a megawidget for the Tk widget set, the aim is to make its programming interface as consistent as possible with that of a built-in Tk widget. Tk widgets have the following features:

WWW Megawidget Architecture

The WWW megawidget is itself composed of several megawidgets. These include the HTML megawidget, a progress meter and a bookmark widget. The WWW megawidget binds all of these lower-level widgets into a single interface for the application developer. It also manages the history stack for the browser.

Overview of WWW Megawidget Architecture

HTML Megawidget

The HTML megawidget is responsible for displaying HTML text to the user, and managing various aspects of the user interface, including link activation and HTML fill-out form interaction. The megawidget provides a table-driven HTML parser and a rendering engine supporting CSS1 style-sheets [HREF 6]. It also features context-sensitive editing of HTML text.

When an HTML megawidget is created, a widget command is also created for the application to control the widget. For example, the Tcl command:


  html .app.www.html
creates the widget itself and also a new Tcl command, .app.www.html. The widget command supports the common Tk widget methods, such as the configure method for changing the widget's configuration options. It also has a number of methods to control the content of the widget: the HTML document. The widget command provides an HTML element-level interface to the document, and the application uses the widget command's element method to retrieve or modify elements at run-time. For example, to get the HTML text for the entire document, the application would issue the Tcl command:

  .app.www.html element get html
To get only elements that are in the document's <HEAD> section, the application would use the command:

  .app.www.html element get head
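The widget-command convention can be sketched in plain Tcl without Tk. Creating a "widget" defines a command named after its path, and the first argument to that command selects a method. The docwidget namespace, its option -width and its element storage are hypothetical and greatly simplified. The sketch only shows the dispatch pattern, not how Plume stores HTML elements.

```tcl
# Hypothetical sketch (plain Tcl, no Tk) of the widget-command convention:
# creating a widget defines a command named after its path, whose first
# argument selects a method. Names and behaviour are illustrative only.
namespace eval docwidget {}

proc docwidget::create {path args} {
    variable options
    variable elements
    set options($path) [dict merge {-width 80} $args]
    set elements($path) [dict create]
    # The widget command: "$path method ?arg ...?" calls the dispatcher.
    interp alias {} ::$path {} ::docwidget::dispatch $path
    return $path
}

proc docwidget::dispatch {path method args} {
    variable options
    variable elements
    switch -- $method {
        configure {
            # No arguments: all options; one: a single value; pairs: update.
            if {[llength $args] == 0} {return $options($path)}
            if {[llength $args] == 1} {return [dict get $options($path) [lindex $args 0]]}
            set options($path) [dict merge $options($path) $args]
            return
        }
        element {
            lassign $args op name value
            switch -- $op {
                get {return [dict get $elements($path) $name]}
                set {dict set elements($path) $name $value; return}
                default {error "bad element operation \"$op\": must be get or set"}
            }
        }
        default {error "bad method \"$method\": must be configure or element"}
    }
}

docwidget::create .app.www.html -width 60
.app.www.html element set head {<TITLE>Plume</TITLE>}
```

After these calls, .app.www.html element get head returns <TITLE>Plume</TITLE>, and .app.www.html configure -width returns 60.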

Generalised HTML Parser

Plume's HTML megawidget uses a Tcl-based, table-driven HTML parser [HREF 7] which has been modified to produce a hierarchical representation of the HTML document. An application may use the HTML parser directly by calling the HTML:parse procedure. The parser generates a Tcl script which, when evaluated, invokes procedures to process the document, typically to display it to the user. This script may also be regarded as a parse tree for the HTML document.
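The idea of a parser whose output is itself a Tcl script can be sketched briefly. The parseToScript procedure and the command names tagStart, tagEnd and pcdata are hypothetical. This is neither HTML:parse nor Uhler's parser, and it skips table-driven structure entirely. Each command is built with list, so the generated script stays well-formed whatever characters the document contains.

```tcl
# Hypothetical sketch of a parser that emits a Tcl script (not Uhler's or
# Plume's code): every tag and text run becomes one command, built with
# [list] so the script is well-formed whatever the document contains.
# Comments, entities and ">" inside attribute values are not handled.
proc parseToScript {html} {
    set script ""
    set pos 0
    set re {<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>}
    while {[regexp -start $pos -indices $re $html m slash name attrs]} {
        # Text between the previous tag and this one.
        set text [string range $html $pos [expr {[lindex $m 0] - 1}]]
        if {[string trim $text] ne ""} {
            append script [list pcdata $text] \n
        }
        set tag [string tolower [string range $html {*}$name]]
        if {[string range $html {*}$slash] eq "/"} {
            append script [list tagEnd $tag] \n
        } else {
            append script [list tagStart $tag [string trim [string range $html {*}$attrs]]] \n
        }
        set pos [expr {[lindex $m 1] + 1}]
    }
    set text [string range $html $pos end]
    if {[string trim $text] ne ""} {
        append script [list pcdata $text] \n
    }
    return $script
}

# Evaluating the script invokes whatever procedures the application defines;
# these hypothetical ones just record the calls.
set calls {}
proc tagStart {tag attrs} {lappend ::calls start:$tag}
proc tagEnd {tag} {lappend ::calls end:$tag}
proc pcdata {text} {lappend ::calls text:$text}

set script [parseToScript {<H1>Hello</H1><P ALIGN=center>World}]
eval $script
```

The generated script reads tagStart h1 {}, pcdata Hello, tagEnd h1, tagStart p ALIGN=center and pcdata World, one command per line. A display engine would define the three procedures to render rather than record.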

There are three tables used by the parser to derive the document's structure. The first lists whether each element is a "container" element or an empty element. The second lists the content model for each container element. These two tables are derived directly from the HTML 3.2 DTD. Finally, a third table describes how to imply the existence of elements, given the context within which a start tag appears. This last table would not be necessary if all Web documents strictly conformed to the HTML DTD, but HTML allows tags to be omitted where they are easily derived. For example, many Web documents omit the <HTML>, <HEAD> and <BODY> tags. The parse tree returned by the parser allows the application to manipulate the HTML document without having to be concerned with implied end tags and so on.
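The following plain-Tcl sketch shows how small tables can drive the implication of omitted tags. The emptyElements and headContent lists are hypothetical and incomplete stand-ins for the DTD-derived tables. The startTag and endTag procedures are likewise invented for illustration. They imply the HTML, HEAD and BODY start tags and the end of HEAD, and they close any elements still open when an end tag arrives. Content-model rules, such as a new <P> closing the previous one, are left out.

```tcl
# Hypothetical tables in the spirit of those described above (not Plume's
# actual tables, and far from complete).
set emptyElements {br hr img meta link base input}
set headContent {title base meta link style script}

# Open $tag, first emitting any implied HTML, HEAD or BODY start tags and
# implied </HEAD>. $stackVar names the caller's list of open elements.
proc startTag {stackVar tag} {
    upvar 1 $stackVar stack
    global emptyElements headContent
    set out {}
    if {$tag ne "html" && [llength $stack] == 0} {
        lappend out {start html}
        lappend stack html
    }
    set headTag [expr {$tag in $headContent || $tag eq "head"}]
    if {[lindex $stack end] eq "head" && !$headTag} {
        # Body content (or <BODY> itself) implies the end of HEAD.
        lappend out {end head}
        set stack [lrange $stack 0 end-1]
    }
    if {[lindex $stack end] eq "html" && $tag ni {head body}} {
        # Content directly inside HTML implies HEAD or BODY.
        set implied [expr {$headTag ? "head" : "body"}]
        lappend out [list start $implied]
        lappend stack $implied
    }
    lappend out [list start $tag]
    if {$tag ni $emptyElements} {lappend stack $tag}
    return $out
}

# Close $tag, emitting implied end tags for anything still open inside it.
proc endTag {stackVar tag} {
    upvar 1 $stackVar stack
    set r [lsearch -exact [lreverse $stack] $tag]
    if {$r < 0} {return {}}   ;# stray end tag: ignore it
    set i [expr {[llength $stack] - 1 - $r}]
    set out {}
    for {set j [expr {[llength $stack] - 1}]} {$j >= $i} {incr j -1} {
        lappend out [list end [lindex $stack $j]]
    }
    set stack [lrange $stack 0 [expr {$i - 1}]]
    return $out
}

# The tag sequence of "<TITLE>Demo</TITLE><P>Text", with HTML/HEAD/BODY omitted.
set stack {}
set emitted {}
foreach {kind tag} {start title end title start p} {
    if {$kind eq "start"} {
        lappend emitted {*}[startTag stack $tag]
    } else {
        lappend emitted {*}[endTag stack $tag]
    }
}
```

For that input, the emitted sequence is start html, start head, start title, end title, end head, start body and start p. The elements html, body and p are left open.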

Because Plume's HTML parser is completely table-driven, it is straightforward to define new SGML elements. Some application developers find this feature attractive because it allows them to display arbitrary SGML documents, rather than having to create or generate HTML documents. This property is currently being exploited to develop support for the display of XML [HREF 8] documents.

Display Of HTML Documents Using CSS

The main purpose of the HTML megawidget is to display HTML documents to the user according to the presentation requested in an associated CSS stylesheet. The HTML megawidget provides a default stylesheet for rendering HTML v3.2 documents and utilises a Tk Text widget for display. The Tk Text widget provides many useful features, but also imposes some limitations.

All aspects of document presentation are controlled by Cascading Style Sheets (CSS). As with the HTML parser's DTD representation, the CSS implementation is table-driven. This approach has the advantage of allowing new CSS properties to be defined. When a stylesheet is loaded, it is parsed and a table is created which is used during the display process. Cascaded stylesheets may subsequently be loaded, and their tables are merged to form a final display table.
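Merging stylesheet tables can be illustrated with Tcl dicts. In this sketch, which is hypothetical and not Plume's code, each parsed stylesheet maps a selector to a dict of property/value pairs. The cascade procedure merges the sheets property by property in load order, so a later sheet overrides only the properties it sets. Real CSS1 cascading also weighs selector specificity, origin and "! important", all of which this sketch ignores.

```tcl
# Hypothetical sketch (not Plume's code): each parsed stylesheet is a dict
# mapping a selector to a dict of property/value pairs. Sheets are merged in
# load order, property by property, so a later sheet overrides only the
# properties it sets. Selector specificity and "! important" are ignored.
proc cascade {args} {
    set table [dict create]
    foreach sheet $args {
        dict for {selector props} $sheet {
            if {[dict exists $table $selector]} {
                set props [dict merge [dict get $table $selector] $props]
            }
            dict set table $selector $props
        }
    }
    return $table
}

set defaultSheet {H1 {font-size 24pt color black} P {margin-top 6pt}}
set userSheet {H1 {color blue}}
set display [cascade $defaultSheet $userSheet]
```

In the resulting display table, H1 has the colour blue from the second sheet but keeps the 24pt font size from the defaults, and P is unchanged.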

Advantages Of The Tk Text Widget For HTML Document Display

The advantages of using a Tk Text widget for displaying HTML/CSS documents include:

Disadvantages Of The Tk Text Widget For HTML Document Display

The disadvantage of using the Text widget is that it lacks some of the capabilities needed to implement all of the presentation features required for rendering HTML v3.2 and CSS1, although it is perfectly adequate for HTML v2.0. Missing capabilities include text flow around images, letter and word spacing, and background images. Future work may include modifying the Text widget implementation to support these functions.

Connection To The Document Handling Package

One of the most important aspects of the two libraries described above is the ability to link them together so that documents can be retrieved from document servers and automatically displayed to the user. This is simple to achieve because of the way the connections between the Document Handling Package and the WWW Megawidget have been designed. The WWW megawidget automatically registers itself as a handler for text/html documents, as well as for image/gif, image/x-portable-pixmap and image/x-bitmap documents (these are the image formats that Tk can display; extensions are available which allow JPEG and TIFF images to be displayed). It provides the load method for handling document events. All that remains for the application programmer to do is to connect the two systems together. This is done with the commands:

  www .www	;# Creates a WWW megawidget called .www
  .www configure -loadcommand {document loaduri -target {.www load}}
The WWW megawidget's -loadcommand script is invoked whenever the megawidget requires a document to be loaded, for example when a hypertext anchor is activated. The value given for this option specifies the widget itself as the target of a document load.

Conclusion

The Document Handling Package and WWW Megawidget will be separated from the upcoming general release of the Plume Web browser to provide an embeddable WWW library for developers of stand-alone Web applications. The design goal of these libraries is to make accessing Web resources as simple as possible, and to relieve the programmer of many housekeeping tasks associated with handling and displaying Web documents.

The Document Handling Package provides high-level management of Web documents, including their retrieval from remote document servers, local in-memory or on-disk caching and media type dependent processing.

The WWW Megawidget provides several utilities for parsing and displaying HTML documents. It also provides overall management of the loading and displaying of HTML documents.

Hypertext References

HREF 1
http://surfit.anu.edu.au/steve/ - Steve Ball's Home Page
HREF 2
http://surfit.anu.edu.au/ - Home Page for the SurfIt! WWW Browser
HREF 3
http://www.w3.org/ - Home Page for the WWW Consortium
HREF 4
http://www.sunlabs.com/research/tcl/ - The Tcl Home Page
HREF 5
http://www.w3.org/pub/WWW/TR/ - HTTP/1.1 Technical Report
HREF 6
http://www.w3.org/pub/WWW/Style/ - Home Page for Stylesheets
HREF 7
http://www.sunlabs.com/people/stephen.uhler/html_library/help.html - HTML library, Stephen Uhler, Sun Microsystems Laboratories.
HREF 8
http://www.w3.org/pub/WWW/TR/ - Extensible Markup Language (XML) Draft Specification

Copyright

Steve Ball ©, 1997. The authors assign to Southern Cross University and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive licence to Southern Cross University to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers, and for the document to be published on mirrors on the World Wide Web. Any other usage is prohibited without the express permission of the authors.




AusWeb97 Third Australian World Wide Web Conference, 5-9 July 1997, Southern Cross University, PO Box 157, Lismore NSW 2480, Australia Email: AusWeb97@scu.edu.au