Ryosuke HOTAKA, Institute for Socio-Economic Planning, 1-1-1 Tennodai, Tsukuba, Ibaraki 305, JAPAN, Phone +81 298 53 5087 Fax +81 298 53 5070. Email: hotaka@shako.sk.tsukuba.ac.jp Home Page: Hotaka Database Lab [HREF 1]
The Hypertext Transfer Protocol defined in HTTP: A Protocol for Networked Information [HREF 2] has become very popular on the Internet. HTTP combines platform independent hyper-text multimedia browsing using simple to use graphical client software with access to other protocols, such as FTP and GOPHER.
The same features that are attracting this massive group of general users - a Hypertext Markup Language (HTML) [HREF 3] with pre-defined graphical elements, computer platform independence and network navigation which is transparent to the user - also makes HTTP ideal for implementing database front-ends.
From a database developer's point of view, one difficult problem does however exist: Since HTTP is designed for interactive use it must accomplish its tasks reasonably fast, or else the user of the system can not interact with it. HTTP accomplishes this mainly by using an extremely simple transaction model (connection -> request -> response -> close) in combination with short time-outs (a typical WWW-server times out in around 1 minute).
We in this paper discuss a relational gateway program which bridges the simple transaction model of HTTP and the much more complex transaction model of an SQL database management system (DBMS). The gateway program does this by maintaining a separate session with the DBMS for each user.
The relational gateway program supports the following database functions:
The relational gateway program was implemented on the Macintosh platform in a visual programming environment called Prograph CPX (Prograph International Inc. 1993) and it conforms to The Common Gateway Interface [HREF 4] (CGI) specifications. We have connected the gateway to an Oracle DBMS using the Oracle Call Interface (Oracle Corp.) and to a Butler DBMS (EveryWare Development Corp. 1992) using Data Access Language (Apple Computer Inc. 1990).
The Butler compatible version of the gateway was demonstrated at the MacWorld Expo in Tokyo, February 1995, and it can be accessed anonymously from our home page.
The first integration attempt was made at CERN in 1992, where an Oracle gateway for SQL SELECT statements was written by Arthur Secret (T. Berners-Lee, R. Cailliau 1992). This gateway allowed a user to enter an SQL expression as a search text to a WWW-server. It formatted the results of the query and sent them back as plain text. Secret's work has been extended by Decoux at INRA, Paris [HREF 5], to support among other things queries using forms, HTML-connections between different rows and tables, as well as updates of data in pre-specified tables.
An interesting project is Integrating Structured Databases Into the Web: The MORE System [HREF 6], which, just like the system presented here, uses standard WWW-clients as the sole interface to the database, so that the user need not be aware of SQL or other protocols and languages. An interesting feature is that all interaction screens (or pages, to use WWW-terminology) except the first one are also stored in the database, and are dynamically generated at the user's request. A similar approach is taken in this paper, where the first page is the only page which is "hard-coded" in HTML.
Another project involved with the dynamic generation of interaction pages for WWW-clients is Zelig: Schema-Based Generation of Soft WWW Database Applications [HREF 7].
Furthermore, at the National Center for Supercomputing Applications, an HTTP-SQL gateway called GSQL: A Mosaic-SQL Gateway [HREF 8] is running in prototype. This gateway is connected to three different database systems, but only handles simple queries (i.e., not nested queries, inserts and deletes). This gateway also does not dynamically generate interface code, which means that interfaces must be written by hand for each and every database schema.
Presently, some advanced relational gateways are being developed for commercial use, such as the Oracle World Wide Web Interface Kit [HREF 9] . We have not yet had the opportunity to study those gateways, and they have for this reason not been included in this overview.
The hypertextual links are established using an address syntax called Uniform Resource Locators [HREF 10] (URL) which specifies protocol, server and document location.
The basic layout of any HTTP connection is the following:

Figure 1. Clients connect to a server and retrieve HTML-files.
However, HTTP allows for extensions to the server, by means of various HTML elements, which we will discuss in the next section.
A FORM is implemented as follows in HTML:
<FORM METHOD=GET | POST ACTION="URL">
<INPUT NAME="keyword_1"> <P>
<INPUT NAME="keyword_2"> <P>
*
*
*
<INPUT TYPE=SUBMIT VALUE="Submit your form"> <P>
<INPUT TYPE=RESET VALUE="Reset your form"> <P>
</FORM>
The ACTION keyword specifies the URL to which the form results will be sent: it is not just an HTML hypertext link to another document, but a link to an executable script or binary file (in our case the relational gateway program).
The METHOD keyword specifies the HTTP method to be used to submit the form information to the server. METHOD can have two arguments, GET and POST, which allow for passing parameters between the client and the server in various ways. We have chosen to work with the POST method, since it is the more sophisticated of the two.

Figure 3. Schematic view of the transaction protocol of a typical DBMS
It can easily be seen that only the simplest of queries can be performed within the limits of one single HTTP transaction.
The relational gateway program can accept multiple users at the same time and maintains user integrity by managing a session number for each user. The session number is provided by the DBMS and is unique for each user. Figure 4 shows an example where the program mediates between the two protocols.

Figure 4. The relational gateway mediates between the HTTP and SQL transaction protocols
The connection between the server and the browser is broken several times, but each time it is reestablished, the session number which is simply appended to the URL (along with the current page) allows the relational gateway to verify the user. Neither user name nor password is passed, so there is no additional security risk involved. Furthermore, if secure HTTP is used, session numbers can be passed over the network without security risks. Secure HTTP is currently only available with the Netscape family of servers and browsers [HREF 11] (using an approach called SSL), but many others are being developed.
Please note that sessions are not the same as transactions. In figure 4 above, two transactions with the DBMS occur in one session.
Our use of repeated queries is an attempt to somewhat capture the power of nested queries while at the same time maintaining a simple and clean graphical user interface. However, since we do not fully capture the expressional power of nested queries, more research is needed in this area.
Another difference between a nested query and repeated queries lies in performance. The extra work we have to perform is to explicitly save the results of each SELECT statement into a view (or a new table if the DBMS does not support views). Most systems avoid this, because of the overhead involved. However, because of the limitations in the current incarnation of the HTTP protocol, we actually have to perform this step anyway, to ensure that results do not get lost when time-out errors occur, as described in section 5.1. In other words, by providing necessary time-out error handling, we get a mechanism for providing repeated queries at no extra cost.
Each time a query is made, a temporary view is saved in the database under the name VIEWxxxxxx, where "xxxxxx" stands for the current session number. To use results of a previous query in a new query, we can simply join the VIEWxxxxxx table with other tables of the database using SELECT statements. To illustrate, we continue the example from figure 4.
In figure 4 we created a new table called '"suppliers", which has the columns "supp_num" and "supp_name". We then inserted a single tuple and made a SELECT. We thus have a temporary VIEWxxxxxx table, which can be accessed from all menus listing the tables of the current database, as in the following picture:

Figure 5. Tables in the current database
We then perform the following query:

Figure 6. A SELECT using a temporary VIEWxxxxxx
After we have joined the temporary VIEW131021 table and the "customer" table, the temporary VIEW131021 table is deleted and a new temporary table (with the same name, in this case VIEW131021), and the results (in this case only a single tuple) are automatically displayed on-screen:

Figure 7. On-screen display of results
This means that our gateway should contact the database and receive a reply which it can pass on to the server within that minute, or else there is no guarantee that a connection to the browser is still maintained by the WWW-server. For simple queries, one minute is certainly sufficient, but as the complexity of the commands sent to the DBMS increases, as well as the number of simultaneous users, the likelihood that some tasks will not be accomplished within 1 minute increases drastically.
For this reason, all query results are written to a temporary table with the name VIEWxxxxxx (where "xxxxxx" is the number of the current session) regardless of whether there is a connection established with the browser from which the query originated and the WWW-server or not.
At the time when the query completes, two scenarios are possible:
To make sure that data can be returned in all circumstances, even when loads are very high, the relational gateway also offers the option of returning results by mail instead of on-screen. The following screen in that case replaces figure 7:

Figure 8. Return of results by e-mail
After the command has been given, the user can log out or continue with other database tasks. That is, the relational gateway immediately returns control to the user instead of waiting for the query to complete.
When the time-out occurs, the gateway ends the session with the DBMS. If at that time, there is a VIEWxxxxxx table corresponding to the session, it will be erased. Eventual mail results are always e-mailed to the user before the session is terminated.
Luckily, the HTTP standard allows for extensions to the server with various gateways (CGIs). In this paper, we have presented such a gateway which by establishing its own sessions with the database server can mediate between the single transaction HTTP protocol and the protocol of the database server, which inherently uses multiple transactions.
Our relational gateway program supports insertion and deletion of tables and columns, addition and deletion of data, and simple as well as repeated queries.
Furthermore, to ensure delivery of query results, temporary VIEWxxxxxx tables as well as a mail reply option have been implemented. The gateway also automatically logs out users after a time period which can be specified by the system administrator.
The user's password is only passed through the gateway once, when the user logs in. However, since the password at that time is passed openly, some form of secure HTTP must be used or otherwise a security risk is incurred. However, once the first connection has been established, only session numbers are passed back and forth and no additional security risk is incurred.
To be able to provide more complex queries and not only simple queries currently offered by most relational gateways, we allow the user to repeatedly query the database using intermediate results. This approach does however not capture the full expressional power of nested queries in SQL, and more research should be devoted to developing more powerful HTML-pages for this.
Further on, more work needs to be done on integration between HTML and various database call level interfaces. Performance measurements have not been done in this paper.
The relational gateway presented in this paper only allows for management of simple data types. A potential problem with HTML in its current incarnation is its one-way limitations in handling multimedia: The user can receive various forms of data but the only user inputs allowed except for specifications of links are text-based. If HTML is going to be used as the sole interface for a multimedia database, there have to be some provisions for the input of pictures, sound etc., or retrieval of multimedia links specified as inputs through WWW-browsers.
T. Berners-Lee, R. Cailliau (1992) "World-Wide Web" Computing in High Energy Physics 92, Anney, France.
T. Berners-Lee, R. Cailliau, N. Pellow, A. Secret (1993) "The World-Wide Web Initiative" Proc. INET '93, Internet Society, San Francisco, USA.
"Butler User's Guide", EveryWare Development Corporation, Canada, 1992.
F. Halasz, M. Schwarz (1994) "The Dexter Hypertext Reference Model" Communications of the ACM, Vol. 37, No. 2, 1994.
"Oracle for Macintosh User's Guide", Oracle Corp., USA
"Prograph CPX ABC Reference" Prograph International Inc., Canada, 1993.
AusWeb95 The First Australian WorldWideWeb Conference