A WWW Gateway for Interactive Relational Database Management


Michael BJÖRN, Doctoral Program of Socio-Economic Planning, 1-1-1 Tennodai, Tsukuba, Ibaraki 305, JAPAN, Phone +81 298 53 5424 Fax +81 298 53 5070. Email: michael@wiz.sk.tsukuba.ac.jp Home Page: Hotaka Database Lab [HREF 1]

Ryosuke HOTAKA, Institute for Socio-Economic Planning, 1-1-1 Tennodai, Tsukuba, Ibaraki 305, JAPAN, Phone +81 298 53 5087 Fax +81 298 53 5070. Email: hotaka@shako.sk.tsukuba.ac.jp Home Page: Hotaka Database Lab [HREF 1]


Keywords: WWW, Relational Gateways, Common Gateway Interface, HTTP.

Introduction

The Hypertext Transfer Protocol defined in HTTP: A Protocol for Networked Information [HREF 2] has become very popular on the Internet. HTTP combines platform independent hyper-text multimedia browsing using simple to use graphical client software with access to other protocols, such as FTP and GOPHER.

The same features that are attracting this massive group of general users - a Hypertext Markup Language (HTML) [HREF 3] with pre-defined graphical elements, computer platform independence and network navigation which is transparent to the user - also makes HTTP ideal for implementing database front-ends.

From a database developer's point of view, one difficult problem does however exist: Since HTTP is designed for interactive use it must accomplish its tasks reasonably fast, or else the user of the system can not interact with it. HTTP accomplishes this mainly by using an extremely simple transaction model (connection -> request -> response -> close) in combination with short time-outs (a typical WWW-server times out in around 1 minute).

We in this paper discuss a relational gateway program which bridges the simple transaction model of HTTP and the much more complex transaction model of an SQL database management system (DBMS). The gateway program does this by maintaining a separate session with the DBMS for each user.

Features of the Relational Gateway Program

The relational gateway program discussed in this paper allows browsing of any database available to the DBMS. It accomplishes this by generating interface code dynamically from data-dictionary information for any database, table or row.

The relational gateway program supports the following database functions:

The relational gateway program was implemented on the Macintosh platform in a visual programming environment called Prograph CPX (Prograph International Inc. 1993) and it conforms to The Common Gateway Interface [HREF 4] (CGI) specifications. We have connected the gateway to an Oracle DBMS using the Oracle Call Interface (Oracle Corp.) and to a Butler DBMS (EveryWare Development Corp. 1992) using Data Access Language (Apple Computer Inc. 1990).

The Butler compatible version of the gateway was demonstrated at the MacWorld Expo in Tokyo, February 1995, and it can be accessed anonymously from our home page.

Related Work

So far, there have not been many papers published about integration between relational databases and HTTP. At the same time there is a lot of commercial activity in the field and quite many relational gateways are already configured and running on the WWW. What most of these gateways have in common is that they focus on retrieval of data using simple SELECT statements. Instead of discussing all those gateways here, we will briefly present some attempts at providing more feature rich access to databases on the WWW.

The first integration attempt was made at CERN in 1992, where an Oracle gateway for SQL SELECT statements was written by Arthur Secret (T. Berners-Lee, R. Cailliau 1992). This gateway allowed a user to enter an SQL expression as a search text to a WWW-server. It formatted the results of the query and sent them back as plain text. Secret's work has been extended by Decoux at INRA, Paris [HREF 5], to support among other things queries using forms, HTML-connections between different rows and tables, as well as updates of data in pre-specified tables.

An interesting project is Integrating Structured Databases Into the Web: The MORE System [HREF 6], which, just like the system presented here, uses standard WWW-clients as the sole interface to the database, so that the user need not be aware of SQL or other protocols and languages. An interesting feature is that all interaction screens (or pages, to use WWW-terminology) except the first one are also stored in the database, and are dynamically generated at the user's request. A similar approach is taken in this paper, where the first page is the only page which is "hard-coded" in HTML.

Another project involved with the dynamic generation of interaction pages for WWW-clients is Zelig: Schema-Based Generation of Soft WWW Database Applications [HREF 7].

Furthermore, at the National Center for Supercomputing Applications, an HTTP-SQL gateway called GSQL: A Mosaic-SQL Gateway [HREF 8] is running in prototype. This gateway is connected to three different database systems, but only handles simple queries (i.e., not nested queries, inserts and deletes). This gateway also does not dynamically generate interface code, which means that interfaces must be written by hand for each and every database schema.

Presently, some advanced relational gateways are being developed for commercial use, such as the Oracle World Wide Web Interface Kit [HREF 9] . We have not yet had the opportunity to study those gateways, and they have for this reason not been included in this overview.

WWW, HTTP and HTML

The World-Wide Web (WWW) is a wide area information retrieval project which was started in 1989 at the CERN European Laboratory for Particle Physics (T. Berners-Lee, R. Cailliau, N. Pellow, A. Secret 1993). What makes WWW unique is that it makes hypermedia, as defined by the Dexter Model (F. Halasz, M. Schwarz 1994), available on the Internet in what has evolved into a global information system. This is accomplished through HTTP, which was designed for platform independent communication between HTTP-aware clients and servers.

The hypertextual links are established using an address syntax called Uniform Resource Locators [HREF 10] (URL) which specifies protocol, server and document location.

The basic layout of any HTTP connection is the following:

Figure 1. Clients connect to a server and retrieve HTML-files.

However, HTTP allows for extensions to the server, by means of various HTML elements, which we will discuss in the next section.

HTML as a Design Language for Database Interface Pages

The basic HTML construct that we have used is the FORM statement. Forms in HTML are like those paper forms that we all have been filling out since we learned how to write; application forms, request forms, business form, etc. Basically a FORM contains some fields where the user is supposed to enter data in a structured way, a button to submit the form (which simulates submitting a form to an office or similar in the real world), and a button to erase the user's input so that the form can be used again.

A FORM is implemented as follows in HTML:

     <FORM METHOD=GET | POST     ACTION="URL">

     <INPUT NAME="keyword_1"> <P>

     <INPUT NAME="keyword_2"> <P>

          *

          *

          *

     <INPUT TYPE=SUBMIT     VALUE="Submit your form"> <P>

     <INPUT TYPE=RESET     VALUE="Reset your form"> <P>

     </FORM>

The ACTION keyword specifies the URL to which the form results will be sent: it is not just an HTML hypertext link to another document, but a link to an executable script or binary file (in our case the relational gateway program).

The METHOD keyword specifies the HTTP method to be used to submit the form information to the server. METHOD can have two arguments, GET and POST, which allow for passing parameters between the client and the server in various ways. We have chosen to work with the POST method, since it is the more sophisticated of the two.

Transactions in HTTP and in SQL

As was mentioned in the introduction, the transaction protocols in HTTP and in SQL are quite different. The main purpose of using the relational gateway program is to mediate between the two protocols, but before we look at the gateway itself, we briefly examine the two protocols.

HTTP transaction protocol

The HTTP transaction protocol consists of 4 steps:
1. Connection
The establishment of a connection by the client to the server - when using TCP/IP, port 80 is the default port, but other non-reserved ports may be specified in the URL.
2. Request
The sending, by the client, of a request message to the server.
3. Response
The sending, by the server, of a response to the client.
4. Close
The closing of the connection by client or server.
This protocol implements a single transaction consisting of a single request and response pair in a session with a WWW-server. Most database gateway implementations that we have seen so far use this single transaction for database access as well. A session using such a protocol has the following structure:

Figure 3. Schematic view of the transaction protocol of a typical DBMS

It can easily be seen that only the simplest of queries can be performed within the limits of one single HTTP transaction.

The Role of the Relational Gateway Program

The most general role of the relational gateway program is to establish a connection between the client and the DBMS, so that information can be passed in both directions. However, when there is mismatch between the HTTP transaction protocol and the SQL transaction protocol, the gateway also has to mediate between the protocols, so that consistency is maintained on both sides.

The relational gateway program can accept multiple users at the same time and maintains user integrity by managing a session number for each user. The session number is provided by the DBMS and is unique for each user. Figure 4 shows an example where the program mediates between the two protocols.

Figure 4. The relational gateway mediates between the HTTP and SQL transaction protocols

The connection between the server and the browser is broken several times, but each time it is reestablished, the session number which is simply appended to the URL (along with the current page) allows the relational gateway to verify the user. Neither user name nor password is passed, so there is no additional security risk involved. Furthermore, if secure HTTP is used, session numbers can be passed over the network without security risks. Secure HTTP is currently only available with the Netscape family of servers and browsers [HREF 11] (using an approach called SSL), but many others are being developed.

Please note that sessions are not the same as transactions. In figure 4 above, two transactions with the DBMS occur in one session.

Repeated Queries

With a nested query we mean that one (or more) SELECT statement(s) occur within another SELECT statement, as defined in SQL. This form of querying is sometimes needed to produce desired results. To build a graphically oriented interface which handles nesting of SELECT statements is however not an easy task. In this paper we have taken a simpler approach; we allow the user to repeatedly query the database using intermediate results or combinations of intermediate results with other parts of the database. This is possible because theoretically all results of a SELECT statement produces a new table which in turn can be queried, and so on.

Our use of repeated queries is an attempt to somewhat capture the power of nested queries while at the same time maintaining a simple and clean graphical user interface. However, since we do not fully capture the expressional power of nested queries, more research is needed in this area.

Another difference between a nested query and repeated queries lies in performance. The extra work we have to perform is to explicitly save the results of each SELECT statement into a view (or a new table if the DBMS does not support views). Most systems avoid this, because of the overhead involved. However, because of the limitations in the current incarnation of the HTTP protocol, we actually have to perform this step anyway, to ensure that results do not get lost when time-out errors occur, as described in section 5.1. In other words, by providing necessary time-out error handling, we get a mechanism for providing repeated queries at no extra cost.

Each time a query is made, a temporary view is saved in the database under the name VIEWxxxxxx, where "xxxxxx" stands for the current session number. To use results of a previous query in a new query, we can simply join the VIEWxxxxxx table with other tables of the database using SELECT statements. To illustrate, we continue the example from figure 4.

In figure 4 we created a new table called '"suppliers", which has the columns "supp_num" and "supp_name". We then inserted a single tuple and made a SELECT. We thus have a temporary VIEWxxxxxx table, which can be accessed from all menus listing the tables of the current database, as in the following picture:

Figure 5. Tables in the current database

We then perform the following query:

Figure 6. A SELECT using a temporary VIEWxxxxxx

After we have joined the temporary VIEW131021 table and the "customer" table, the temporary VIEW131021 table is deleted and a new temporary table (with the same name, in this case VIEW131021), and the results (in this case only a single tuple) are automatically displayed on-screen:

Figure 7. On-screen display of results

Time-Out Issues

There are two time-out issues that we have to deal with, one for each protocol involved: The HTTP time-out problem stems from limitations inherent to the HTTP protocol, whereas the SQL server time-out is a "garbage cleaning" feature we must provide since there is a possibility that the user is not on-line any longer to terminate his connection with the DBMS.

HTTP Time-Out Problems

The time-out period for the HTTP protocol depends on the settings for the WWW-server, but as mentioned in the introduction of this paper, a time-out much longer than 1 minute destroys the illusion of interactivity which is very important to the user.

This means that our gateway should contact the database and receive a reply which it can pass on to the server within that minute, or else there is no guarantee that a connection to the browser is still maintained by the WWW-server. For simple queries, one minute is certainly sufficient, but as the complexity of the commands sent to the DBMS increases, as well as the number of simultaneous users, the likelihood that some tasks will not be accomplished within 1 minute increases drastically.

For this reason, all query results are written to a temporary table with the name VIEWxxxxxx (where "xxxxxx" is the number of the current session) regardless of whether there is a connection established with the browser from which the query originated and the WWW-server or not.

At the time when the query completes, two scenarios are possible:

  1. The connection to the browser has not timed out. In this case the result is passed back through the gateway to the user's computer screen.

  2. The connection to the browser has timed out. In this case the result is available in the VIEWxxxxxx table and all the user has to do is to go back to the previous screen of the browser and SELECT all columns of the VIEWxxxxxx table. If the user does not reconnect, the gateway automatically finishes the session with the DBMS as described below.

To make sure that data can be returned in all circumstances, even when loads are very high, the relational gateway also offers the option of returning results by mail instead of on-screen. The following screen in that case replaces figure 7:

Figure 8. Return of results by e-mail

After the command has been given, the user can log out or continue with other database tasks. That is, the relational gateway immediately returns control to the user instead of waiting for the query to complete.

SQL Server Connection Time-Out

The relational gateway program establishes and keeps a session with the DBMS independent of whether a user is currently connected to the WWW-server or not. However, the gateway must have an ability to end the session even if the user does not connect again. The relational gateway allows the system administrator to set a connection time-out period between 1 and 999 minutes.

When the time-out occurs, the gateway ends the session with the DBMS. If at that time, there is a VIEWxxxxxx table corresponding to the session, it will be erased. Eventual mail results are always e-mailed to the user before the session is terminated.

Conclusions and Further Study

We conclude that the HTTP protocol will be extremely important in future information systems. HTML is a surprisingly powerful language for building platform independent database interfaces, using mainly the FORM construct. However, the single transaction nature of the HTTP protocol only allows for very simple database queries.

Luckily, the HTTP standard allows for extensions to the server with various gateways (CGIs). In this paper, we have presented such a gateway which by establishing its own sessions with the database server can mediate between the single transaction HTTP protocol and the protocol of the database server, which inherently uses multiple transactions.

Our relational gateway program supports insertion and deletion of tables and columns, addition and deletion of data, and simple as well as repeated queries.

Furthermore, to ensure delivery of query results, temporary VIEWxxxxxx tables as well as a mail reply option have been implemented. The gateway also automatically logs out users after a time period which can be specified by the system administrator.

The user's password is only passed through the gateway once, when the user logs in. However, since the password at that time is passed openly, some form of secure HTTP must be used or otherwise a security risk is incurred. However, once the first connection has been established, only session numbers are passed back and forth and no additional security risk is incurred.

To be able to provide more complex queries and not only simple queries currently offered by most relational gateways, we allow the user to repeatedly query the database using intermediate results. This approach does however not capture the full expressional power of nested queries in SQL, and more research should be devoted to developing more powerful HTML-pages for this.

Further on, more work needs to be done on integration between HTML and various database call level interfaces. Performance measurements have not been done in this paper.

The relational gateway presented in this paper only allows for management of simple data types. A potential problem with HTML in its current incarnation is its one-way limitations in handling multimedia: The user can receive various forms of data but the only user inputs allowed except for specifications of links are text-based. If HTML is going to be used as the sole interface for a multimedia database, there have to be some provisions for the input of pictures, sound etc., or retrieval of multimedia links specified as inputs through WWW-browsers.

References

"Data Access Language Programmer's Reference", Apple Computer Inc., USA, 1990.

T. Berners-Lee, R. Cailliau (1992) "World-Wide Web" Computing in High Energy Physics 92, Anney, France.

T. Berners-Lee, R. Cailliau, N. Pellow, A. Secret (1993) "The World-Wide Web Initiative" Proc. INET '93, Internet Society, San Francisco, USA.

"Butler User's Guide", EveryWare Development Corporation, Canada, 1992.

F. Halasz, M. Schwarz (1994) "The Dexter Hypertext Reference Model" Communications of the ACM, Vol. 37, No. 2, 1994.

"Oracle for Macintosh User's Guide", Oracle Corp., USA

"Prograph CPX ABC Reference" Prograph International Inc., Canada, 1993.

Hypertext References

HREF 1
http://palaver.sk.tsukuba.ac.jp/ - The Hotaka Database Lab Home Page.
HREF 2
http://info.cern.ch/hypertext/WWW/Protocols/HTTP/HTTP2.html - T. Berners-Lee (1994) "HTTP: A Protocol for Networked Information" CERN, IEFT Internet Draft, original version 1991.
HREF 3
http://info.cern.ch/hypertext/WWW/MarkUp/HTML.html - T. Berners-Lee (1994) "Hypertext Markup Language (HTML)", version 1.2, IEFT Internet Draft.
HREF 4
http://hoohoo.ncsa.uiuc.edu/cgi/ - R. McCool(1993) "The Common Gateway Interface" NCSA, University of Illinois at Urbana-Champaign, updated 1994.
HREF 5
http://moulon.inra.fr/oracle/www_oracle_eng.html - English version of the page describing Decoux' extension of Sectret's work, INRA, Paris.
HREF 6
http://www.elsevier.nl/cgi-bin/WWW94link/01/overview - D. Eichmann, T. McGregor, D. Danley (1994) "Integrating Structured Databases Into the Web: The MORE System" Proc. of the First international Conference on the World-Wide Web, CERN, Geneva, Switzerland
HREF 7
http://www1.cern.ch/PapersWWW94/cvarel.ps - C. Varela, C. Hayes (1994) "Zelig: Schema-Based Generation of Soft WWW Database Applications", Proc. of the First International Conference on the World-Wide Web, CERN, Geneva, Switzerland
HREF 8
http://www.ncsa.uiuc.edu:80/SDG/People/jason/pub/gsql/starthere.html - J. Ng. (1993) "GSQL: A Mosaic-SQL Gateway" NCSA, University of Illinois at Urbana-Champaign
HREF 9
http://dozer.us.oracle.com:8080/ - Home page for the Oracle World Wide Web Interface Kit.
HREF 10
http://info.cern.ch/hypertext/WWW/Addressing/URL/Overview.html - T. Berners-Lee(1994) "Uniform Resource Locators" IEFT Internet Draft
HREF 11
http://home.netscape.com/home/welcome.html - Homepage for Netscape Communications Corp.

Copyright

© Southern Cross University, 1994. Permission is hereby granted to use this document for personal use and in courses of instruction at educational institutions provided that the article is used in full and this copyright statement is reproduced. Permission is also given to mirror this document on WorldWideWeb servers. Any other usage is expressly prohibited without the express permission of Southern Cross University.
Return to the AusWeb95 Table of Contents

AusWeb95 The First Australian WorldWideWeb Conference