The Clustered Web Server - A Management Approach to Server Design


Associate Professor R.W. (Bob) Ward, Head of Information Systems, School of Mathematical & Physical Sciences, Murdoch University, South Street, Murdoch, WA 6150, AUSTRALIA. Phone +619 360 2828 Fax: +619 332 4677. Email: R.W.Ward@is.murdoch.edu.au
Keywords: access, cost, database, performance, proxy, management, security

Introduction

Since the inception of the World-Wide Web, considerable effort has been focused on establishing content and on increasing the accessibility of the web itself through various enhanced client interfaces. Now that the World-Wide Web is firmly in place and still growing rapidly, we find that attention to issues such as access, cost, integrity, levels of service and general web management is now urgently required.

This paper briefly examines those aspects of the World-Wide Web from which most management issues arise, and proposes a series of requirements which an "ideal" World-Wide Web server must address in order to provide an acceptable level of management capabilities. The paper concludes with a proposed system structure which addresses those requirements and reports on the status of a project to construct such a server.

Background

Ease of Use Conceals Network

Since its inception at CERN in 1990, the World-Wide Web's dramatic growth in utilisation has been a matter of common knowledge. From the user perspective, the services provided by the many client software applications (e.g. Mosaic, Netscape, Cello and WinWeb, to name a few, all available by FTP) have generally been very good, and the complexity of the linking across the various servers in the Internet has been largely concealed. With the removal of the need to know about the Internet and its implementation has also come a lack of concern, among users, for the network management difficulties associated with heavy network use.

[Figure: February 1995 NSF Internet Utilisation]

Introduction of Charging

In Australia, until recently, utilisation of the Internet as part of AARNET via university and other academic access points has been on the basis of an annual connection fee with no additional charges for traffic volume. This fortunate situation has enabled the rapid acceptance and growth of the World-Wide Web, but is destined to end very soon.

The operation of the AARNET is to be managed by a national corporation on a fee-paying basis which will include charges for international traffic.

Number of Users

With the massive publicity being given to the World-Wide Web, the demand for access from non-traditional areas is growing faster than can be accommodated. University students see access as a "right" rather than a privilege, and pressure to provide access points and facilities grows daily. Newspaper articles, television advertising (by computer vendors) and computer magazines have created a market demand from private citizens. National governments have established information policies including the use of the Internet as a key component of a so-called "Information Super Highway". The recent (March 1995) Danish Government statement to parliament on "Info-Society 2000" is a good example of these new directions.

Resource Management Issues

With growth of this nature, the underlying network structure is clearly destined for very heavy loads. There are rumours which suggest that, at the present rate of growth, the Internet will reach the limit of its capacity by mid-1996.

While there is a basis for these rumours, it must be remembered that the Internet is not simply one network, but an inter-network of many networks. Each of these networks has its own traffic levels and capacity.

Because of this "segmented" structure, the Internet is likely to develop "bottlenecks" at key transfer (between networks) points rather than to degrade as a single entity.

Nonetheless, there are significant issues associated with allowing the traffic load to grow without some consideration of the implications. The response of Internet segment managers to "through" traffic overloading their network segments, and possibly provoking the need for costly upgrades, is difficult to predict.

Referential Integrity Problems

Together with the growth in the number of users has been an even more impressive growth in active World-Wide Web servers. Various estimates during the period 1993-1995 put the number of known servers at 62 in 1993, 829 in 1994 and several thousand in 1995. This is because the software tools and necessary guidelines are freely available (e.g. the CERN guidelines) and, owing to the ease with which a server can be established, many users of the World-Wide Web are becoming Web providers.

This is, in itself, probably a healthy development, but it brings its own share of unique problems.

The World-Wide Web is based on a hypertext construction model, with links between documents defined using the well known URL for documents which may reside remotely anywhere in the Internet.

The rate of development of the World-Wide Web, and the level of change within individual servers, frequently leads to a situation where a document previously linked to by a neighbouring site ceases to exist, owing to the relative autonomy of server management.

This leads to a referential integrity problem well understood in the area of Distributed Database, but not well understood by many "Web Masters". Clearly the more complex the web connections, the more vulnerable to this form of problem the overall World-Wide Web becomes.
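The referential integrity problem can be made concrete with a small sketch. The routine below (an illustration, not part of the Cluster Server itself; the document names are hypothetical) checks a set of documents for links whose targets no longer exist within that set — the local analogue of the dangling-link problem, which a real server would extend to remote URLs.

```python
def find_dangling_links(documents):
    """Given a mapping of document name -> list of link targets,
    return (source, target) pairs whose target no longer exists."""
    known = set(documents)
    return [(src, tgt)
            for src, targets in documents.items()
            for tgt in targets
            if tgt not in known]

# Hypothetical document set: old-draft.html has been deleted by its owner,
# but papers.html still links to it.
web = {
    "index.html": ["about.html", "papers.html"],
    "about.html": ["index.html"],
    "papers.html": ["index.html", "old-draft.html"],
}
print(find_dangling_links(web))  # [('papers.html', 'old-draft.html')]
```

A database-backed server can run such a check as a routine maintenance task; a file-based server typically has no record of which documents link where.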

Cluster Server Requirements

Cluster Server Defined

The Cluster Server is a World-Wide Web server which provides extended services to a defined user population. Since this population is frequently a group of related computers on a network, the computers are termed part of a cluster. The server still provides the standard facilities to users not within the cluster, but may restrict the services based on operational or management criteria. Examples of "extended services" are discussed later in this section. An important distinction in the design of the cluster server is that it includes "control points" at which additional features might be added (or existing services restricted) dependent on the local implementation or run-time requirements.

What follows is some background and a small part of the requirements definition for the Cluster Server, addressing those aspects with which the reader may be most familiar.

Previous Work

Early efforts to build servers concentrated on overcoming machine/system dependencies and on the technical problems associated with actually creating the web itself.

These efforts were based largely on the use of the UNIX file structure as a storage medium and the use of BSD sockets as the communications basis, UNIX being the prevalent operating system on the Internet at that time. More recently, we see a trend toward the implementation of servers on smaller, cheaper "platforms", including the Macintosh, Windows NT and DOS/Windows combinations.

In virtually all these cases, the control and management of the HTML documents and associated multi-media has been the responsibility of the person managing the server, using the file system provided with the underlying operating system. This can lead to administrative difficulties similar to those experienced in large time-sharing environments where many users require space to store small files.

Other problems derive from the desire to address security requirements or to optimise Internet traffic. Initiatives seeking to address these problems include the Common Gateway Interface (CGI) and the "proxy" server. These are largely based on the underlying file system as before, but provide some form of "caching" of frequently retrieved documents and a control point that precludes the need to allow open access to the Internet.
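The caching idea can be sketched in a few lines. This is a minimal illustration of the proxy principle, not the behaviour of any particular proxy implementation; the URLs and the stand-in fetch function are hypothetical.

```python
class ProxyCache:
    """Minimal sketch of a proxy-style document cache: the first request
    for a URL fetches it from the origin server; repeats are served locally."""
    def __init__(self, fetch):
        self.fetch = fetch          # function: url -> document body
        self.store = {}             # url -> cached body
        self.hits = self.misses = 0

    def get(self, url):
        if url in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[url] = self.fetch(url)
        return self.store[url]

# Demonstration with a stand-in fetch function (no real network traffic).
calls = []
def fetch(url):
    calls.append(url)
    return "<HTML>body of %s</HTML>" % url

cache = ProxyCache(fetch)
cache.get("http://remote.site/doc.html")
cache.get("http://remote.site/doc.html")   # second request served from cache
print(len(calls), cache.hits, cache.misses)  # 1 1 1
```

The sketch also shows why a proxy is a natural control point: every request to `get` passes through one place where policy, accounting or ageing rules could be applied.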

When we begin to consider the integrity of the data within the Web itself, some of these problems become considerable. Problems related to the caching and "ageing" of files across users, integrity management, off-peak optimisation of services, security of access, and data compression/encryption are real and require attention.

It is asserted that the fundamental, file-oriented, structure on which the existing web is based does not readily lend itself to the solution of these problems.

Control of Services

Whenever the word "control" is used in the context of a public information service, issues of censorship and "big brother" are frequently raised. There may be some validity in these arguments, but it depends entirely on the organisation that administers the controls, and not on the controls themselves.

The Cluster Server requires facilities which enable users either to be granted access to selected sites or to be excluded from selected sites. This type of access control may be enacted to control overseas traffic costs, to exclude access to sites that request exclusion, or to exclude access from outside the cluster for the purposes of Internet access control. At all stages the inclusion/exclusion is based on access rights prescribed within a cluster.
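A control point implementing this policy might look like the following sketch. The function name and parameters are illustrative, not part of the Cluster Server design; the point is that both an optional allow-list (e.g. for overseas cost control) and a deny-list (e.g. sites that have requested exclusion) can be checked in one place.

```python
def site_permitted(site, allow=None, deny=frozenset()):
    """Control-point check for requests from within a cluster.
    deny:  sites excluded outright (e.g. sites that requested exclusion).
    allow: optional list restricting the cluster to selected sites
           (e.g. to control overseas traffic costs); None means unrestricted."""
    if site in deny:
        return False
    return allow is None or site in allow

# Unrestricted cluster: everything not denied is permitted.
print(site_permitted("info.cern.ch"))                          # True
# Cost-controlled cluster: only listed sites are reachable.
print(site_permitted("other.site", allow={"info.cern.ch"}))    # False
```

The same check, driven by a per-cluster table of access rights, covers both the inclusion and exclusion cases described above.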

Resource Accounting

Another group of commonly sought controls includes accounting for CPU, transmission and disk storage allocations, as frequently found in mainframe environments. These, although largely self-evident in nature, are provided through an "open systems" exit facility, through which the management of a given site can add their own routines to implement the controls.
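The exit-facility idea — familiar from mainframe environments — can be sketched as a registry of site-supplied routines invoked at named control points. The control-point names, user name and accounting logic below are all hypothetical; the sketch shows only the mechanism.

```python
class ExitFacility:
    """Sketch of an "open systems" exit facility: site management registers
    its own routines at named control points, and the server invokes them."""
    def __init__(self):
        self.exits = {}

    def register(self, point, routine):
        self.exits.setdefault(point, []).append(routine)

    def invoke(self, point, **context):
        for routine in self.exits.get(point, []):
            routine(**context)

# Example site-supplied exit: accumulate bytes transmitted per user.
usage = {}
def account_transmission(user, bytes_sent):
    usage[user] = usage.get(user, 0) + bytes_sent

facility = ExitFacility()
facility.register("request-end", account_transmission)
facility.invoke("request-end", user="jones", bytes_sent=2048)
facility.invoke("request-end", user="jones", bytes_sent=1024)
print(usage)  # {'jones': 3072}
```

Because the server only defines the control points, each site decides what — if anything — to measure or restrict, which is the "open systems" aspect of the facility.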

Integrity Management

This is a large area of function within the Cluster Server. The objective is to establish levels of web integrity (e.g. no undetected missing links) which are close to those experienced in the Distributed Database environment. Suffice it to say that the server utilises a database, rather than a file system, for document and multi-media storage. In this way the management of the documents and their associated links, especially to other "trusted" Cluster Servers, provides a considerable number of options: synchronising servers during off-peak periods; automatic ageing of documents according to either HTML rules or the date/time of last reference; off-peak link validation; automatic link establishment (cf. the World-Wide Web Worm); and storage compression/extraction facilities.
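One of these options, automatic ageing by date of last reference, can be sketched directly. The storage schema below (a simple mapping from document id to last-reference date) and the 90-day threshold are assumptions for illustration, not the server's actual design.

```python
from datetime import date

def ageing_candidates(store, today, max_age_days=90):
    """Return document ids not referenced within max_age_days: candidates
    for archiving or compression during an off-peak maintenance run.
    store maps document id -> date of last reference (illustrative schema)."""
    return sorted(doc for doc, last_ref in store.items()
                  if (today - last_ref).days > max_age_days)

# a.html was last referenced 100 days ago; b.html only 19 days ago.
store = {"a.html": date(1995, 1, 10), "b.html": date(1995, 4, 1)}
print(ageing_candidates(store, date(1995, 4, 20)))  # ['a.html']
```

Because the last-reference dates live in the database alongside the documents, this query is cheap; a file-based server would have to reconstruct the same information from access logs.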

The Cluster Server Project at Murdoch

Objectives

Established in January 1995, the project is funded by a Murdoch University grant and seeks to create a prototype server based on the Distributed Database environment proposed above. Results will be published from time to time, and will be available on the Cluster Server itself in due course.

It is not expected that a full implementation will be produced; rather, a basis for the future extension and validation of the model. It is expected that the project will uncover new requirements and will probably change some of the existing web requirements during its execution.

Conclusions

Many of the principles in the Cluster Server are the result of the application of Distributed Database and mainframe resource management principles. There is little new technology involved at all. In this respect, the project takes few risks in its execution. It is believed that the combination of principles, which were arrived at by taking a fresh look at where the World-Wide Web is going, will produce a more stable platform on which to develop "maintainable" web information to meet the growing investment of knowledge in the Internet.

References

Berners-Lee, T. (1993) "Hypertext Transfer Protocol", Draft Internet Standard, Internet Engineering Task Force.

Berners-Lee, T., Connolly, D. and Muldrow, K. (1994) "Hypertext Markup Language (HTML), Version 2.0", Draft Internet Standard, Internet Engineering Task Force.

Berners-Lee, T., Cailliau, R., et al. (1994) "The World Wide Web", Communications of the ACM, Vol 37, No 8.

van Duuren, J., et al. (1993) "Telecommunications Networks and Services", Addison-Wesley.

Kauffels, F-J. (1992) "Network Management: Problems, Standards and Strategies", Addison-Wesley.

Hypertext References

HREF 1
http://info.cern.ch/hypertext/WWW/TheProject.html - Home page for the WWW project at CERN.
HREF 2
ftp://ftp.merit.edu/statistics/nsfnet - Statistics for 1995 NSF Internet traffic
HREF 3
http://www.sdn.dk/fsk/actionplan/ - Danish Government "Info-Society 2000" Statement to Parliament March 1995
HREF 4
http://info.cern.ch/hypertext/WWW/Daemon/Overview.html - CERN web server guidelines

Copyright

© Southern Cross University, 1995. Permission is hereby granted to use this document for personal use and in courses of instruction at educational institutions provided that the article is used in full and this copyright statement is reproduced. Permission is also given to mirror this document on WorldWideWeb servers. Any other usage is expressly prohibited without the express permission of Southern Cross University.

AusWeb95 The First Australian WorldWideWeb Conference