The Clustered Web Server - A Management Approach to Server Design
Associate Professor R.W. (Bob) Ward, Head of Information Systems,
School of Mathematical & Physical Sciences, Murdoch University, South Street,
Murdoch, WA 6150, AUSTRALIA. Phone: +619 360 2828, Fax: +619 332 4677.
Email: R.W.Ward@is.murdoch.edu.au
Keywords: access, cost, database, performance, proxy, management, security
Introduction
Since the inception of the World-Wide Web, considerable effort has been
focused on establishing content and on increasing the accessibility of the web itself through various
enhanced client interfaces. Now that the World-Wide Web is firmly in place and still growing rapidly, we
find that attention to issues such as access, cost, integrity, levels of service and general web
management is now urgently required.
This paper briefly examines those aspects of the
World-Wide Web from which most management issues arise, and proposes a series of requirements
which an "ideal" World-Wide Web server must address in order to provide an acceptable level of
management capability. The paper concludes with a proposed system structure which
addresses those requirements, and reports on the status of a project to construct such a server.
Background
Ease of Use Conceals the Network
Since its inception at CERN in 1990, the World-Wide Web's dramatic growth in utilisation has been a
matter of common knowledge. From the user perspective, the services provided by the many client software
applications (e.g. Mosaic, Netscape, Cello and WinWeb, to name a few, all available by FTP) have
generally been very good, and the complexity of the
linking across the various servers in the Internet has been largely concealed. With the removal of any need to
know about the Internet and its implementation has also come a lack of concern, among users, for the
network management difficulties associated with heavy network use.
[Figure: February 1995 NSF Internet utilisation statistics (see HREF 2)]
Introduction of Charging
In Australia, until recently, utilisation of the Internet as part of AARNET via university
and other academic access points has been on the basis of an annual connection fee,
with no additional charge for traffic volume. This fortunate situation has enabled the rapid
acceptance and growth of the World-Wide Web, but it is destined to end very soon:
the operation of AARNET is to be managed by a national corporation on a fee-paying basis which
will include charges for international traffic.
Number of Users
With the massive publicity being given to the World-Wide Web, demand for access from non-traditional areas
is growing faster than it can be accommodated. University students see access as a "right" rather than a privilege,
and pressure to provide access points and facilities grows daily. Newspaper articles, television advertising (by
computer vendors) and computer magazines have created a market demand from private citizens. National governments
have established information policies which include the use of the Internet as a key component of a so-called
"Information Super Highway". The recent (March 1995) Danish Government statement to parliament
on "Info-Society 2000" is a good example of these new directions.
Resource Management Issues
With growth of this nature, the underlying network structure is clearly destined for very heavy loads. There are
rumours which suggest that, at the present rate of growth, the Internet will reach the limit of its capacity by mid 1996.
While there is a basis for these rumours, it must be remembered that the Internet is not simply one network but
an inter-network of many networks, each with its own traffic and capacity characteristics.
Because of this "segmented" structure, the Internet is likely to develop "bottlenecks" at key transfer points (between
networks) rather than to degrade as a single entity.
Nonetheless, there are significant issues associated with allowing the traffic load to grow without some
consideration of the implications. The response of Internet segment managers to "through" traffic overloading
their network segments, and possibly provoking the need for costly upgrades, is difficult to predict.
Referential Integrity Problems
Together with the growth in the number of users has come an even more impressive growth in active World-Wide Web
servers. Various estimates during the period 1993-1995 put the number of known servers at 62
in 1993, 829 in 1994 and several thousand in 1995.
This is because the software tools and the necessary guidelines are freely available
(e.g. the CERN Guidelines) and, owing to the ease
with which a server can be established, many users of the World-Wide Web are taking on the role
of Web providers.
This is, in itself, probably a healthy development, but it brings its own share of unique problems.
The World-Wide Web is based on a hypertext construction model, with links between documents defined using the
well-known URL notation for documents which may reside anywhere in the Internet.
The rate of development of the World-Wide Web and the level of change within individual servers frequently lead
to the situation where a document previously linked to by a neighbouring site may cease to exist at any given
time, owing to the relative autonomy of server management.
This leads to a referential integrity problem well understood in the area of distributed databases, but not well understood
by many "Web Masters". Clearly, the more complex the web of connections, the more vulnerable
the overall World-Wide Web becomes to this form of problem.
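To make the problem concrete, the following Python sketch shows the kind of off-peak link audit a server manager might run. The helper names and the sample link list are hypothetical; the routine simply issues an HTTP HEAD request for each outbound URL and reports those which no longer resolve.

    # Hypothetical sketch of an off-peak link audit: flag outbound URLs
    # whose target documents have ceased to exist.
    import urllib.request
    import urllib.error

    def link_is_live(url, timeout=10):
        """Return True if the target document still exists (HTTP status < 400)."""
        request = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.status < 400
        except (urllib.error.URLError, OSError):
            return False

    def audit_links(outbound_links):
        """Yield the links that have become dangling references."""
        for url in outbound_links:
            if not link_is_live(url):
                yield url

    # Example use, with a link list a server might have on record:
    for dead in audit_links(["http://info.cern.ch/hypertext/WWW/TheProject.html"]):
        print("dangling reference:", dead)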
Cluster Server Requirements
Cluster Server Defined
The Cluster Server is a World-Wide Web server which provides extended services to a defined user population.
Since this population is frequently a group of related computers on a network, the computers are termed part of a cluster.
The server still provides the standard facilities to users not within the cluster, but may restrict the services based on
operational or management criteria.
Examples of "extended services" are discussed later in this section. An important distinction in the design of the
cluster server is that it includes "control points" at which additional features might be added (or existing services
restricted) dependent on the local implementation or run-time requirements.
What follows is background and a small part of the requirements definition for the Cluster Server,
addressing those aspects with which the reader may be most familiar.
Previous Work
Early efforts to build servers concentrated on dealing with the initial problems of overcoming
machine/system dependencies and the technical problems associated with actually creating the web itself.
These efforts were based largely on the use of the UNIX file structure as a storage medium and the use of
BSD sockets as the communications basis, UNIX being the prevalent operating system on the
Internet at that time. More recently, we see a trend toward the implementation of servers on smaller, cheaper
"platforms" including Macintosh, Windows NT and DOS/Windows combinations.
In virtually all these cases, the control and management of the HTML documents and associated multi-media have
been the responsibility of the person managing the server, using the file system provided with the underlying
operating system. This can lead to administrative difficulties similar to those experienced in large time-sharing
environments where many users require space to store small files.
Other problems derive from the desire to meet security requirements or to optimise Internet traffic. Initiatives
seeking to address these problems include the Common Gateway Interface (CGI) and the "proxy"
server. These are largely based on the underlying file system as before, but provide some form of "caching"
of frequently retrieved documents and provide a control point which precludes the need to allow open access to
the Internet.
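The caching idea can be made concrete with a short Python sketch of the lookup a caching proxy performs: serve the stored copy while it is fresh, otherwise refetch and store it. The cache directory, freshness window and helper names are all assumptions made for illustration.

    # Hypothetical sketch of proxy-style document caching.
    import hashlib
    import os
    import time
    import urllib.request

    CACHE_DIR = "/var/cache/cluster-proxy"   # assumed location
    MAX_AGE_SECONDS = 3600                   # assumed freshness window

    def cache_path(url):
        """Map a URL to a file in the cache directory."""
        return os.path.join(CACHE_DIR, hashlib.md5(url.encode()).hexdigest())

    def fetch(url):
        """Return the document body, preferring a fresh cached copy."""
        path = cache_path(url)
        if os.path.exists(path) and time.time() - os.path.getmtime(path) < MAX_AGE_SECONDS:
            with open(path, "rb") as cached:
                return cached.read()          # cache hit: no external traffic
        with urllib.request.urlopen(url) as response:
            body = response.read()            # cache miss: retrieve remotely
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(path, "wb") as cached:
            cached.write(body)                # store for subsequent requests
        return body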
When we begin to consider the integrity of the data within the Web itself, some of these problems become
considerable. Problems related to:
- caching and "ageing" of files across users,
- integrity management,
- off-peak optimisation of services,
- security of access, and
- data compression/encryption
are real and require attention.
It is asserted that the fundamental, file-oriented structure on which the existing web is based does not readily lend
itself to the solution of these problems.
Control of Services
Whenever the word "control" is used in the context of a public information service, issues of censorship and
"big brother" are frequently raised. There may be some validity in these arguments, but it depends entirely on the
organisation which administers the controls, not on the controls themselves.
The Cluster Server requires facilities which enable users either to have access to selected sites or to be excluded
from selected sites. This type of access control may be enacted to control overseas costs, to
exclude access to sites which request exclusion, or to exclude external (to the cluster) access for the purposes
of Internet access control. At all stages, the inclusion/exclusion is based on access rights prescribed within
a cluster.
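A minimal Python sketch of such an inclusion/exclusion check follows. The cluster names, rule table and host names are hypothetical; a real server would draw the rules from its management data rather than a constant.

    # Hypothetical sketch: per-cluster allow/deny lists checked on every request.
    ACCESS_RIGHTS = {
        # cluster -> rules; None means "no restriction on that side"
        "staff":    {"allow": None, "deny": {"expensive-overseas.example"}},
        "students": {"allow": {"www.murdoch.edu.au"}, "deny": None},
    }

    def request_permitted(cluster, target_host):
        """Apply the cluster's allow list first, then its deny list."""
        rules = ACCESS_RIGHTS.get(cluster)
        if rules is None:
            return False                  # unknown cluster: refuse by default
        if rules["allow"] is not None and target_host not in rules["allow"]:
            return False                  # not among the permitted sites
        if rules["deny"] is not None and target_host in rules["deny"]:
            return False                  # explicitly excluded site
        return True

    assert request_permitted("staff", "www.w3.org")
    assert not request_permitted("students", "expensive-overseas.example")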
Resource Accounting
Another group of commonly sought controls includes per-account CPU, transmission and disk storage allocations of the
kind frequently found in mainframe environments. These, although largely self-evident in nature, are provided through
an "open systems" exit facility, through which the management of a given site can add their own routines
to implement the controls.
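One plausible shape for such an exit facility is sketched below in Python; the control-point name and the example quota routine are hypothetical.

    # Hypothetical sketch of an "open systems" exit facility: site management
    # installs its own routines at named control points.
    exits = {}   # control-point name -> list of installed routines

    def install_exit(point, routine):
        exits.setdefault(point, []).append(routine)

    def run_exits(point, **details):
        """Invoke every routine at this control point; any may veto by
        returning False."""
        return all(routine(**details) for routine in exits.get(point, []))

    # Example: a site-supplied routine enforcing a per-user disk allocation.
    def disk_quota_exit(user, bytes_requested, **_):
        QUOTA = 5 * 1024 * 1024           # assumed 5 MB allocation
        return bytes_requested <= QUOTA

    install_exit("before-store", disk_quota_exit)
    assert run_exits("before-store", user="rward", bytes_requested=1024)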
Integrity Management
This is a large area of function within the Cluster Server. The objective is to establish levels of web integrity
(e.g. no undetected missing links) which are close to those experienced within the distributed database environment.
Suffice to say that the server utilises a database for document and multi-media storage rather than a file system.
In this way the management of the documents and their associated links, especially those to other "trusted" Cluster Servers,
provides a considerable number of options (a storage sketch follows this list) for:
- synchronising servers during off-peak periods,
- automatic ageing of documents according to either HTML rules or the date/time of last reference,
- off-peak link validation,
- automatic link establishment (c.f. the World-Wide Web Worm), and
- storage compression/extraction facilities.
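As foreshadowed above, a small Python sketch of what such database-backed storage might look like follows. The schema, and the use of SQLite, are illustrative assumptions rather than the project's actual design; the point is that once documents and links are rows in a database, ageing and link validation reduce to simple queries.

    # Illustrative sketch only: documents and inter-document links stored as
    # database rows instead of files. Schema and SQLite choice are assumed.
    import sqlite3
    import time

    db = sqlite3.connect("cluster.db")
    db.executescript("""
    CREATE TABLE IF NOT EXISTS document (
        url            TEXT PRIMARY KEY,
        body           BLOB,
        last_reference REAL              -- date/time of last retrieval
    );
    CREATE TABLE IF NOT EXISTS link (
        source_url TEXT REFERENCES document(url),
        target_url TEXT                  -- may point at another (trusted) server
    );
    """)

    def age_documents(max_idle_seconds):
        """Discard documents not referenced within the given window."""
        cutoff = time.time() - max_idle_seconds
        db.execute("DELETE FROM document WHERE last_reference < ?", (cutoff,))
        db.commit()

    def dangling_links():
        """Links whose local target no longer exists: integrity violations."""
        return db.execute("""
            SELECT source_url, target_url FROM link
            WHERE target_url NOT IN (SELECT url FROM document)
        """).fetchall()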
The Cluster Server Project at Murdoch
Objectives
Established in January 1995, the project is funded by a Murdoch University grant and seeks to
create a prototype server based on the distributed database environment proposed above. Results will be published
from time to time, and will be available on the Cluster Server itself in due course.
It is not expected that a full implementation will be produced; rather, a basis for the future extension
and validation of the model. It is expected that the project will uncover new requirements,
and will probably change some of the existing requirements, during its execution.
Conclusions
Many of the principles in the Cluster Server result from the application of distributed database and
mainframe resource management principles; there is little new technology involved at all. In this respect,
the project takes few risks in its execution. It is believed that this combination of principles,
arrived at by taking a fresh look at where the World-Wide Web is going, will produce a more stable
platform on which to develop "maintainable" web information to match the growing investment of knowledge in
the Internet.
References
Berners-Lee, T. (1993) "Hypertext Transfer Protocol", Draft Internet Standard, Internet Engineering Task Force.
Berners-Lee, T., Connolly, D. and Muldrow, K. (1994) "Hypertext Markup Language (HTML), Version 2.0", Draft Internet Standard, Internet Engineering Task Force.
Berners-Lee, T., Cailliau, R., et al. (1994) "The World Wide Web", Communications of the ACM, Vol 37, No 8.
van Duuren, J., et al. (1993) "Telecommunications Networks and Services", Addison-Wesley.
Kauffels, F-J. (1992) "Network Management: Problems, Standards and Strategies", Addison-Wesley.
Hypertext References
- HREF 1: http://info.cern.ch/hypertext/WWW/TheProject.html - Home page for the WWW project at CERN.
- HREF 2: ftp://ftp.merit.edu/statistics/nsfnet - Statistics for 1995 NSF Internet traffic.
- HREF 3: http://www.sdn.dk/fsk/actionplan/ - Danish Government "Info-Society 2000" statement to parliament, March 1995.
- HREF 4: http://info.cern.ch/hypertext/WWW/Daemon/Overview.html - CERN web server guidelines.
Copyright
© Southern Cross
University, 1995. Permission is hereby granted to use this document for
personal use and in courses of instruction at educational institutions provided
that the article is used in full and this copyright statement is reproduced.
Permission is also given to mirror this document on WorldWideWeb servers. Any
other usage is expressly prohibited without the express permission of Southern
Cross University.