To address the issues of bandwidth utilisation and traffic cost, the Department of Information Technology Services at the University of Melbourne has established a central WWW server to act as a proxy server for all WWW clients on the campus. The proxy server acts as a cache, keeping copies of fetched documents on disk and supplying the local copy when appropriate, rather than fetching the original from the overseas site. Use of this system has resulted in a significant reduction in network traffic and faster response times for University of Melbourne WWW users.
Initially, the cache size was set to 1.0 GB. HTTP, gopher and ftp documents were kept in the cache for 2 months or until space was needed for newer documents. The time to live for ftp documents in the cache was changed to 7 days on 10th February, 1995, to test the hypothesis that ftp documents used more disk space than was warranted by the traffic savings.
By comparison, the week 5/3/95-11/3/95 saw 83,446 requests for external documents, totalling 1,013 MB. 36,301 requests (43.5%) were met by the cache, 3,991 requests (4.8%) were met by clients' own caches, and 43,154 requests were satisfied by retrieving the original document.
The cache hit rate is dependent on the size of the cache and on the number of users using the proxy server, as Table 1 shows.
----------------------------------------------------------------------
| Date              Number     Cache Hits     Cache Hits  Cache Size |
|                   of users   (% requests)   (% bytes)   (MB)       |
| 4th Dec, 1994     217        23             17          320        |
| 21st Jan, 1995    290        31             22          1,022      |
| 11th Mar, 1995    683        44             32          963        |
----------------------------------------------------------------------

Table 1: Growth in cache usage and effectiveness.
The data in the table is also affected by the decision, on 10th February, 1995, to concentrate on caching http traffic rather than treating all protocols equally. The time to live of an ftp document in the cache was reduced to 7 days, while the http and gopher document time to live was maintained at 2 months. At the time, the cache contained 204 MB of ftp documents, 67 MB of gopher documents and 790 MB of http documents.
-------------------------------------------------------------------------------
| Method   Cache Hits     Cache Hits  Cache holdings   Megabytes   Megabytes  |
|          (% requests)   (% bytes)   (MB)             requested   from cache |
| http     37.7           27.3        790               512.0       138.5      |
| gopher   10.7           6.2         67                20.6        1.3        |
| ftp      7.1            5.5         204               74.5        1.0        |
| All      34.1           24.3        1,061             607.1       143.8      |
-------------------------------------------------------------------------------

Table 2: Cache statistics for the period 5/2/95-9/2/95, and cache holdings at 02:42 on 10/2/95.
The rationale for reducing the time to live of ftp documents in the cache was based on a calculation of the dollar cost versus the dollar savings of caching the documents.
If one assumes costs of $0.55 per MB for fetched documents and $0.02 per MB per week for document storage, one can calculate the net benefit of caching in purely economic terms. In the week 5/2/95-9/2/95, http cache savings amounted to $76.18 for a storage cost of $15.80, while ftp cache savings were $2.20 for a storage cost of $4.08. Clearly, ftp documents cost more to store than the savings they generate warrant. Converting the 204 MB devoted to ftp documents to http cache would be predicted to realise an additional $18 in http savings for a loss of $2.20 in ftp savings. It should be noted that these calculations do not include the intangible benefit of faster transfer rates, but since only 7% of ftp requests are met from the cache, reducing this figure will have little effect on performance as perceived by users.
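The arithmetic above can be expressed as a simple weekly cost model. The following is a minimal sketch, assuming the per-MB rates quoted above and the http figures from Table 2; the constant and function names are purely illustrative and are not part of any actual accounting tool.

    # Minimal sketch of the weekly cost/benefit model described in the text.
    FETCH_COST_PER_MB = 0.55          # dollars per MB fetched from overseas
    STORAGE_COST_PER_MB_WEEK = 0.02   # dollars per MB per week held in the cache

    def weekly_cache_benefit(mb_served_from_cache, mb_held_in_cache):
        """Return (savings, storage cost, net benefit) in dollars for one week."""
        savings = mb_served_from_cache * FETCH_COST_PER_MB
        storage = mb_held_in_cache * STORAGE_COST_PER_MB_WEEK
        return savings, storage, savings - storage

    # http figures from Table 2: 138.5 MB served from a 790 MB cache holding
    print(weekly_cache_benefit(138.5, 790))   # -> (76.175, 15.8, 60.375)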
-------------------------------------------------------------------------------
| Method   Cache Hits     Cache Hits  Cache holdings   Megabytes   Megabytes  |
|          (% requests)   (% bytes)   (MB)             requested   from cache |
| http     49.1           36.8        890               826.1       304.0      |
| gopher   5.7            2.6         73                59.2        1.6        |
| ftp      8.0            1.5         7                 111.6       1.7        |
| All      44.2           31.9        970               996.9       307.3      |
-------------------------------------------------------------------------------

Table 3: Statistics for the week 5/3/95-11/3/95.
Table 3 shows figures from four weeks after those in Table 2. HTTP requests were satisfied by the cache in 49.1% of cases, and the byte hit rate rose from 27.3% to 36.8%. The overall cache hit rate in bytes was 31.9%, up from 24.3%, a significant improvement. As an aside, the number of users using the proxy rose from 353 to 629 over the same period, still representing only 5-10% of the potential users of WWW in the University.
Figure 2: The flow of requests from a client, through two proxy servers, to the document source.
When the central proxy has multiple local proxies underneath it, the central proxy must handle the total number of client requests, while each local proxy handles only its own clients.
Possible solutions to this server load problem are discussed below.
Figure 3: Distribution of proxy requests during a 1 week period at the University of Melbourne. The top level domain indicated is that of the requested document.
The domains .com, .edu and .au are the biggest sources of documents, and modifying the proxy code to use a different proxy for the appropriate domain would permit the load to be spread over four or more top level machines. Such a topology, illustrated in Figure 4, would enable neighbouring institutions to share a set of proxy servers, taking advantage of the combined caches, without swamping a single parent proxy server with every request issued by users in the group of institutions. An alternative proposal is a system where neighbour proxy servers can query each other for documents. Such a system has been developed - see Harvest Cache below [HREF 6].
Figure 4: A scheme for spreading the proxy load across servers dedicated to separate domains.
The modifications to the proxy code to perform the appropriate discrimination based on domain of requested document have not yet been made, but they are not anticipated to be problematic.
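As an illustration of what such discrimination might look like, the following sketch selects a parent proxy from a static table keyed by the requested document's top-level domain. The table and host names are invented placeholders for the purpose of the example, not actual University of Melbourne configuration or proxy code.

    from urllib.parse import urlparse

    # Hypothetical mapping from top-level domain to a dedicated parent proxy.
    PARENT_PROXY_FOR_DOMAIN = {
        "com": "proxy-com.example.edu.au:8080",
        "edu": "proxy-edu.example.edu.au:8080",
        "au":  "proxy-au.example.edu.au:8080",
    }
    DEFAULT_PARENT = "proxy-other.example.edu.au:8080"

    def choose_parent_proxy(url):
        """Pick a parent proxy based on the requested document's top-level domain."""
        host = urlparse(url).hostname or ""
        tld = host.rsplit(".", 1)[-1].lower()
        return PARENT_PROXY_FOR_DOMAIN.get(tld, DEFAULT_PARENT)

    # A .ch host is not in the table, so it falls through to the default parent.
    print(choose_parent_proxy("http://info.cern.ch/hypertext/WWW/TheProject.html"))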
Lagoon was developed at the Technische Universiteit Eindhoven. The authors themselves state that Lagoon is simply an alternative to the CERN daemon.
Duane Wessels <wessels@colorado.edu> of the University of Colorado has developed web-proxy, a proxy program which uses multi-threading techniques to handle multiple simultaneous connections within a single daemon process, rather than forking a new copy of the proxy process for each client connection. The reduction in overhead on the proxy computer is approximately 80%. Thus, the projected maximum sustainable load for the University of Melbourne would rise from 5 connections per second to nearly 30 connections per second, based on figures quoted in Dr Wessels' PhD thesis [HREF 8]. Dr Wessels also promotes the idea of long-term and short-term caches, based on a study of the time between first retrieval and cache retrieval of http documents, and he has developed a mechanism for communication between document servers and caching proxy servers.
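The thread-per-connection approach can be illustrated with a minimal sketch of a server that handles each connection in a lightweight thread within one process, rather than forking a child per request. This is not web-proxy's code; the handler below simply echoes a line instead of parsing HTTP and consulting a cache.

    import socketserver

    class EchoHandler(socketserver.StreamRequestHandler):
        def handle(self):
            # A real proxy would parse the request, check the cache and relay
            # the response; here one line is echoed back to the client.
            line = self.rfile.readline()
            self.wfile.write(line)

    class ThreadingServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
        daemon_threads = True   # each connection runs in a thread, not a forked process

    if __name__ == "__main__":
        with ThreadingServer(("localhost", 8080), EchoHandler) as server:
            server.serve_forever()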
The University of Colorado and the University of Southern California have collaborated to develop the Harvest Cache [HREF 6], a proxy-cache application which allows parent and neighbour proxy servers to query each other for documents. A document is fetched from the closest neighbour holding it, where closeness is determined by the network round-trip time of a 'ping'. This algorithm allows for distributed top-level proxy servers without resorting to configuring second-level proxy servers to select a top-level proxy by the domain of the requested document's location. The same team has developed an http accelerator [HREF 6], a multi-threaded front-end to any WWW server which can reduce the load on the server by a factor of 200.
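The neighbour-selection idea can be sketched as follows: measure the round-trip time to each neighbour cache and prefer the closest reachable one. The neighbour list here is invented for illustration, and TCP connection time stands in for the 'ping' measurement; this is a sketch of the principle, not the Harvest implementation.

    import socket, time

    NEIGHBOURS = [("cache1.example.edu.au", 8080), ("cache2.example.edu.au", 8080)]

    def round_trip_time(host, port, timeout=2.0):
        """Return the TCP connect time in seconds, or None if unreachable."""
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return time.monotonic() - start
        except OSError:
            return None

    def closest_neighbour(neighbours):
        """Choose the reachable neighbour with the smallest measured round trip."""
        timed = [(round_trip_time(h, p), (h, p)) for h, p in neighbours]
        reachable = [(rtt, addr) for rtt, addr in timed if rtt is not None]
        return min(reachable)[1] if reachable else None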