Wide Area Network Monitoring using Java and the Web
Kent Fitch, Information Technology Services, CSIRO, Limestone Av, Canberra 2600
Australia. Phone: +61 6 276 6711 Fax: +61 6 276 6617
Email: kent.fitch@its.csiro.au
Home Page: Kent Fitch[HREF 1]
Keywords:
WorldWideWeb, Java, network monitoring, WAN
Introduction
This paper describes the architecture and implementation experiences of a
project to collect Wide Area Network (WAN) traffic statistics and make them
available in a rich GUI environment based around WWW and Java.
The main topics discussed are:
- establishing the need for collecting detailed statistics on WAN traffic
- how raw data is gathered and transported to a central repository
- how data is summarised and presented using the WWW and Java
- the advantages of presenting information using the WWW and Java compared
with alternatives
Background
CSIRO[HREF 2] is a large and geographically
dispersed organisation employing over 7000 staff at research laboratories and
administrative units at over 60 sites across Australia. As a founding member
of the Australian arm of the Internet (AARNet), CSIRO has been using the
Internet as its Wide Area Network (WAN) backbone for several years.
The combination of:
- the introduction of volume-based tariff structures by Telstra Internet,
- our dependence on the reliable and predictable operation of the WAN, and
- the increasing traffic loads on the WAN
has made it clear that we need to understand and account for current
network traffic and predict the need for further capacity.
This paper describes a system which collects information about Wide Area Network
traffic, transports it to a central repository and makes it available for
display through a WWW based Java GUI.
The need to collect detailed network usage statistics
CSIRO has an operational requirement to monitor and understand
WAN usage because:
- The Telstra Internet system is used by CSIRO as its WAN.
- Whilst Australian WAN traffic was covered by a fixed price under
the old AARNet arrangements, Telstra Internet charges are now on a
volume basis [HREF 4].
- The WAN is critical to the business operation of CSIRO.
- Use of the WAN, and hence the associated costs, are growing very quickly.
- The make-up of WAN traffic was largely unknown. We could not answer
basic questions such as:
- how much of our traffic is Web related? how much FTP?
- what load is that new client/server system putting on our WAN
links, and how is it spread throughout the day?
- how much would the network response times of administrative applications
improve if we moved our Network News (NNTP) service to another site? Or
just took news feeds during the night?
- how much of our traffic originates from or terminates at other
CSIRO sites? other Australian sites? the local Regional Network?
- what significant network flows are occurring now
that seem to be affecting performance?
- how does today's traffic compare with the same day last week? What
about this month compared with last month? How quickly is Web traffic
growing?
Previously, the extent of our WAN traffic volume statistics was raw port counts from our Cisco routers.
They did not allow us to analyse traffic by protocol, source or destination, nor did
they reveal individual flows between hosts contributing to congestion.
Collecting detailed network statistics
The natural point to collect WAN network statistics would seem to be the border (or tail)
network routers which interface a single LAN to the WAN or a collection of LANs to each
other and the WAN. Routers have traditionally been designed to minimise routing delays,
and any collection of statistics has been provided merely as a by-product.
We looked at the following means of collecting statistics:
- RMON
- The Remote Network Monitoring
(RMON)[HREF 5] component of the Simple
Network Management Protocol (SNMP) [HREF 6] has been designed to allow for the
remote management of network monitoring devices. It is probable that RMON
will eventually be a widely deployed, integrated, interoperable and capable tool for collecting
network statistics of all kinds. However, until RMON evolves to the point where it
is bundled in routers, an alternative approach based on general-purpose packet sniffing
tools (which gather information from network packets as they pass by a network interface card)
appears to be simpler and more pragmatic.
- NeTraMet
- NeTraMet[HREF 3]
is a network statistics accumulation package which implements the
Internet Accounting Architecture [HREF 7].
We considered NeTraMet for our purposes but decided to look further afield for
these reasons:
- Configurability is extensive but complicated.
- The architecture classifies data on a per-packet basis as the packets arrive. Whilst
very flexible, this approach requires considerably more real time CPU resources than
a "simple gather then post-process" approach. The consequence of this is that NeTraMet may
struggle on low-end processors to implement reasonably extensive collection and classification
rules under moderate to high network loads, resulting in data being lost.
- The standard NeTraMet technique for reporting collected statistics is SNMP. Whilst the
preferred approach in an SNMP-centric environment, this is not necessarily the simplest, most
efficient or most reliable way to transport information in general.
- Net-acct
- We then looked at net-acct, a very small, simple and fast network accounting package written by
Ulrich Callmeier [HREF 8] for Linux.
Net-acct seems focused on collecting accounting statistics for the generation of
customer accounts by Network Service Providers. Net-acct comprises:
- a function which puts the network interface into promiscuous mode (meaning that it receives and
processes all network packets, not just those addressed to the interface card)
- a function which filters the headers of interesting packets (in our context, packets
which are about to be sent to or have arrived from the WAN)
- a function to accumulate packet
statistics based on flows in real time. A flow roughly corresponds to a TCP session or UDP traffic
class consisting of the 5-tuple:
- protocol type (TCP, UDP, ICMP, other)
- source IP address
- source port
- destination IP address
- destination port
- a function to periodically write accumulated flow statistics to disk.
We modified net-acct to make it even faster by:
- removing the SLIP/PPP and multiple network interface code
- making minor optimisations to the packet filtering code
- adding hashing to the packet statistics accumulation code
- converting the writing of statistics to use a binary rather than human-readable format
As a result, net-acct running under Linux version 1.2.8 on a
486-33 with 8MB of memory and 16 bit ethernet card can cope with average prime-shift
packet rates of 450 packets/second and regularly sustained rates of over 800 packets/second
with an average CPU usage of only 12% and less than 0.05% of packets being dropped.
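The flow accumulation step described above can be sketched as follows. This is a minimal illustration in Java rather than net-acct's own code: the class and method names (FlowKey, FlowCounters, FlowTable, record) are hypothetical, a standard HashMap stands in for whatever hashing scheme was added to net-acct, and only one direction of traffic is counted to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical 5-tuple identifying a flow; field names are illustrative only. */
final class FlowKey {
    final int protocol;   // IP protocol number, e.g. 6 = TCP, 17 = UDP, 1 = ICMP
    final int srcAddr;    // IPv4 source address packed into 32 bits
    final int srcPort;
    final int dstAddr;    // IPv4 destination address packed into 32 bits
    final int dstPort;

    FlowKey(int protocol, int srcAddr, int srcPort, int dstAddr, int dstPort) {
        this.protocol = protocol;
        this.srcAddr = srcAddr;
        this.srcPort = srcPort;
        this.dstAddr = dstAddr;
        this.dstPort = dstPort;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FlowKey)) return false;
        FlowKey k = (FlowKey) o;
        return protocol == k.protocol && srcAddr == k.srcAddr && srcPort == k.srcPort
                && dstAddr == k.dstAddr && dstPort == k.dstPort;
    }

    @Override public int hashCode() {
        return Objects.hash(protocol, srcAddr, srcPort, dstAddr, dstPort);
    }
}

/** Running totals for one flow. */
final class FlowCounters {
    long packets;
    long bytes;
}

/** Hash table of flows, so each packet costs one lookup rather than a list scan. */
final class FlowTable {
    private final Map<FlowKey, FlowCounters> flows = new HashMap<>();

    /** Adds one packet of the given length to the flow's totals. */
    void record(FlowKey key, int packetLength) {
        FlowCounters c = flows.computeIfAbsent(key, k -> new FlowCounters());
        c.packets++;
        c.bytes += packetLength;
    }

    FlowCounters get(FlowKey key) { return flows.get(key); }

    int size() { return flows.size(); }

    /** Called after the periodic write to disk to start a new interval. */
    void clear() { flows.clear(); }
}
```

Two packets with the same 5-tuple update a single entry, so the table grows with the number of distinct flows in the interval rather than with the number of packets.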
The output of net-acct is packet counts and bytes, sent and received over the
last 60 seconds for each flow "tuple". A post-processing program called "gather" runs as
a background task to read the file output by net-acct and produce:
- a header containing the timestamp and summary statistics
- a summary of TCP traffic by port
- a summary of UDP traffic by port
- a summary of other traffic by protocol
- the top 100 flows sorted by bytes sent and received
- an accounting summary of packet sources and destinations showing breakdown by:
- traffic to other CSIRO sites connected via a private link
- traffic to non-CSIRO sites connected via a private link
- traffic to other CSIRO sites within the same Regional Data Network
- traffic to non-CSIRO sites within the same Regional Data Network
- traffic to other CSIRO sites in Australia
- traffic to other Australian sites
- other traffic
The main intent behind the production of the accounting summary statistics is to
aid reconciliation of data traffic bills. However, classifying traffic in this
manner is very error-prone and relies upon perfect knowledge of IP addresses and
routes; hence its use is currently only experimental.
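As an illustration of the "top flows" component of the summary, the selection could be sketched as below. This is not the "gather" program itself: FlowRecord and TopFlows are hypothetical names, and ranking by the sum of bytes sent and received is an assumption about how the sort key is formed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical per-flow totals for one collection interval. */
final class FlowRecord {
    final String label;        // readable description of the flow, for illustration
    final long bytesSent;
    final long bytesReceived;

    FlowRecord(String label, long bytesSent, long bytesReceived) {
        this.label = label;
        this.bytesSent = bytesSent;
        this.bytesReceived = bytesReceived;
    }

    long totalBytes() { return bytesSent + bytesReceived; }
}

final class TopFlows {
    /** Returns at most 'limit' flows, largest total byte count first. */
    static List<FlowRecord> select(List<FlowRecord> flows, int limit) {
        List<FlowRecord> sorted = new ArrayList<>(flows);   // leave the input untouched
        Comparator<FlowRecord> byBytes = Comparator.comparingLong(FlowRecord::totalBytes);
        sorted.sort(byBytes.reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }
}
```

Calling TopFlows.select(flows, 100) would yield the list described above. Keeping only the largest flows bounds the size of each summary regardless of how many flows were seen in the interval.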
The "gather" program then compresses these summary components and uses TCP/IP to send
the compressed file to a network statistics collection facility described later.
The net-acct system has been tested on the CSIRO Corporate Centre network at Limestone
Avenue in Canberra. The following diagram shows the physical positioning of the net-acct system in the
network:

Figure 1. Where net-acct is positioned in the network.
It is envisaged that systems running net-acct will be installed at other nodes in the
CSIRO WAN during 1996. Each system will send compressed network statistics to a
central collection system.
The Central Network Statistics Collection and Server system
Storing network statistics in one place simplifies processing and retrieval. Rather
than each client wanting to display statistics having to know about all collection
points, the only network topology configuration required is for each collection
point running net-acct to know the name of the central collection system, and for
each client to know it also.

Figure 2. Topology.
The central network statistics collection system is written as a multi-threaded
Java application running on a UNIX server. As TCP/IP connections are received
from net-acct systems, a Java thread is spawned to read the file, decompress it,
and extract the collection point node name and timestamp which are used to generate
the filename under which the statistics are stored. For example, statistics collected at 15:24 on
30 April 1996 by the Limestone net-acct system would be stored as:
./limestone/detail/96/4/30/15/24
where detail denotes detailed statistics. Each file decompresses to between
3K and 4K, generating 4MB per day per collection point. Asynchronous threads produce
hourly, daily, weekly and monthly accumulations, and eventually, old detail files
are deleted.
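The filename scheme can be sketched as follows. The layout (two-digit year, then unpadded month, day, hour and minute) is inferred from the single example above; the StatsPath class and its method are hypothetical, and the java.time API used here is a modern convenience rather than what was available in 1996.

```java
import java.time.LocalDateTime;

final class StatsPath {
    /**
     * Builds the storage path for one detail file from the collection point
     * name and the timestamp carried in the file header.
     */
    static String detailPath(String node, LocalDateTime t) {
        return "./" + node + "/detail/"
                + (t.getYear() % 100) + "/"
                + t.getMonthValue() + "/"
                + t.getDayOfMonth() + "/"
                + t.getHour() + "/"
                + t.getMinute();
    }
}
```

For the example in the text, detailPath("limestone", LocalDateTime.of(1996, 4, 30, 15, 24)) gives ./limestone/detail/96/4/30/15/24. If one file is written per 60-second interval, a collection point produces 1440 detail files per day, which at 3K each is a little over 4MB.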
The serving of statistics to client programs has also been coded in Java, and in fact the
serving system runs in the same address space as the collection system.
These systems would traditionally have been programmed in C. So
what benefits does Java bring?
- It is easier to write correct Java programs. This often touted benefit of
Java really exists! Without fixed length string buffers, pointers, and explicit memory management,
Java programs require less code and less debugging.
- Java supports an excellent sockets implementation which allows the TCP/IP aspect
to be abstracted away. The Java programmer can choose to operate on a data stream which natively
supports integers, longs and bytes, and so the byte-oriented reality of TCP/IP can
be ignored. Of course, the "gather" program that produces the data
has to supply the data with the appropriate "endianness".
- Java supports the multi-threading required to handle simultaneous connections
efficiently and almost transparently to the programmer.
- Collector and server require synchronisation. For example, the collector needs to
lock others out from files which are being written so that a server does not attempt to read an
incomplete file and send it to a client to display. Also, a server may be "waiting"
for the latest statistics to arrive, and it is much more efficient for it to wait and be
notified by a collector of new statistics than to poll for their availability. Mechanisms to
synchronise producers and consumers are natively supplied in Java, and
make what is otherwise typically error-prone coding simple and reliable.
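The producer and consumer synchronisation described in the last point can be sketched with Java's built-in monitors. This is a minimal sketch, not the collection system's actual code: the LatestStats class and its method names are hypothetical, and a single timestamp stands in for whatever state the real collector and server share.

```java
/** Hypothetical shared object through which collectors announce new statistics. */
final class LatestStats {
    private long latestTimestamp = -1;   // newest completely written file, -1 if none

    /** Called by a collector thread only after a file has been fully written. */
    synchronized void publish(long timestamp) {
        if (timestamp > latestTimestamp) {
            latestTimestamp = timestamp;
            notifyAll();                 // wake every server thread waiting for new data
        }
    }

    /** Called by a server thread; blocks until data newer than lastSeen exists. */
    synchronized long awaitNewerThan(long lastSeen) throws InterruptedException {
        while (latestTimestamp <= lastSeen) {
            wait();                      // releases the monitor while blocked
        }
        return latestTimestamp;
    }
}
```

A server thread that has already sent the statistics for time t calls awaitNewerThan(t) and sleeps without polling. Because publish is called only once the file is complete, a waiting server is never told about a partially written file.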
The Network Statistics Client
The network statistics client connects to the central network statistics collection and
server system to retrieve statistics which it displays graphically.
Over two years ago we developed the
TWEETY network response
time system [HREF 9]. It graphically represents end-to-end response times at the
TCP layer over our WAN, and has been a very handy tool, alerting us to outages and
performance degradation. However, the TWEETY client was written in Visual Basic, which limits
the viewing platform to Intel hardware running Windows. Our aims for the
network statistics client were:
- to make it available to users on all common hardware and software
platforms
- to allow it to be used by the casual user or operational areas
- to support embedding network statistics information in a larger monitoring system
- to allow network statistics summary reports to be embedded in e-mail.
It was decided to implement the client as a Java applet for these reasons:
- Java applets can be located and run entirely from a Web browser environment. The
user does not need to download, install, configure and master special software, nor manually
keep it up to date as new versions are released. Furthermore, for users with
Java-enabled mail clients, interesting statistics or daily summaries can be
mailed and run when the user opens the mail item.
- Java allows the construction of the GUI features we required, such as data
drill down.
- Sending the raw data to the application
is much more network efficient than the alternative of converting the data into
images on the server and transporting the images over the network.
- As mentioned above, Java is simply a superior way of writing correct programs,
and the TCP/IP layer abstractions remove a common level of difficulty in network
programming.
- Java is portable. Hence the client can run on Unix, Macintosh and Windows 32 bit
platforms now (with Windows 16 bit in the pipeline).
- The Java Abstract Windowing Toolkit (AWT) contains basic functions which can
be easily built into a class library for displaying simple bar, line and pie graphs.
With this class library available, it is very simple to add graphics to an application
such as network statistics.
The first step in developing the client was to implement a
basic graphing library
[HREF 10] for bargraphs, linegraphs and piecharts. This class library was designed to:
- support multiple graph canvases in a single window
- call back to the originator of the statistics to facilitate data
drill down
- scale axes and size graphs automatically
- support the real-time addition of new data and removal of old data from the
graphs
Once the graphic library was stable, the next step was to design the
interface between the client and the server. The client sends the following
information to the server to retrieve data:
- collection location
- type of statistics - detail, hourly summary, daily summary, weekly summary,
monthly summary
- when the statistics were collected - latest, exact date-time, after date-time
- how many samples to return - just one, arbitrary number
- replay speed - in the case where history statistics are being displayed, the
time interval between updating the graphs with new data
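A request carrying the fields listed above could be encoded with Java's data streams along the following lines. The wire format shown is purely hypothetical (the field order, the integer codes and the StatsRequest class are assumptions made for illustration), but DataOutputStream and DataInputStream are the standard classes, and they write multi-byte values high byte first.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical request sent from the client applet to the statistics server. */
final class StatsRequest {
    // Hypothetical codes for the type of statistics.
    static final int DETAIL = 0, HOURLY = 1, DAILY = 2, WEEKLY = 3, MONTHLY = 4;
    // Hypothetical codes for how the timestamp is interpreted.
    static final int LATEST = 0, EXACT = 1, AFTER = 2;

    final String location;     // collection location, e.g. "limestone"
    final int type;            // DETAIL, HOURLY, ...
    final int whenMode;        // LATEST, EXACT or AFTER
    final long timestamp;      // ignored when whenMode is LATEST
    final int samples;         // how many samples to return
    final int replayMillis;    // interval between graph updates when replaying history

    StatsRequest(String location, int type, int whenMode, long timestamp,
                 int samples, int replayMillis) {
        this.location = location;
        this.type = type;
        this.whenMode = whenMode;
        this.timestamp = timestamp;
        this.samples = samples;
        this.replayMillis = replayMillis;
    }

    /** Writes the request; the stream hides the byte-level encoding. */
    void writeTo(DataOutputStream out) throws IOException {
        out.writeUTF(location);
        out.writeInt(type);
        out.writeInt(whenMode);
        out.writeLong(timestamp);
        out.writeInt(samples);
        out.writeInt(replayMillis);
        out.flush();
    }

    /** Reads the fields back in the same order they were written. */
    static StatsRequest readFrom(DataInputStream in) throws IOException {
        return new StatsRequest(in.readUTF(), in.readInt(), in.readInt(),
                in.readLong(), in.readInt(), in.readInt());
    }
}
```

In use, the client would wrap the socket's output stream in a DataOutputStream and call writeTo, while the server wraps its input stream in a DataInputStream and calls readFrom. Neither side handles byte order explicitly.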
The native Java TCP/IP socket services were used to establish a connection to the statistics
server, and the Java datastream abstraction was used to hide details of
TCP/IP. Since this work was done, Sun have released two Remote Procedure Call (RPC)
type methods for Java:
- JOE [HREF 11], which
is a Java implementation of the Object Management Group CORBA standard for managing
distributed objects
- Remote Objects for Java [HREF 12],
a Java specific method to allow Java objects to communicate over a network.
By hiding network and communications specifics, both techniques allow programmers to
use standard object programming techniques to invoke methods regardless of
where the objects implementing those methods are located on the network.
Interfacing the statistics retrieval code with the graphing classes was simple.
Here is a screen shot of a typical display of detailed statistics:

Figure 3. Typical current status display.
Mouse actions are interpreted as:
- drill down on a bar to show a pie-chart breakdown by TCP or UDP
protocol (left mouse click)
- drill down on a bar to show the top 50 flows (shift+left mouse click)
- display summary details of a bar or pie segment (bytes, packets)
(right mouse click, or meta+mouse click on a Macintosh)
The "top flows" information is shown in a separate frame as a scrollable grid:

Figure 4. Typical current Top flows display.
Problems and Future Developments
Most Java developers would have these items on their wish-list for Java enhancements:
- Packaging and compression of class libraries. This application consists of
dozens of classes, which are currently downloaded one at a time as they are
required by the application. Hence, the application is very sluggish when first
invoked due to the delays retrieving the classes.
- Compilation. The speed of interpreted Java code has not been an issue
in this application. However, Just-In-Time compilers are becoming available
and, together with the packaging and compression of class libraries, would allow Java to
rival C in execution speed.
In the application itself, some changes we would like to make are:
- Exploring the possibility of making some use of the accounting data which attempts
to classify information volumes and types by the remote source or destination address.
- Moving the storage of statistics from flat files to a relational database. This
would make searching for statistics somewhat easier (eg, "give me the last 3 monthly summaries"). The forthcoming Java interface to SQL -
JDBC [HREF 13] would be the
preferred connection method.
- Moving the interface between client and server to JOE or Remote Objects for Java (RMI).
We also need to test the feasibility of maintaining a multitude of statistics
gatherers over the end points of our WAN, and to plan how we would support
collection on 100Mbit/second network segments.
Conclusion
A great deal of useful planning and performance information can be gathered
and effectively displayed using a modified packet-sniffer and a Java user interface.
This project was used as a trial of the effectiveness of Java as a development
language for both server and client components, and for testing the effectiveness
of making Java applications available through a Web interface.
On all counts, Java demonstrated itself to be an excellent solution which
extends the functionality of the Web.
A demonstration [HREF 14]
of the current system is available for viewing on the Web.
Hypertext References
- HREF 1: http://www.csiro.au/itsb/staff/fit106.html - Kent Fitch's Home Page
- HREF 2: http://www.csiro.au - the CSIRO home page
- HREF 3: http://www.auckland.ac.nz/net/Accounting/ntm.Release.note - the NeTraMet home page
- HREF 4: http://www.aarnet.edu.au/aarnet/pricelist.html - Tariff Schedule for Telstra Internet Services
- HREF 5: http://ds.internic.net/rfc/rfc1271.txt - Remote Network Monitoring (RMON) RFC
- HREF 6: http://www.outbackinc.com/Dev/SNMP/ - Simple Network Management Protocol (SNMP)
- HREF 7: http://ds.internic.net/rfc/rfc1272.txt - Internet Accounting Architecture RFC
- HREF 8: mailto:uc@brian.lunetix.de - Ulrich Callmeier
- HREF 9: ftp://ftp.csiro.au/csiro/sunos/tweety/tweety.doc - TWEETY network response time system
- HREF 10: http://www.csiro.au:8000/kent/netstats/GraphDemo.html - CSIRO ITS Java basic graphing library
- HREF 11: http://www.sun.com/sunsoft/neo/external/neo-joe.html - JOE
- HREF 12: http://splash.javasoft.com/pages/intro.html - Remote Objects for Java
- HREF 13: http://splash.javasoft.com/jdbc/ - JDBC
- HREF 14: http://www.csiro.au:8000/kent/netstats/teststats.html - CSIRO Netstats application
Copyright
Kent Fitch ©, 1996. The author assigns to Southern Cross University
and other educational and non-profit institutions a non-exclusive licence to use
this document for personal use and in courses of instruction provided that the
article is used in full and this copyright statement is reproduced. The
author also grants a non-exclusive licence to Southern Cross University to
publish this document in full on the World Wide Web and on CD-ROM and in
printed form with the conference papers, and for the document to be published
on mirrors on the World Wide Web. Any other usage is prohibited without the
express permission of the author.
AusWeb96 The Second Australian WorldWideWeb Conference
"ausWeb96@scu.edu.au"