Audio On Demand Over The World Wide Web


Templar Hankinson and Guojun Lu, Gippsland School of Computing and Information Technology, Monash University, Churchill, Vic 3842. Phone:+61 3 99026857 Fax:+61 3 99026842 Email: templar@mugca.cc.monash.edu.au guojunl@fcit.monash.edu.au


Keywords

Continuous Media, Audio, Java, real-time audio, audio on demand



Abstract

The current World Wide Web (WWW) treats continuous media, such as audio and video, as static files, which prevents the use of continuous media in many applications. This paper describes the development of a generic audio tool, based on the Java applet mechanism, which allows audio on demand over the WWW. It also reports the design and implementation of the audio applet and a sample application. Our experiments show that audio on demand can be supported over Internet segments with sufficient bandwidth.



Introduction

Currently, there are two general methods used to retrieve and play audio over the WWW. The first is the store-and-forward method and the second uses a browser plug-in. In the first method, the browser has to retrieve and store the entire audio file before it can be played. The disadvantages of this method are that the user has to:

Browser Plug-ins are software applications that extend the capabilities of WWW browsers in a specific way. There are "plug-ins" available which allow continuous media to be displayed while the information is being received, avoiding the delay and storage space restrictions of the first method. Though effective in extending a browser to handle continuous media, plug-ins have the following disadvantages:

Ideally, the WWW should support audio on demand where audio is played continuously while it is being received. Also, it would be useful to simultaneously play audio with other media. One example is to have a web page with synchronised audio commentary. To meet the continuity requirement of audio, the entire system from the server to network communication through the client should guarantee the quality of service (QoS) requirements of the audio [1,2]. At the moment, this is not possible due to the lack of QoS support from the server, network and client.

In this paper, we describe an audio on demand tool which takes a best-effort approach (without QoS support) using Java's applet mechanism. Basic issues addressed are:

In the next section, we describe the suitability of Java for supporting audio streams and the overall design of an audio on demand system based on Java applets. The section Server Design discusses the audio server design. Client Applet Design details the design and the Java classes of the audio applet. A Sample Application describes an application built using the audio applet, and the section Experimental Results presents some test results. The paper concludes with the section Discussion and Conclusion.


Java and Audio on Demand

Java has many desirable features for audio on demand applications [3,4]. Some of these major features are:

Our audio on demand system is designed using the above Java features. The overall design of the system is as follows. An audio applet is included in a web page. When the web page is requested, the applet is downloaded from the server and executed in the browser. The applet sets up a connection to the server and requests the required audio file. The server then sends the audio file data to the applet. The applet decodes and plays the audio while data is being received. Meanwhile, the browser can display a normal HTML page. The audio can be a commentary on the web page.

In the above scenario, the main concern is making sure the audio is played smoothly. In our design, this is achieved by the following measures:

In the following sections the audio server and applet design are described.


Server Design

Technically, a special audio server is not needed. The Java URL class could retrieve the contents of an audio file located anywhere on the World Wide Web (provided the file has the appropriate read status). Using the URL class, however, would make it very difficult to implement user interactions such as 'fast forward', 'rewind', 'pause' or 'skip track', because with the URL class there is no dedicated server process to manipulate (and optimise) the file transmission.

The main function of the audio server is to accept and serve requests from the audio applet. The audio server consists of two Java classes, namely, AudioServer and ServerThread. AudioServer makes use of the ServerSocket and Socket classes available in Java (Fig.1). These sockets use the TCP/IP network protocol to transfer data. TCP/IP is not suited to real-time data delivery as it guarantees the arrival of all packets sent but does not provide any delay guarantee. As most multimedia information can tolerate some data loss, the UDP/IP transfer protocol would be considered more suitable. If QoS guarantees are a must, resource reservation protocols such as RSVP should be used.

To satisfy all client requests, the AudioServer must always be running on the server, as AudioServer monitors the communication port. When a client connects to the port, AudioServer generates a thread to handle that client. The use of threads is important as it allows the server to handle multiple clients. ServerThread opens the appropriate audio file on the server and passes data through the socket connection. The audio file is in one of the two formats mentioned above.
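The accept-and-spawn structure described above can be sketched as follows. This is a minimal illustration, not the authors' AudioServer code: the class name AudioServerSketch, the one-line file name request, and the single status byte in the reply are all assumptions made for the example, since the paper does not specify the wire protocol.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative thread-per-client audio server; the wire protocol is an assumption. */
class AudioServerSketch implements Runnable {
    static final int STATUS_OK = 1;          // hypothetical status byte
    static final int STATUS_NOT_FOUND = 0;   // hypothetical status byte

    private final ServerSocket listener;
    private final Path audioDir;

    AudioServerSketch(int port, Path audioDir) throws IOException {
        this.listener = new ServerSocket(port);   // port 0 lets the OS pick a free port
        this.audioDir = audioDir.toAbsolutePath().normalize();
    }

    int port() { return listener.getLocalPort(); }

    /** Accept loop: each connecting client gets its own thread. */
    @Override
    public void run() {
        try {
            while (true) {
                Socket client = listener.accept();
                new Thread(() -> serve(client)).start();
            }
        } catch (IOException closed) {
            // the listener was closed; stop accepting
        }
    }

    /** Plays the role the text gives to ServerThread: open the file and stream it. */
    private void serve(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             OutputStream out = s.getOutputStream()) {
            String name = in.readLine();   // assumed request: one line holding the file name
            Path file = (name == null) ? null : audioDir.resolve(name).normalize();
            if (file == null || !file.startsWith(audioDir) || !Files.isRegularFile(file)) {
                out.write(STATUS_NOT_FOUND);
                return;
            }
            out.write(STATUS_OK);
            Files.copy(file, out);         // TCP delivers the bytes reliably and in order
        } catch (IOException e) {
            // the client went away; nothing more to do for this connection
        }
    }

    void close() throws IOException { listener.close(); }
}
```

Running `new Thread(new AudioServerSketch(0, dir)).start()` would start the accept loop on a free port. Because each client is served on its own thread, a slow client does not stall the others.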


Fig. 1 Audio Server

Client Applet Design

The client side of the application is more complicated than its server counterpart. The client applet is divided into the user interface, network connection, data buffer, decompression functions and audio player. Of these classes only the audio player is provided by the Java Standard Library.


Fig.2 illustrates how the information is passed through the client applet. The main steps are summarised below.

Fig.2 Client Applet

In the following sections, the main classes of the audio applet are detailed.

User Interface

The user interface is created by the UserWindow class. This class gives the user the option to retrieve the previous or next slide and reload or stop the current audio file. The user interface makes use of the label and button classes offered by the AWT package. The user interface also includes the DisplayBuffer class. DisplayBuffer is a simple widget which shows the user the status of the smoothing buffer.


Fig. 3 The User Interface


The ClientAudio Class

After generating the user interface, the ClientAudio thread is started. ClientAudio establishes the network connection to the server and controls all the threads needed to handle continuous data. To connect to the server, the Java Socket class is used. When the connection is successfully made, the ClientAudio class sends the file name of the required audio file. The server then responds by passing the contents of the audio file or a 'file not found' message. If the file is compressed with ADPCM, the returned audio data is passed to the decompression function AudioDecoder. If the audio file is in the au file format, it is sent directly to the AudioPlayer class. The uncompressed audio data is sent to the audio data buffer via the WriteToBuffer class.
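The request step described above might look like the sketch below. It is an illustration under assumptions rather than the ClientAudio source: the newline-terminated file name, the single status byte, the ".adpcm" suffix used to choose the decoding path, and all class and method names are hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Illustrative client-side request logic; the wire protocol is an assumption. */
class AudioRequestSketch {
    static final int STATUS_OK = 1;   // hypothetical status byte

    /** Sends the file name as a single newline-terminated line. */
    static void sendRequest(OutputStream out, String fileName) throws IOException {
        out.write((fileName + "\n").getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    /** Reads the status byte; returns the audio stream, or throws if the file is missing. */
    static InputStream openReply(InputStream in, String fileName) throws IOException {
        int status = in.read();
        if (status != STATUS_OK) {
            throw new FileNotFoundException(fileName);
        }
        return in;   // the remaining bytes are the audio data
    }

    /** Chooses the processing path from the file name (".adpcm" is a made-up suffix). */
    static boolean needsDecoding(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".adpcm");
    }

    /** Connects, requests a file and returns the stream of audio bytes. */
    static InputStream fetch(String host, int port, String fileName) throws IOException {
        Socket socket = new Socket(host, port);
        try {
            sendRequest(socket.getOutputStream(), fileName);
            return openReply(socket.getInputStream(), fileName);
        } catch (IOException e) {
            socket.close();   // do not leak the connection on failure
            throw e;
        }
    }
}
```

A caller would route the stream returned by `fetch` either to a decoder or straight to the player, depending on `needsDecoding`. Closing the returned stream also closes the underlying socket.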


Buffer Management

WriteToBuffer is coupled with the ReadFromBuffer class. The buffer is solely maintained by these two classes, which are based on the piped input and output streams (PipedInputStream and PipedOutputStream) found in the Java Standard Library. The size of the buffer is important as both the initial delay and delay jitter (which causes gaps in the audio signal) need to be minimised. As the audio on demand model is a retrieval application rather than a conversational application (a two-way communication between users), a greater initial delay can be tolerated. Theoretically, a buffer is needed to remove delay jitter from both mu-law and ADPCM data streams. The current implementation does not use a buffer with the mu-law stream, as the applet achieved better results (in our local environment) when the socket was connected directly to the audio player.

To minimise delay jitter, the buffer is allowed to load 56,000 bytes (7 seconds of audio) before the AudioPlayer class is started. Buffer starvation and overflow must also be handled. Buffer starvation occurs when the audio player tries to read from an empty buffer, and overflow occurs when audio data is added unsuccessfully to a full buffer. Java handles both of these situations with bounds checking. Reading from an empty buffer causes Java to block the reading thread until data is ready. Writing to a full buffer also causes Java to block the writing thread until space exists.
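A minimal sketch of such a smoothing buffer, built on java.io.PipedInputStream and java.io.PipedOutputStream, is given below. The class name, the polling loop in awaitPrefill and the capacity passed by the caller are assumptions made for illustration; only the 56,000-byte prefill figure comes from the text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

/** Illustrative smoothing buffer on top of Java's piped streams. */
class SmoothingBufferSketch {
    static final int PREFILL_BYTES = 56_000;   // prefill figure quoted in the text

    private final PipedOutputStream writerEnd = new PipedOutputStream();
    private final PipedInputStream readerEnd;
    private volatile boolean writerDone = false;

    /** capacity is the pipe size in bytes; it should be at least the prefill threshold. */
    SmoothingBufferSketch(int capacity) throws IOException {
        readerEnd = new PipedInputStream(writerEnd, capacity);
    }

    /** Network side: blocks while the pipe is full (the overflow case). */
    void write(byte[] data, int off, int len) throws IOException {
        writerEnd.write(data, off, len);
    }

    /** Network side: signals that no more audio data will arrive. */
    void finish() throws IOException {
        writerDone = true;
        writerEnd.close();
    }

    /** Number of bytes currently waiting in the buffer. */
    int buffered() throws IOException {
        return readerEnd.available();
    }

    /** Waits until threshold bytes are buffered, or the writer has finished early. */
    void awaitPrefill(int threshold) throws IOException, InterruptedException {
        while (readerEnd.available() < threshold && !writerDone) {
            Thread.sleep(10);   // simple polling; a condition variable would also work
        }
    }

    /** Player side: reads block while the pipe is empty (the starvation case). */
    InputStream reader() {
        return readerEnd;
    }
}
```

In use, one thread would copy socket data into `write` while the playback thread calls `awaitPrefill(SmoothingBufferSketch.PREFILL_BYTES)` and only then hands `reader()` to the audio player.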


Audio Player

The AudioPlayer class takes data from the buffer via the ReadFromBuffer class. AudioPlayer is standard in the Java library; however, its use in this application is not. The AudioPlayer class usually outputs an AudioClip object (an object is an instance of a class) which is located on the local machine. It was not until the source code was obtained for the Java standard library (from Sun Microsystems) that it was found possible to attach any input stream directly to the AudioPlayer class without the need for an AudioClip object. The current audio player does have limitations, as it will only accept data in the au file format (header information followed by mono 8-bit mu-Law audio data). This means that ADPCM compressed data must be converted into the au file format before it is passed to the AudioPlayer class.
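Converting decoded samples to the au file format largely amounts to prefixing them with an au header. The sketch below builds a minimal 24-byte header for mu-Law data. The class and method names are illustrative, and whether the AudioPlayer of the time accepted a header with no annotation field and an unknown data length is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Illustrative builder for a minimal au (Sun audio) header. */
class AuHeaderSketch {
    static final int AU_MAGIC = 0x2e736e64;      // the ASCII bytes ".snd"
    static final int HEADER_SIZE = 24;           // six 32-bit fields, no annotation
    static final int UNKNOWN_SIZE = 0xFFFFFFFF;  // data length not known in advance
    static final int ENCODING_MULAW_8 = 1;       // 8-bit G.711 mu-law

    /** Returns the header bytes for a mu-law stream with the given rate and channels. */
    static byte[] header(int sampleRate, int channels) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(HEADER_SIZE);
        DataOutputStream out = new DataOutputStream(bytes);  // writes big-endian ints
        out.writeInt(AU_MAGIC);          // magic number
        out.writeInt(HEADER_SIZE);       // offset of the audio data
        out.writeInt(UNKNOWN_SIZE);      // size of the audio data
        out.writeInt(ENCODING_MULAW_8);  // sample encoding
        out.writeInt(sampleRate);        // samples per second
        out.writeInt(channels);          // 1 = mono
        return bytes.toByteArray();
    }
}
```

For the 8 kHz mono case discussed here, `AuHeaderSketch.header(8000, 1)` could be placed in front of the decoded stream with `java.io.SequenceInputStream` before the result is handed to the player.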


Decompression Class

Though not as high in fidelity as 16-bit PCM, both 8-bit A-Law and mu-Law compressed audio are very popular on the Internet in the form of au files. mu-Law compression is covered by the ITU's G.711 standard for encoding telephone speech. Quantisation of samples in A-Law and mu-Law differs from 16-bit PCM through the use of non-linear quantisation. Unlike linear quantisation, which uses equally sized quantisation steps, A-Law and mu-Law compression use logarithmic functions to give more quantising levels to the amplitudes that the human ear is most sensitive to. Using non-linear quantisation allows a perceived 12-bit quality (i.e. the human ear is unable to distinguish the difference) to be encoded with only 8 bits.
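The effect of logarithmic quantisation can be illustrated with the continuous mu-law curve (mu = 255). Note that this sketch uses the smooth companding formula, not the piecewise-linear segment coding that G.711 actually specifies, and the class and method names are illustrative.

```java
/** Illustrative continuous mu-law companding curve (not bit-exact G.711). */
class MuLawCurveSketch {
    static final double MU = 255.0;

    /** Compresses a sample x in [-1, 1]; small amplitudes are stretched out. */
    static double compress(double x) {
        return Math.signum(x) * Math.log1p(MU * Math.abs(x)) / Math.log1p(MU);
    }

    /** Inverse of compress. */
    static double expand(double y) {
        return Math.signum(y) * Math.expm1(Math.abs(y) * Math.log1p(MU)) / MU;
    }

    /** Compresses, rounds to a signed 8-bit code in [-127, 127], then expands. */
    static double roundTrip8Bit(double x) {
        long code = Math.round(compress(x) * 127.0);
        return expand(code / 127.0);
    }

    /** Plain linear quantisation to the same code range, for comparison. */
    static double roundTripLinear8Bit(double x) {
        return Math.round(x * 127.0) / 127.0;
    }
}
```

With this curve, an input at 1% of full scale already maps to roughly 23% of the compressed range, so quiet samples receive far more code values than they would under linear quantisation. A sample of 0.001, for example, rounds to zero in the linear case but survives the mu-law round trip with a small error.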


An audio file using A-Law or mu-Law compression requires a bandwidth of 64 kbits/s (kilobits per second) to transmit and output in real time. This bandwidth requirement could be satisfied by most local area networks, but over the wider Internet it cannot always be satisfied. Therefore a more sophisticated compression technique has been included in the audio on demand applet. The audio on demand applet decompression class (AudioDecoder class) implements the ITU G.723 (24 kbits/s) standard for Adaptive Differential Pulse Code Modulation (ADPCM). ADPCM was chosen because of the compression rate achieved. A 24 kbits/s bandwidth requirement reduces the required bandwidth for au (or mu-Law) files by 62.5%. ADPCM produces reasonable quality speech signals at this bit rate. It is also adequate for non-speech signals, given that most World Wide Web users use small speakers next to their terminal for audio output.
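The figures quoted above follow from simple arithmetic, assuming the usual telephone-quality sampling rate of 8,000 samples per second, 8 bits per mu-Law sample and 3 bits per sample for 24 kbits/s ADPCM. The short sketch below reproduces them; the class and method names are illustrative.

```java
/** Illustrative bit-rate arithmetic for 8 kHz audio. */
class AudioRateSketch {
    static final int SAMPLE_RATE_HZ = 8_000;   // assumed telephone-quality sampling rate

    /** Bits per second needed to stream samples of the given width. */
    static int bitsPerSecond(int bitsPerSample) {
        return SAMPLE_RATE_HZ * bitsPerSample;
    }

    /** Percentage reduction in bandwidth when moving from one rate to another. */
    static double savingPercent(int fromBps, int toBps) {
        return 100.0 * (fromBps - toBps) / fromBps;
    }

    /** Seconds of audio held by a buffer of samples at the given width. */
    static double secondsBuffered(int bufferBytes, int bitsPerSample) {
        return bufferBytes * 8.0 / bitsPerSecond(bitsPerSample);
    }
}
```

Here `bitsPerSecond(8)` gives 64,000 and `bitsPerSecond(3)` gives 24,000, so `savingPercent(64000, 24000)` is 62.5. By the same arithmetic, a 56,000-byte buffer of mu-Law samples holds 7 seconds of audio.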

As this class must decode large amounts of data, a large computational overhead is created. This overhead is further compounded by the execution speed of the Java interpreter. Just-In-Time compilers improve the execution speed of Java by replacing Java bytecode with machine instructions; however, due to bounds checking and other Java interpreter functions, the execution speed will still be slower than C++ or assembler code.


A Sample Application

To demonstrate how the audio on demand applet can extend a World Wide Web browser, an application was constructed. The application supports static media with continuous audio, and is like a slide show. Each slide is an HTML web page with a corresponding audio file. The content of the application is a guitar tutorial.


A guitar transcription is a notation which allows a guitarist to learn how to play a song without having to read a music score. The World Wide Web has become a huge information resource for guitarists and other musicians. Internet sites such as OLGA (On Line Guitar Archive) offer guitar lessons and guitar transcriptions. The only disadvantage faced when using guitar transcriptions is that without an audio copy or a good knowledge of the song it is hard to know how the song should 'sound'. If a guitarist could hear the guitar score while the guitar transcriptions were displayed on the screen, on-line guitar lessons (and other forms of distance education) could be greatly improved.


In the sample application, each page has a sound file which gives a brief introduction and demonstrates what the guitar score or band arrangement should sound like. The Guitar Tutor Web Page has nine HTML pages, each with its own audio file. The first slide gives instructions on how to use the application; the following eight pages demonstrate how guitar transcriptions are aided by continuous audio data. (The application will be demonstrated during the conference.)


Experimental Results

Experiments were conducted to see how well the application performed. The most interesting experiment was to determine the audio quality on different platforms. The experimental environment was as follows:


Performance With mu-Law Compression

Without ADPCM compression, the required bandwidth to output a continuous mu-Law (ITU G.711 standard) file is 64 kbits/s. Though this bandwidth requirement is too high to be considered over the wider Internet, a 64 kbits/s bandwidth can be sustained over the local network. Under these conditions the Pentium 75 played the audio with occasional clicks. The Pentium 166 machine, however, did not share this problem, as the audio played with no interruptions. In fact, the quality of audio on the Pentium 166 showed no degradation from being passed through the network. This indicates that the Pentium 75 is too slow for the application, as the bottleneck is not with the network. Testing on the SGI Indy workstation produced similar results to the Pentium 166.


Performance With ADPCM Compression

Using ADPCM compression reduces the required bandwidth to 24 kbits/s. This bandwidth requirement is low enough to be considered over the wider Internet. To accommodate the extra delay which may be caused by the ADPCM decoder, the audio buffer was extended to 80 kilobytes (ten seconds of audio). The ADPCM applet was initially tested on the Pentium 75 running Windows 95, and it was soon obvious that data starvation was occurring. The audio played continuously for approximately 12 to 15 seconds before the audio signal was distorted by audible clicks.


As the first series of tests (with mu-Law compression) had shown that the network bandwidth could support at least 64 kbits/s, it was obvious that the cause was not a bandwidth restriction but rather the applet execution speed or, more specifically, the audio decoder class, which was not decoding data fast enough. Testing then moved to the Pentium 166. The increased CPU power enabled the Java applet to successfully output the audio of the first HTML slide (the introductory slide of the Guitar Tutor Web Page). The quality of the compressed audio was similar (if not identical) to the mu-law compressed PCM. The Java applet was not as successful when presenting the second slide, as data starvation occurred approximately thirty seconds into the presentation.
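Why a prefilled buffer only postpones starvation when the decoder runs slower than real time can be seen from a simplified model. The sketch assumes constant playback and decode rates, which real measurements would not show, and the 6,000 bytes/s decode rate in the usage note is a made-up figure, not a result from these experiments.

```java
/** Illustrative constant-rate model of buffer starvation. */
class StarvationModelSketch {
    /**
     * Seconds of playback before a buffer that starts with prefillBytes runs dry,
     * when the player drains playRate bytes/s and the decoder supplies decodeRate bytes/s.
     */
    static double secondsUntilStarvation(int prefillBytes, double playRate, double decodeRate) {
        if (decodeRate >= playRate) {
            return Double.POSITIVE_INFINITY;   // the decoder keeps up; no starvation
        }
        return prefillBytes / (playRate - decodeRate);
    }
}
```

For example, with an 80,000-byte prefill, playback at 8,000 bytes/s and a hypothetical decoder output of 6,000 bytes/s, the buffer drains at 2,000 bytes/s and empties after 40 seconds. A larger buffer delays the first gap but cannot prevent it.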


As the Pentium 166 did not have the computational power to successfully decode the audio file in real time, testing continued on the SGI Indy workstation. The SGI Indy workstation did not perform any better than either the Pentium 75 or the Pentium 166. Though the SGI is a more powerful machine than the Pentium 75, the Java interpreters are not of the same standard (the Windows 95 Java interpreter is of higher quality).


Discussion and Conclusion

Is a Java applet a viable solution for presenting audio on demand over the World Wide Web? RealAudio is a browser plug-in which uses UDP/IP, a data loss correction algorithm and Progressive Networks' own compression codec to deliver stereo audio over a 28.8K modem. Our sample application can run successfully over the campus Internet using the 64 kbits/s file format. Though ADPCM requires only 24 kbits/s bandwidth, it requires more computation time for decoding at the client, causing audible gaps. This is expected considering that Java is about 20 to 30 times slower than C++. Obviously, a browser plug-in would benefit from faster execution speed.

The following developments would improve the quality of the application:


In conclusion, audio on demand is possible over the WWW. Our audio applet is generic and platform independent and can be used for developing various audio on demand applications.



References

[1] Lu, Guojun (1996) "Communication and Computing for Distributed Multimedia Systems", Artech House Inc, MA, USA.
[2] Lu, Guojun (1996) "Issues in Supporting Real-Time Retrieval, Transmission and Presentation of Multiple Media Streams in the WWW", in AusWeb 96.
[3] Campione, Mary, Walrath, Kathy (1996) "The Java Tutorial: Object-Oriented Programming for the Internet", http://ftp.javasoft.com/docs/tutorial.html.zip
[4] Ritchey, Tim (1996) "Java!", New Riders Publishing.
[5] Naylor, N. E. and Kleinrock, L. (1992) "Stream traffic communication in packet-switched networks: destination buffering consideration", IEEE Transactions on Communications, vol. COM-30, no. 12, pp. 2527-2534.
[6] Zhang, L. et al (1994) "Resource reservation protocol (RSVP) - functional specification", Internet draft, March 1994.
[7] Mitzel, D. J. et al (1994) "An architectural comparison of ST-II and RSVP", Proceedings of Infocom'94, Toronto, Canada, June 1994.


Copyright


Templar Hankinson, Guojun Lu ©, 1997. The authors assign to Southern Cross University and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The authors also grant non-exclusive licence to Southern Cross University to publish this document in full on the World Wide Web and on CD-ROM, and for the document to be published on mirrors on the World Wide Web. Any other usage is prohibited without the express permission of the authors.




AusWeb97 Third Australian World Wide Web Conference, Southern Cross University, PO Box 157, Lismore NSW 2480, Australia Email: AusWeb97@scu.edu.au