WWW.AU
This method of growing and maintaining the index is consistent with the overall aim of WWW.AU, which is to keep the index small and of high quality, so that users can find information without being sent long lists of potential sites containing a large number of false hits.
The URL ("Uniform Resource Locator") points to the definitive document for the site. The category gives a first-level classification of the content area of the site (such as "Music" or "Computer_Science"). The node and network are components of the URL which are extracted automatically and used in conjunction with the category for indexing purposes.
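As a rough illustration of the automatic extraction, the sketch below splits the host name of a URL into a node and a network part. The text does not say exactly where the split falls, so treating the first host label as the node and the remainder as the network is an assumption, as are the function name `split_node_network` and the example URL.

```python
from urllib.parse import urlsplit


def split_node_network(url):
    """Return (node, network) for a site URL.

    Assumption: the node is the first label of the host name and the
    network is everything after it; the text does not define the split.
    """
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"no host name in URL: {url!r}")
    # partition() splits at the first dot only, leaving the rest intact.
    node, _, network = host.partition(".")
    return node, network


# Hypothetical example URL, used only for illustration.
print(split_node_network("http://www.example.edu.au/index.html"))
```

Under this assumption, entries could be grouped by network and category without any extra input from the person registering the site.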
The organisation and branch fields are typically used to give the name of the organisation which owns the site and a department or sub-group (e.g. CSIRO, Division of Information Technology). The password is supplied at the time the entry is first created, and is then required in order to modify the entry at any later time. The description can contain any text describing the whole collection of information which defines the site. Again, to encourage quality rather than quantity, this description is limited in length to 256 characters.
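The fields and rules described above could be modelled along the following lines. This is only a sketch: the class name `SiteEntry`, the helper `check_description`, and the method `update_description` are hypothetical, and the password is kept as plain text purely for brevity (a real service would more likely store a hash). Only the 256-character limit and the password requirement come from the text.

```python
from dataclasses import dataclass

MAX_DESCRIPTION_LENGTH = 256  # limit stated in the text


def check_description(text):
    """Reject descriptions longer than the stated limit."""
    if len(text) > MAX_DESCRIPTION_LENGTH:
        raise ValueError(
            f"description is {len(text)} characters; "
            f"the limit is {MAX_DESCRIPTION_LENGTH}"
        )


@dataclass
class SiteEntry:
    """Hypothetical model of one index entry."""
    url: str
    category: str
    organisation: str
    branch: str
    password: str  # plain text here for brevity only
    description: str

    def __post_init__(self):
        # Enforce the length limit when the entry is first created.
        check_description(self.description)

    def update_description(self, password, new_description):
        """Modify the entry only if the creation password is supplied."""
        if password != self.password:
            raise PermissionError("password does not match the one set at creation")
        check_description(new_description)
        self.description = new_description


# Illustrative values; the URL and password are made up.
entry = SiteEntry(
    url="http://www.example.edu.au/",
    category="Computer_Science",
    organisation="CSIRO",
    branch="Division of Information Technology",
    password="secret",
    description="Research reports and software.",
)
entry.update_description("secret", "Research reports, software and staff pages.")
```

Checking the limit both at creation and on every update keeps over-long descriptions out of the index regardless of how an entry is changed.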
A decision was made to start with a fixed set of categories based on an analysis of the actual WWW sites in Australia in late 1994. People adding new sites must choose the nearest fit from amongst the existing categories. The author of this paper reviews the new entries periodically, reassigning an entry to a different category where that reflects the site's content more accurately.
Occasionally, new categories are created and existing entries regrouped accordingly. This happens either in response to feedback that no category properly matches a particular site, or because a category is growing too large and there is a logical way to split and recombine categories so that entries are spread more evenly across them.
This policy of relying on feedback has been borne out in practice, since owners of sites do tend to go out of their way to provide feedback, frequently requesting more detailed categories for their own sites. In the event, only four new categories have been needed for the index to date: Internet_Services, Media, Science, and Sport. The latter two were created in response to entries which had no really suitable match among the existing categories. Media was created because of the high growth rate in entries of this kind during early 1995. The Internet_Services category was created from entries which had previously been placed in the rapidly growing Commercial and Network categories. The actual number of entries per category, and how this has changed over time, is described elsewhere in this paper.
