This paper examines the involvement (or lack thereof) of professionals trained in cataloguing in providing navigation systems for the Web. A number of differences exist between conventional cataloguing systems and Internet cataloguing systems, and because of these differences it has taken time for the two spheres to adjust to each other. Yet such an adjustment is inevitable, and the creation of Internet catalogues controlled by professional cataloguers has begun.
The paper ranges from an examination of the factors that have led 'Nerds' to dominate the cataloguing of the Internet, to examples of tools and maintenance methods with which Internet catalogues can be set up to take advantage of contemporary cataloguing skills.
Demand for information is no longer dominated by researchers. The new users of information retrieval are the general public, worldwide, on the World Wide Web. This presents the cataloguer with an interesting new position: caught between seemingly infinite numbers of users and seemingly infinite amounts of information. Before the Web there were fewer users of information retrieval systems, and those users tended to have the skills required to use library facilities. Now a seemingly infinite number of users are after information of completely different natures. The other infinite is the amount of information available. Not only are subject holdings much larger on the Web, they are also continually changing, requiring constant maintenance.
Although 'the answer is out there' on the Internet, people are beginning to back off, having found search engines to be less and less useful. The signal-to-noise ratio of these search engines has become far too low.
The only sane approach is to find the best method of acting as a conduit, so that the characteristics of each seemingly infinite void (information availability and number of users) can be utilised and bound together as a continual system of information delivery.
This paper discusses the history of the Internet in terms of trends (and fads) in delivering information, from a perspective that considers the lack of any structure such as metadata repositories or ordered catalogues.
I put forward reasons why the uptake of proposed cataloguing standards like metadata has been slow, and why the utilisation of librarian skills in providing navigation pathways and search engines has been even slower.
There are several models of how information can be successfully and professionally catalogued and delivered. This paper attempts to promote some essential, and so far little-examined, steps that will eventually lead to a more cohesive Internet in terms of 'channels' of data from specialised subject portals.
Although much computing history is about as interesting as fishing stories, a number of steps in the evolution of the World Wide Web give some indication of where the Internet is going in terms of finding information.
Here are some steps as to how we got here. All the periods below overlap considerably, and precise dates would be too nebulous to state -
Even before Gutenberg, the same craftsmen in monasteries who first made books at some point started organising them into meaningful piles. Because they had the most interest in caring for the books and putting them to the most use, the connection was an obvious one.
The analogy with the Web is that 'Nerds' (and I mean people like me) made the Internet, so it is no surprise that the first method of organising information on the Internet, the Search Engine, was also implemented by the same people. The reason is simple: they were the first people to have the problem of providing access to the Web pages they had written. The medium dictated the tools.
Of course at the same time there were fledgling catalogues such as the Whole Earth Catalog, Yahoo and others, but at the time these were less useful because the information space was still small enough for the Search Engines to manage through the indexes their Spiders maintained. There weren't many pages to handle, so the hit lists returned were manageable.
Although Librarians were early adopters of the net as users, and remain expert in its use, they had little option but to gain programming and computer system skills if they were going to become providers of information retrieval systems. This meant that most were effectively prevented from participating.
Without the appropriate tools of the trade, professional cataloguers could do little more than sit back and take advantage of the net, unable to fully participate in the task at hand: cataloguing.
With the recognition that the Web could be organised better with catalogue records attached to Web pages, the Dublin Core metadata standard was proposed... years ago.
It was, and still is, an attempt to create greater order in Web pages and, although the Web has thrived on and owes its existence to happy anarchy, the possibility still exists to better order the information available.
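To make the idea concrete, here is a minimal sketch of mine (the page and element values are invented) of a Dublin Core record attached to a Web page as HTML meta tags, together with a few lines of Python that read the record back out:

```python
from html.parser import HTMLParser

# An illustrative page fragment carrying a Dublin Core record as meta tags.
PAGE = """
<html><head>
<title>Newstart Allowance</title>
<meta name="DC.Title" content="Newstart Allowance">
<meta name="DC.Creator" content="Centrelink">
<meta name="DC.Subject" content="unemployment benefits">
<meta name="DC.Date" content="1999-06-01">
</head><body>...</body></html>
"""

class DCReader(HTMLParser):
    """Collect every meta tag whose name begins with 'DC.'."""
    def __init__(self):
        super().__init__()
        self.record = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name", "")
        if name.lower().startswith("dc."):
            self.record[name] = attributes.get("content", "")

reader = DCReader()
reader.feed(PAGE)
for element, value in sorted(reader.record.items()):
    print(f"{element}: {value}")
```

The record travels with the page itself, which is the whole point: any portal or spider that fetches the page can recover the catalogue entry.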
Here are some reasons why Metadata has not progressed to being a standard -

For a start, the search engines that users actually rely on make little or no visible use of embedded metadata. Hence the job of creating metadata is currently a truly thankless task: you can do it, but you will have no idea whether it ever gets used.
In my initial involvement with Metadata I was somewhat stunned to find that little or no commercial software existed to handle the complexity of the Metadata specification.
In other words, the committees drafting the Metadata standards had not assessed the ability of technology to provide the kind of results they were after. This was a serious error that, in my opinion, has left the standard at the starting blocks.
I also suspect that after the initial Dublin Core metadata standard was put forward and received a slow uptake, the people who made the specification busied themselves adding to it, keeping the committees busy and the conference papers rolling in. The result has been a standard that very few people understand.
The increasing size of the Internet began to strain search indexes. Less and less of the Internet was being indexed, and at the same time user searches on common terms returned so many hits that the results began to be useless.
At this time portals such as Yahoo began to be more useful because they gave a catalogue-like structure. This structure could be both browsed and searched, and in the case of a Yahoo search the user was getting quality, refereed data in the results list. But this was not the work of Librarians, just enthusiastic computer types.
More and more Web sites are turning to the concept of cataloguing as a means of providing portal sites. Looksmart (http://looksmart.com.au) is currently running mainstream magazine advertising that describes its cataloguing process as 'taking out the garbage'. In other words, Looksmart is telling users that it turns the unmanageable results lists of traditional search engines into manageable ones containing only data that has been specifically catalogued.
Another example, which I have been involved with, is the http://fed.gov.au site, which has done this for over a year now.
The general concept is simple: catalogue information in such a way that the catalogue trees can be both browsed and searched.
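As an illustration of the concept (my own sketch, not the fed.gov.au implementation; the subject headings and URLs are invented), a catalogue can be held as a tree of subject nodes that one routine walks for browsing while another scans for searching:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One subject heading in the catalogue tree."""
    name: str
    entries: list = field(default_factory=list)   # (title, url) pairs
    children: list = field(default_factory=list)  # sub-headings

def browse(node, depth=0):
    """Walk the tree in the order a browse page would present it."""
    print("  " * depth + node.name)
    for title, url in node.entries:
        print("  " * (depth + 1) + f"{title} <{url}>")
    for child in node.children:
        browse(child, depth + 1)

def search(node, term):
    """Return every catalogued entry whose title mentions the term."""
    hits = [(t, u) for t, u in node.entries if term.lower() in t.lower()]
    for child in node.children:
        hits.extend(search(child, term))
    return hits

root = Node("Government", children=[
    Node("Employment", entries=[
        ("Newstart Allowance", "http://example.gov.au/newstart"),
    ]),
])
browse(root)
print(search(root, "newstart"))
```

The same structure serves both modes of access, which is what distinguishes a catalogue from a flat search index.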
Clearly there is a place for cataloguing on the Web. There are trained professionals who know how to order information so that it not only performs in the short term but also keeps its structures intact in the long term.
But cataloguing a site is not just establishing an index and sitting back. Every part of the Web is in constant need of maintenance. As Marshall McLuhan once put it, 'Automation gives primacy to process', and it is the process of publishing and cataloguing Web sites that is the overall key.
To achieve this requires a number of tools that assist the content creator and cataloguer in cataloguing on the Web -

In content creation we need these tools -
- tools for creating metadata records and attaching them to Web pages
- tools for checking that the syntax and format of those records are correct

In harvesting the metadata and in the role of cataloguing in general -
- a means of making a site's metadata available to the appropriate portals
- tools with which cataloguers can build and maintain a catalogue of links
- software to deliver the resulting index as a browseable tree and a search facility
- methods for analysing what users search for and translating their terms into the catalogue's
- a means of checking that catalogued links remain live
Central to all of the above is the need for simple systems that non-programmers can operate.
I would now like to cover each of the above items, giving examples and commenting on my experience with FED.GOV.AU.
Creating Metadata is a problem for the following reasons -
- metadata is hidden from view in an ordinary Web browser, so errors go unnoticed
- it is easy to get the syntax of a record wrong, and hard to see when you have
- correct values depend on thesauri and controlled lists that authors rarely have to hand
- authoring records by hand, page after page, is tedious and error-prone
Because of the above problems I decided to write a metadata Web browser called 'Metabrowser' (http://metabrowser.spirit.net.au) which would, if nothing else, at least make the hidden data in Web pages visible and display the metadata in such a way as to clearly show whether the syntax of a record is correct.
It also allows Metadata to be created and stored, providing access to all the major thesauri and controlled lists from a central server.
In addition, templates of metadata records can be applied to Web pages as a starting point in authoring, further ensuring that the correct formats are used.
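The kind of syntax check a tool like Metabrowser performs can be sketched briefly. The sketch below is illustrative only: the mandatory-element list is an assumption of mine, since Dublin Core itself leaves such rules to the individual scheme.

```python
# The fifteen Dublin Core elements (lower-cased for comparison).
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Illustrative only: which elements a given scheme insists on.
MANDATORY = {"title", "creator", "date"}

def check_record(record):
    """Return a list of human-readable problems with a metadata record."""
    problems = []
    for name in record:
        prefix, _, element = name.partition(".")
        if prefix.lower() != "dc" or element.lower() not in DC_ELEMENTS:
            problems.append(f"unknown element: {name}")
    present = {n.partition(".")[2].lower() for n in record}
    for element in sorted(MANDATORY - present):
        problems.append(f"missing mandatory element: DC.{element}")
    return problems

# A record with a typo in an element name fails both checks.
record = {"DC.Title": "Newstart Allowance", "DC.Crator": "Centrelink"}
for problem in check_record(record):
    print(problem)
```

Surfacing errors like these at authoring time is precisely what hand-editing of invisible meta tags cannot do.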
The next problem lies in getting out onto the Net and making the metadata available to the appropriate portals.
The Netscape RSS format (http://my.netscape.com) for channels is an excellent method of doing this. The specification is used by an ever-growing number of sites that recognise and support the XML format. It is possible to register your RSS Channel file with these sites, thereby giving them access to the pages from your site that you consider most relevant for the portal in question.
The RSS format was the basis for the 'Harvest Control List' specification.
Registration processes differ. In AGLS, the expectation is that the Harvest Control List will exist at the URL http://yoursite.gov.au/meta. In RSS it is necessary to initially inform a portal that the site exists. Either way, the site owner is able to promote site data in a controlled way.
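For illustration, a channel file in the RSS 0.9x style is small enough to build with a short script. The sketch below (site name, titles and links all invented) writes a minimal channel file of the sort a portal would be told to fetch:

```python
import xml.etree.ElementTree as ET

# Illustrative channel contents: the titles and links are invented.
ITEMS = [
    ("Newstart Allowance", "http://example.gov.au/newstart"),
    ("Youth Training Allowance", "http://example.gov.au/yta"),
]

rss = ET.Element("rss", version="0.91")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Example Government Services"
ET.SubElement(channel, "link").text = "http://example.gov.au/"
ET.SubElement(channel, "description").text = "Pages we want portals to carry."

for title, link in ITEMS:
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link

# Write the channel file a registered portal would periodically harvest.
ET.ElementTree(rss).write("channel.rss", encoding="utf-8",
                          xml_declaration=True)
```

The file is regenerated whenever the site owner changes the list of pages being promoted, so the portal always harvests the current selection.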
Metabrowser can be used by Librarians to catalogue sites. When a tool such as this is used, any requirement to understand how the index is technically delivered is removed. The content of a catalogue can be built up in much the same way as bookmarks are made in Netscape or Explorer.
It can take only a few minutes to move through an RSS Channel or Harvest Control List or even a Web Site's index page and catalogue the links presented. Saving the catalogue immediately puts the information into play on the portal. No computing skills are required other than using a Web Browser.
I'm sure there are many ways in which this is currently being done. Some form of scripting is used to deliver either a tree-style browseable index or a search facility that ranges over the indexed data and returns results.
The systems are simple enough that they can be built in virtually any environment with any language.
Though simple, this is the means by which the user gets to your data, so such software is extremely important.
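As one minimal illustration of how little is needed (my own sketch, not the fed.gov.au software; the indexed data is invented), a search facility can be a single request handler that scans the indexed data and returns a results list:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Illustrative indexed data: (title, url) pairs from the catalogue.
INDEX = [
    ("Newstart Allowance", "http://example.gov.au/newstart"),
    ("Youth Training Allowance", "http://example.gov.au/yta"),
]

class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Pull the user's query from ?q=... and scan the index for it.
        term = parse_qs(urlparse(self.path).query).get("q", [""])[0].lower()
        hits = [(t, u) for t, u in INDEX if term and term in t.lower()]
        body = "<ul>" + "".join(
            f'<li><a href="{u}">{t}</a></li>' for t, u in hits) + "</ul>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

# Visit http://localhost:8000/?q=newstart to run a search.
HTTPServer(("", 8000), SearchHandler).serve_forever()
```

A real installation would read its index from the catalogue store rather than a hard-coded list, but the shape of the software is no more complicated than this.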
The requirements are -
- a browseable tree of the catalogued subjects
- a search facility ranging over the indexed data
- a record of the search terms users enter, kept for later analysis
This is an important point in terms of the 'process' of publishing an index.
Without knowing what sort of things people are coming to your site looking for, you have practically no hope of being able to supply the information.
Many sites now use the search words entered by visitors to analyse what kind of information is being looked for. For example, the words and phrases entered can be checked against what is held in the browse tree to highlight what types of information are being sought.
This lends itself to continual improvement of the catalogue in response to the needs of the audience. Other methods can be used as well, such as ensuring the most popular searches are answered with the most appropriate links, and that more time can be justifiably spent improving those popular subjects.
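A sketch of that feedback loop (the log entries and headings are invented): tally the logged search phrases and flag the popular ones that nothing in the browse tree currently answers.

```python
from collections import Counter

# Illustrative data: phrases logged from the search box, and the
# headings currently present in the browse tree.
SEARCH_LOG = ["dole", "passport", "dole", "tax file number",
              "passport", "dole"]
BROWSE_TREE = {"passport", "tax file number"}

counts = Counter(phrase.lower() for phrase in SEARCH_LOG)
for phrase, n in counts.most_common():
    status = "covered" if phrase in BROWSE_TREE else "NOT IN CATALOGUE"
    print(f"{n:4d}  {phrase:20s}  {status}")
```

The uncovered entries at the top of such a report are the cataloguer's work list for the week.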
In every catalogue there will always be a gap between the names the public calls things and the names used in the catalogue. People searching for Government information might enter 'dole', but only information on 'newstart', 'youth training allowance' and so on is available in the catalogue.
To translate these terms, again using the FED.GOV.AU example, a simple text document is maintained that maps equivalent terms, broader terms and common spelling mistakes (Beasley, Centrelink, etc.).
The difference between these lists and normal thesauri and synonym lists is that when an entry is inserted, it is because it will result in specific pages being returned, not a list of further choices. The lists then grow organically and are very much part of the feedback cycle necessary for a catalogue of information.
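A sketch of such a list in use (the file format shown is an assumption of mine; fed.gov.au's actual layout may differ): one entry per line mapping what the public types to the term the catalogue uses, applied to a query before the search runs.

```python
# Illustrative contents of the equivalence file: public term = catalogue term.
RAW = """
dole = newstart
beasley = beazley
centerlink = centrelink
"""

# Parse "public = catalogue" lines into a lookup table.
equivalents = {}
for line in RAW.strip().splitlines():
    public, _, catalogue = line.partition("=")
    equivalents[public.strip().lower()] = catalogue.strip().lower()

def translate(query):
    """Rewrite each word the public uses into the catalogue's own term."""
    return " ".join(equivalents.get(word.lower(), word)
                    for word in query.split())

print(translate("dole payments"))  # -> "newstart payments"
```

Because the file is plain text, the cataloguer who spots a failed search can add the missing equivalence in seconds, without touching the delivery software.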
In general I believe that such systems are essential to the work of cataloguing internet resources. It comes back to the problem of there being too much information that could potentially be included. As Looksmart say in their advertising - the garbage is taken out for you.
Lastly, there is a requirement to check the links in such an index so that it is as accurate as possible. A well-regarded service is the Australian LinkAlarm (http://linkalarm.com).
Because this is a service and not a piece of software, you simply sign up, tell them your Web site address, and every month you receive an email listing the broken links on your site.
Whether as part of a portal or not, the advantage of tools like these is that you don't need staff to run the check for you, and you don't need maintenance procedures in place to make sure the report is run. It's another example of an emerging set of tools that anyone can use to manage content.
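The check such a service performs is itself straightforward; a minimal sketch of it (the URLs are invented) using only the Python standard library:

```python
import urllib.error
import urllib.request

# Illustrative list of links drawn from the catalogue.
LINKS = [
    "http://example.gov.au/newstart",
    "http://example.gov.au/gone-away",
]

def check(url, timeout=10):
    """Return the HTTP status code, or the error if the link is dead."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return str(response.status)
    except (urllib.error.URLError, OSError) as exc:
        return f"BROKEN ({exc})"

for url in LINKS:
    print(f"{check(url):40s} {url}")
```

Run monthly over the catalogued links, this produces exactly the kind of broken-link report the service emails out.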
Slowly, with a kind of inevitability attached to it, the Web is turning its attention to serious cataloguing as a means of sustainable navigation.
The next few years will see a proliferation of subject based portals.
At the same time cataloguing professionals will begin to be absorbed into this process adding value as they go.
What is needed are simple tools that can be fashioned into systems to allow this to occur, and the continued success of portals that go about their business in the ways described in this paper. The results will be real, maintainable systems of ever-increasing sophistication.
The views expressed in this paper are the views of Spirit Consulting through experience with AusInfo and fed.gov.au and do not necessarily reflect the view of AusInfo.