Debbie Campbell [HREF1], Director, Coordination Support Branch [HREF2], National Library of Australia [HREF3], Parkes Place, Canberra, ACT 2600. Email: dcampbel@nla.gov.au
Although the National Library of Australia has an ongoing programme of digitisation to increase the accessibility of its collections, raising awareness of the existence of the digital versions of these collections requires a programme of activity in its own right. One tool that may be used is a Web search engine such as Google, or a specialised engine such as OAIster [HREF4]. Search engines can be seeded with reliable resource locators for digital objects based on URLs or Digital Object Identifiers (DOIs). However, the locators must be persistent. The persistence of unique digital identifiers encourages ongoing citability, thereby increasing awareness and use of digital objects. This paper describes how the National Library is attempting to ensure the ongoing use of its “digital collections” of books, journals, maps, pictures, photographs, manuscripts and music.

One of the aims of the National Library’s recent Resource Discovery Service Project was to examine an appropriate relationship between the Library’s online information services and commercial search engines [HREF2]. The Library’s Electronic Information Resources Strategies and Action Plan 2001-2002 recognises that a search engine is usually the first point of access to information on the Web [HREF5].
This is considered to be true for any information seeker, unless a particular entry point is integrated by design into the desktop platform, or the institution or employer underwriting the access mechanism blocks access to general Web searching.
One search engine in particular, Google, is known to be commonly used by the academic and research professions internationally, as well as by the general public. This is due to its usually reliable pinpointing of information, in part because of its method of ranking results according to link popularity. It also examines the words used to describe each link. To date, Google has also resisted the commercialisation of its result sets that other search engines have been party to.
While Google is currently the search engine of choice for many, the circumstances of search engines change, so it can serve only as a representative example in any strategy to increase the exposure of quality information.
Researchers rely on Google to search across disciplines and resources in the absence of centralised, visible Australian tools and services. Many Australian institutions, cultural and academic, have created or are in the process of creating aggregations of high value online content. The aggregations are usually stored in databases commonly referred to as ‘The Hidden Web’ because search engines, including Google, are unable to harvest them.
The National Library has overcome this problem to a certain extent by writing standard metadata describing the aggregation (i.e. at the directory level), embedding it in home pages and other selected pages of an online service, and seeding the URL for the service directly into several search engines. This process is considered to be relatively successful as it exposes valuable, consistent Library-authored metadata and can be statistically significant. It is also worth noting that Google favours web sites with longevity, which partly explains the Library’s success.
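The embedding step can be illustrated with a minimal sketch. It assumes Dublin Core-style `<meta>` elements; the paper says only "standard metadata", so the element names, the sample values and the helper name `build_meta_tags` are all illustrative assumptions rather than the Library's actual practice.

```python
from html import escape


def build_meta_tags(metadata):
    """Render a dict of service-level metadata as HTML <meta> elements.

    The element names are supplied by the caller; Dublin Core-style names
    such as "DC.title" are an assumption, not taken from the paper.
    """
    lines = []
    for name, value in metadata.items():
        # Escape both attribute values so quotes and ampersands stay valid HTML.
        lines.append(
            '<meta name="{}" content="{}">'.format(
                escape(name, quote=True), escape(value, quote=True)
            )
        )
    return "\n".join(lines)


# Hypothetical description of an online service at the directory level.
example = {
    "DC.title": "Pictures Catalogue",
    "DC.publisher": "National Library of Australia",
    "DC.type": "collection",
}
tags = build_meta_tags(example)
print(tags)
```

The resulting elements would be placed in the `<head>` of the service's home page before its URL is submitted to a search engine.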
While metadata at the directory or service level is important, there would
be a greater benefit to searchers if item-level records were available to search
engines. For example, it would enable Google to direct searchers to culturally
significant materials and increase the citations of those materials. This should
then be reflected in higher Google rankings. One measurable target is to have
each item appear on the first page of results displayed.
The National Library considered two methods for increasing awareness and use of its high quality digital collections. Although mutually exclusive technically, the intended outcomes are complementary. Firstly, items can be made available to Google for harvesting by exposing their URLs in an HTML file. Listing collection-level URLs on a Web page should be sufficient if the digital items are only one or two levels (sub-directories) away from the collection level address. As the search engine does not conduct a search query when harvesting to access items in a repository, a hierarchical approach to the construction of a unique URL for each digital object is more effective.
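As a rough sketch of the first method, the following generates a plain HTML page of links that a crawler could follow to reach each item. The function name `build_seed_page` and the page layout are assumptions made for illustration; the only URL taken from this paper is the Pictures Catalogue identifier used in the example.

```python
from html import escape


def build_seed_page(title, item_urls):
    """Return a simple HTML page listing one hyperlink per digital item."""
    links = [
        '<li><a href="{0}">{0}</a></li>'.format(escape(url, quote=True))
        for url in item_urls
    ]
    parts = [
        "<html>",
        "<head><title>{}</title></head>".format(escape(title)),
        "<body>",
        "<ul>",
        *links,
        "</ul>",
        "</body>",
        "</html>",
    ]
    return "\n".join(parts)


# One real identifier from the Pictures Catalogue; a production page
# would list every item in the collection.
page = build_seed_page(
    "Pictures Catalogue: item list",
    ["http://nla.gov.au/nla.pic-an5776589-v"],
)
print(page)
```

Because the crawler follows links rather than issuing queries, every item must be reachable from a page of this kind within a small number of hops.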
In the National Library’s Pictures Catalogue, a persistent URL has been assigned to each digitised image. For example, the image of Upper Coomera Wharf by artist Edwin Bode, 1859-1926 has a persistent identifier of http://nla.gov.au/nla.pic-an5776589-v and its related metadata record has a persistent identifier of nla.pic-an5776589 [HREF6].
Unfortunately, this form of identifier is not recognised by Google. The Google harvester only recognises URLs that end in a standard file type, in this case .GIF, .JPG or .PNG. In particular, the persistent identifier is not supported by the Google image search service [HREF7].
![Bode, Edwin, 1859-1926. Upper Coomera wharf [picture]](image002.jpg)
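The file-type restriction described above can be expressed as a simple check. This is only a sketch of the behaviour reported in this paper, not Google's actual harvesting logic; the function name and the second test URL are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# File types the paper reports as recognised by the image harvester.
IMAGE_EXTENSIONS = {".gif", ".jpg", ".png"}


def has_recognised_image_extension(url):
    """Return True if the URL path ends in one of the listed file types."""
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in IMAGE_EXTENSIONS


# The persistent identifier from the Pictures Catalogue has no file extension.
print(has_recognised_image_extension("http://nla.gov.au/nla.pic-an5776589-v"))
# A hypothetical conventional image URL for comparison.
print(has_recognised_image_extension("http://example.org/images/wharf.JPG"))
```

Under this check the persistent identifier fails while the conventional file name passes, which is the mismatch the paper describes.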
Despite the best intentions of creating a persistent locator which can be migrated to an international digital identifier standard in the future, an alternative method is necessary for search engines to find these cultural gems.
The second method for seeding a search engine is based on the use of the technologies provided by the Open Archives Initiative. OAI provides a range of open source software to support harvestable collections of metadata. Metadata describing digital items is usually stored in repositories separate from the objects themselves. Both the OAIster project of the University of Michigan Library, and the DP9 project of Old Dominion University’s Digital Library Research Group, have used this approach [HREF8].
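To make the harvesting model more concrete, the sketch below parses a small OAI-PMH `ListRecords` response carrying unqualified Dublin Core (`oai_dc`) metadata. The XML sample is invented for illustration: its record identifier and repository name are assumptions, with only the title and creator borrowed from the Pictures Catalogue example in this paper. A real harvester would fetch the response over HTTP and handle resumption tokens, which are omitted here.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response; the identifier below is hypothetical.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:nla.pic-an5776589</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Upper Coomera wharf [picture]</dc:title>
          <dc:creator>Bode, Edwin, 1859-1926</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Extract identifier, title and creator from each harvested record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
            "creator": record.findtext(".//dc:creator", default="", namespaces=NS),
        })
    return records


records = parse_records(SAMPLE_RESPONSE)
print(records)
```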
The DP9 project exposes harvested metadata records to regular search engines through a gateway for Web crawlers. “DP9 does this by providing consistent URLs for repository records, and converting them to OAI queries against the appropriate repository when the URL is requested. This allows search engines to index the ‘deep Web’ contained within OAI compliant repositories.” Any agency willing to have its collections harvested makes an OAI-compatible server available to DP9.
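The URL-to-query conversion quoted above might look something like the following sketch. It is not DP9's code: the gateway URL layout (`/<repository>/<identifier>`), the repository table and the base URL are all hypothetical. The `GetRecord` verb and its `identifier` and `metadataPrefix` arguments are part of the OAI-PMH protocol.

```python
from urllib.parse import unquote, urlencode, urlparse

# Hypothetical table mapping a short repository name to its OAI base URL.
REPOSITORIES = {"nla": "http://example.org/oai"}


def gateway_url_to_oai_query(gateway_url):
    """Translate a crawler-friendly gateway URL into an OAI-PMH GetRecord URL.

    Assumes gateway paths of the form /<repository>/<record identifier>.
    """
    path = urlparse(gateway_url).path.strip("/")
    repository, _, identifier = path.partition("/")
    if repository not in REPOSITORIES or not identifier:
        raise ValueError("unrecognised gateway URL: " + gateway_url)
    params = {
        "verb": "GetRecord",
        "identifier": unquote(identifier),
        "metadataPrefix": "oai_dc",
    }
    return REPOSITORIES[repository] + "?" + urlencode(params)


query = gateway_url_to_oai_query(
    "http://gateway.example.org/nla/nla.pic-an5776589"
)
print(query)
```

The crawler sees only a stable, link-like address; the query against the repository is issued by the gateway at request time.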
The National Library has been able to set up an OAI-compliant repository in conjunction with its Digital Collections Manager in test mode. This will allow search engines to be appropriately seeded in the future.
The National Library of Australia has worked with other cultural agencies to establish ‘best practice’ for the permanent citability of digitised and digital materials [HREF9]. This will help to release important cultural heritage collections for use by future generations.
For the implementation:
Tony Boston; Director, Digital Services; National Library of Australia; tboston@nla.gov.au
For the concept:
Kent Fitch; kfitch@nla.gov.au
Debbie Campbell, © 2003. The author assigns to Southern Cross University
and other educational and non-profit institutions a non-exclusive licence to
use this document for personal use and in courses of instruction provided that
the article is used in full and this copyright statement is reproduced. The
authors also grant a non-exclusive licence to Southern Cross University to publish
this document in full on the World Wide Web and on CD-ROM and in printed form
with the conference papers and for the document to be published on mirrors on
the World Wide Web.