Engineering Utility: A visionary role for encoded archival authority information in managing virtual and physical resources


 

Gavan McCarthy [HREF1], Australian Science and Technology Heritage Centre, University of Melbourne, 203 Bouverie Street, Carlton, Victoria 3053, Australia. gavan@asap.unimelb.edu.au


Abstract

The primary focus of this paper is the set of initiatives in the heritage industry to capture, structure and use information about 'context entities', formerly known as archival authority records: that is, the actors and agents that have played roles, both significant and ordinary, in the evolution of our society. Work in Australia on the HTML encoding of context entities for the World Wide Web has been underway since 1994. Research is now being conducted into the SGML/XML encoding of context entities to radically enhance the utility of these objects in building bridges between the disparate sectors of the heritage industry, not just in Australia but worldwide.


Introduction

Important initiatives are presently being pursued in the heritage industry to capture, structure and use information about the actors and agents that have played roles, both significant and ordinary, in the evolution of our society. People, families and corporate bodies, the activities they undertook and the things they created form the cultural heritage of our society. This heritage is preserved in many places: some formal and publicly funded, some institutional and privately funded, and much of it cared for by individuals and families in their own homes.

Most activity in the documentation and management of these cultural heritage resources has traditionally been based around describing the things themselves. As a result there are now clear distinctions in practice between the library, archive, museum, built heritage and art gallery sectors of the heritage industry. Entrenchment of professional practice in each field has militated against attempts to create information bridges between the sectors. However, broadening the scope of descriptive practice within each industry sector to include the systematic documentation of context, that is, people, families and corporate bodies (in the first instance), as distinct entities that are linked by defined relationships to the objects, records and other traces of their activity, will enable these bridges to be built.

The World Wide Web provides the means by which these new bridges can be constructed. On the Web in Australia, work on the HTML encoding of context entities has been underway since 1994, and research is now being conducted on the SGML/XML encoding of context entities as part of a broader international collaborative research project. The development of new conceptual frameworks and of new professional practice in entrenched professions is not to be undertaken lightly, but both are necessary if we are to use the Web effectively and not allow it to become a dumping ground for digital dross and extinct information systems.

 

The Heritage Industry - Archives

The archives industry in Australia, through Peter Scott and Australian Archives [1] in the 1960s and 1970s, developed what has become known as the series system for documenting and managing archival records. [2] The principal element of this development was that it separated the documentation of creator (provenance) from the documentation of records and recordkeeping systems. It provided a new means by which records could be effectively managed through time while still preserving the context of their creation, and their relationships to other records, both contemporaneous and previous. It also provided the means to relate them to records yet to be created. What drove this initiative was the systematic and structural problems faced by Australian Archives in managing records that were produced by a government and its bureaucracy in times of frequent administrative change. Many of these changes were for political purposes and seldom represented advances in administrative efficiency. The effect of administrative change on the integrity of Web-based government information resources has a direct parallel with the experience of the Archives 30 years earlier, and the solution to the current problem is most likely to be based on the same conceptual framework. The separation of the documentation of context from records and resources placed the Australian archival community at odds with most of the rest of the archival world, which persisted with the incorporation of the documentation of context in the description of the records.
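The separation described above can be illustrated with a minimal sketch. It is not the data model used by Australian Archives; the class names, field names and sample agencies are all hypothetical. It assumes only that context entities and record series are documented separately and joined by dated relationships, so that a series can survive an administrative change without being redescribed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ContextEntity:
    """An agency, person or family, documented independently of any records."""
    entity_id: str
    name: str


@dataclass(frozen=True)
class Relationship:
    """A dated link between a context entity and a record series."""
    entity_id: str
    series_id: str
    kind: str                       # e.g. "controlled" (hypothetical label)
    start_year: int
    end_year: Optional[int] = None  # None means the link is still current


def controllers(series_id: str, year: int,
                relationships: List[Relationship],
                entities: List[ContextEntity]) -> List[str]:
    """Names of the entities linked to a series in a given year."""
    by_id = {e.entity_id: e.name for e in entities}
    return [by_id[r.entity_id] for r in relationships
            if r.series_id == series_id
            and r.start_year <= year
            and (r.end_year is None or year <= r.end_year)]


# Hypothetical data: one series that outlives an administrative change.
entities = [ContextEntity("A1", "Department A"),
            ContextEntity("A2", "Department B")]
relationships = [Relationship("A1", "S1", "controlled", 1945, 1959),
                 Relationship("A2", "S1", "controlled", 1960)]
```

In this sketch an administrative change adds a new relationship and closes the old one; the description of series S1 itself is never touched.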

Archival research in Australia in the 1990s has refined this understanding by defining the key role of the archival process through the whole records/information continuum that links the past to the present and builds bridges into the future. [3] The archival profession worldwide has been grappling with the documentation of context to preserve meaning, authority and integrity of records and information sources for many decades from a variety of perspectives. This has led to a vigorous discourse that has challenged leading thinkers to reflect more deeply on the fundamental purposes of record keeping. They are challenged to identify and distinguish these fundamental purposes from the constraints of entrenched practice.

The development of the series system in Australia led to a position where many Australian archivists take it for granted that you separate documentation and management of context from the description of the records. However, this has not been the case in related information professions and indeed in most of the archival world, where systems of documentation evolved that embedded context within records description. This, quite naturally, led to systematic problems that have been the foundation of international professional debate over recent decades, particularly in relation to the codification of international standards for descriptive practice. Interestingly, in late 1998 the USA National Archives and Records Administration took the decision to adopt the Australian series system to document and manage USA government records. [4] This was an extraordinary and unexpected, but perhaps inevitable, development.

Despite the exciting conceptual foundation provided by the series system and its potential for use on the Web to revolutionise the documentation and management of records and related information objects, much of the research effort has been focused on the description and management of records per se. In particular there has been an emphasis on the definition of metadata elements to enable the encapsulation of information objects and their preservation as records in networked environments. However, a notable counter to this trend is found in the work of Sue McKemmish and her colleagues at Monash University. [5]

"... we are looking," McKemmish contended, "at the full range of content, structure, context and recordkeeping process metadata and the scheme we are developing includes provision for the capture of metadata about provenance (structural and functional) entities separately from the capture of metadata about records entities, and also for the capture of metadata re the complex relationships between and amongst them. It also builds in the different levels/layers. It could well be implemented via XML/RDF strategies." [6]

In the USA, Daniel Pitti headed an important initiative that led to the formulation of a Document Type Definition (DTD) in SGML for Encoded Archival Description (EAD). [7] This development and its implementation at a number of important archival institutions in the USA have already established it as a watershed in the history of the archival profession. It has also become a significant topic for debate and discussion within the profession as it reveals, yet again, the enormous variation that exists in the practice of archival description of records at local, national and international levels. It is an important and necessary development, but it is primarily focused on the description of defined sets of archival records and not on the documentation, management and utilisation of information that establishes the context of the records. It is a necessary advance but it is certainly not a sufficient answer.

Chris Hurley, who has developed a reputation for challenging conventional thinking in the archival profession, described the task ahead of archivists in the following terms:

"Archivists can participate in recordkeeping processes by documenting complex relationships between records and context. Records must be placed in context - in time and place - by fashioning descriptive entities and documenting relationships. This is how we can understand the record and derive evidence, it must be interpreted not by reference to our observation of it in the circumstances obtaining when we access it, but by understanding the circumstances which existed at its creation and changes since. . . The two fundamental issues for discussion concerning archival description are therefore what the descriptive entities should be and what are the relationships we need to show between them." [8]

Two standards have been developed and promulgated by the International Council on Archives in an endeavour to improve archival practice worldwide and lead towards greater interoperability between archives in different nations and cultures. ISAD(G) focuses on the elements required to document records, while ISAAR(CPF) [9] defines the basic elements required to document corporate bodies, persons and families in a global environment.

 

The Heritage Industry - Museums, Galleries and Built Heritage

In other sectors of the heritage industry, online and offline documentation and information developments have focused heavily on the objects themselves, the use of objects in educational exhibitions, and the compilation of directories to custodial institutions that focus on their holdings of objects. [10] The documentation of context, in particular provenance in the archival sense, as a separate but related entity has not figured highly in museology in recent times. However, at a recent UNESCO Forum on University and Heritage [11] it was noted on a number of occasions that museum documentation regimes were not capturing the human stories behind the objects they were collecting. The "significance" of the objects was not being documented. This relates to the museological trend to collect objects for their generic and contemporary educational value and not keep them as records or evidence of specific activity which could be used for a wide variety of research and exhibition purposes.

It was noted that insufficient context, meaning and knowledge were being collected or preserved to enable the objects to carry meaning beyond the context of their creation and use. Too much knowledge of the objects was accepted as implicit knowledge of the originating context and regarded as not required for explicit documentation. Part of the problem relates to the fact that object-based documentation, registration and management regimes do not facilitate this function. The authority record concept, now described as encoded context, was proposed as a means by which this issue may be addressed. [12]

 

The Heritage Industry and the Web

The uptake of web technology by the heritage community in Australia has been highly variable, and this reflects the broad range of organisations and individuals that comprise the industry. Private funding and government support for heritage activities are fickle, and much of the industry comprises well-meaning volunteers and enthusiasts keen to participate in the preservation of community heritage at the local level. The use of computer and Web technology at the lower levels has been fragmentary and problematic due to the high costs of keeping abreast of the rapidly developing technology. This applies to the hardware, software and the individual expertise required to manage the technology. One consequence of this juxtaposition of influences has been the promulgation of simplified software systems for the industry that entrench existing practice rather than enable the exploration of the power of the Web and its potential to enable the industry to achieve the goals it has set for itself in the preservation and communication of cultural heritage.

Within the geographically diverse community interested in the history and heritage of Australian science, technology and medicine there is a wide diversity of participants including university-based academics, practising scientists, industry bodies, and retired scientists and their families. The wide uptake of Web technology at the personal level within this community provides us with the opportunity to utilise this diverse, enthusiastic, knowledgeable and often highly skilled group of people to participate in the process. The context-based web space created by the Australian Science Archives Project, described later, is one example of how this may be achieved.

In May 1996, the modestly funded archives profession in Australia published, and has since maintained, the very successful Directory of Archives in Australia. [13] At the time this was believed to be the first Web-based directory of archival institutions in the world. The much more ambitious and government-funded Australian Museums Online was established shortly after and has evolved a web space of considerable depth and complexity. However, despite establishing an online directory of museums in Australia, it has maintained an object-centric view of the world which relies almost entirely on problematic classification systems to unify access across collections.

The use of shared museum object classification systems on the Web using XML has been examined recently by Alain Michard and Giang Pham-Dac. [14] Their study revealed that the mechanisms necessary to implement such a system parallel those required to implement a network of context-based web spaces. Of particular interest is the identified need to create defined relationship information entities that enable the cross-referencing or linking of context entities to other information objects or physical objects and indeed to other context entities.

 

Bright Sparcs - An Australian Experiment in Encoding Context

A notable local exercise in the implementation of a context-based information system was the development by the Australian Science Archives Project [15] of the Register of the Archives of Science in Australia from 1987 and its transformation in 1994-5 into Bright Sparcs, a Web resource of international stature. [16] The structure of Bright Sparcs pre-empted the ICA Standard [ISAAR(CPF)] and, following almost five years of Web implementation, is in a unique position to act as a major development site to rigorously explore the bringing together of "semantics, syntax and structure" as a fully functional reference implementation. Within the history and heritage of Australian science, technology and medicine, the Bright Sparcs (individuals) and the proposed Australian Science at Work (societies, organisations, corporations) web spaces endeavour to document and cross-reference (through the use of relationship entities) the following information objects:

As currently presented, Bright Sparcs is built around HTML-encoded context entities. These exist as static pages in the web space. They provide the structural framework which is used to anchor links and cross-references both from inside and outside the Bright Sparcs web space. All the data presented on the space is controlled by a variety of databases. All data for the context and relationship entities is modified in the databases and the new web pages are generated automatically through the use of templates. Access to information within the space is presently provided with varying levels of success using Excite [17] and Harvest. [18]
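The general approach of generating static pages from a database through templates can be sketched as follows. This is an illustration only and does not reproduce the Bright Sparcs software; the table layout, the page template, the file naming and the sample record are all assumptions made for the example.

```python
import sqlite3
from html import escape
from string import Template

# Hypothetical page template for one context entity.
PAGE = Template("""<html>
<head><title>$name</title></head>
<body>
<h1>$name</h1>
<p>$summary</p>
</body>
</html>
""")


def render_pages(conn):
    """Return {filename: html} for every context entity in the database."""
    pages = {}
    rows = conn.execute("SELECT id, name, summary FROM entity ORDER BY id")
    for entity_id, name, summary in rows:
        # Escape database values so they cannot break the markup.
        pages[f"{entity_id}.htm"] = PAGE.substitute(
            name=escape(name), summary=escape(summary))
    return pages


# Hypothetical data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entity (id TEXT PRIMARY KEY, name TEXT, summary TEXT)")
conn.execute("INSERT INTO entity VALUES (?, ?, ?)",
             ("P000001", "Example, Person A.", "Physicist & teacher"))
pages = render_pages(conn)
```

Because each page is regenerated from the database, the file name derived from the entity identifier stays stable and can serve as an anchor for links from inside and outside the web space.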

During 1999, Harvest will be replaced by a system that will allow access to the key structured information in the supporting databases and enable more accurate and focused subsets to be constructed. A very recent example of this type of approach can be seen at Museum Victoria's Bioinformatics web space created by Dr Ken Walker, Senior Curator of the Environment Program. [19] Commonly requested subsets, for example women scientists and this year's centenaries of births and deaths, will be accessible via standard interface pages that draw their results from the latest data in the databases.
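Subsets of this kind amount to simple queries over the structured data. The following sketch assumes a single hypothetical table of people with gender, birth year and death year columns; it is not the schema of the Bright Sparcs databases, and the sample rows are invented.

```python
import sqlite3


def women_scientists(conn):
    """Names of all people recorded as female, in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM person WHERE gender = 'F' ORDER BY name")
    return [name for (name,) in rows]


def centenaries(conn, year):
    """Names of people born or deceased exactly 100 years before `year`."""
    target = year - 100
    rows = conn.execute(
        "SELECT name FROM person "
        "WHERE birth_year = ? OR death_year = ? ORDER BY name",
        (target, target))
    return [name for (name,) in rows]


# Hypothetical data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person "
             "(name TEXT, gender TEXT, birth_year INTEGER, death_year INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", [
    ("Person A", "F", 1899, 1970),
    ("Person B", "M", 1850, 1899),
    ("Person C", "F", 1920, None),
])
```

An interface page built on such queries always reflects the current state of the database, which is the property the static index cannot offer.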

What is special about the data in context-based knowledge spaces? Archivists' primary goal in all the activities they undertake is to ensure that the records, resources and information sources under their control last for a very long time. Archivists tend to eschew the flashy and the fashionable in their search for sustainability. Bright Sparcs, true to this tradition, is simple, honest and straightforward - a simplicity that belies the depth of thought that has gone into its development. It has survived well through the initial years of the web and has developed a user group that spans the world. Context-based information is ideal for this sort of purpose as it documents, through the creation of web surrogates, entities that represent things that actually existed. People, families and corporate bodies exist in time and space; they are usually named and they performed a variety of activities. They are also the primary focus of interest for a significant majority of inquirers.

Context-based entities in this environment become datum points to which can be mapped a wide variety of variables. New resources and information sources are regularly created and knowledge held by specialists is constantly appearing. Bright Sparcs has been constructed to encourage users to contribute their knowledge to the site. In this way it is truly interactive and a vital component in creating a vibrant online community.

The next major development of Bright Sparcs context entities will be their encoding in SGML/XML. The creation of a Document Type Definition that will enable the structured presentation of data relating to context entities on the Web and the identification and retrieval of specific and meaningful data elements will be a major advance. If we, as an archival community, are able to establish an internationally recognised DTD for context entities, it will be possible to create local, national and internationally linked context-based web spaces.
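To indicate what such an encoding might make possible, the sketch below builds an XML instance for one context entity and then retrieves a specific data element from it. No DTD for context entities yet exists, so every element and attribute name here is hypothetical, as are the sample identifiers; the sketch uses Python's standard xml.etree.ElementTree module, which checks well-formedness but does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def encode_entity(entity_id, entity_type, name, exist_from, exist_to, related):
    """Serialise one context entity as XML (hypothetical element names)."""
    root = ET.Element("contextEntity", {"id": entity_id, "type": entity_type})
    ET.SubElement(root, "name").text = name
    dates = ET.SubElement(root, "existDates")
    ET.SubElement(dates, "from").text = exist_from
    ET.SubElement(dates, "to").text = exist_to
    rels = ET.SubElement(root, "relationships")
    for target_id, kind in related:
        ET.SubElement(rels, "relationship",
                      {"target": target_id, "kind": kind})
    return ET.tostring(root, encoding="unicode")


def related_ids(xml_text, kind):
    """Identifiers of the entities linked by a given kind of relationship."""
    root = ET.fromstring(xml_text)
    return [r.get("target") for r in root.iter("relationship")
            if r.get("kind") == kind]


# Hypothetical instance: a person linked to a corporate body and a colleague.
xml_text = encode_entity("P1", "person", "Example Person", "1900", "1980",
                         [("C1", "memberOf"), ("P2", "colleagueOf")])
```

The point of the structure is that a program can ask for the name, the dates or the relationships of an entity directly, rather than inferring them from the layout of an HTML page.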

Although Bright Sparcs has arisen from an archival background it is not specific to archival resources and thus has a major role in unifying access to cultural heritage resources distributed across museums, archives, libraries and other sites. The conceptual framework that underpins the context-based web space is also being explored in other areas, in particular in relation to the long-term provision of government information services on the web, as a key element in localised knowledge management and electronic records management systems, and in the development of the Dublin Core metadata standard version 2. [20]

 

The New Haven Initiative

The importance of provenance, or more generally context, as a unifying force has been recognised for some time in the international arena by leading thinkers. [21] Some of these have played critical roles in the creation of ISAAR(CPF) and are now leading the development of an international collaborative research project to be placed before the USA National Science Foundation's International Digital Libraries Funding Program. [22] An inaugural planning committee comprising three people from Europe, nine from the USA, one from Canada and one from Australia (the author) met at Yale, New Haven on 4-6 December 1998 to formulate the scope of the project over the next 3-5 years. The draft meeting report stated:

"The meeting participants decided that further work was warranted and necessary to identify and promulgate a professional consensus on the structure, elements, and functionality of contextual information within an archival information system. They identified the following projects and actions that needed to be undertaken to advance understanding and consensus and to coordinate work throughout the international archival community:

1. Establish an ad hoc international coordinating body to track the work being done on contextual information by various individuals and organizations and to serve as a clearinghouse for communications with other interested members of the profession.

2. Develop a preliminary SGML/XML-compliant document type definition (DTD) for contextual information, based on elements identified in ISAAR(CPF). This DTD should also be compatible with the structure and purposes of the Encoded Archival Description DTD.

3. Implement the preliminary DTD on an operational system that can serve as a testbed for experimentation and analysis.

4. Identify and map existing sources of contextual information against the preliminary DTD to test the comprehensiveness and structure of the ISAAR elements.

5. Investigate and define the types of products that the DTD should support and test the DTD to ensure that such products can be generated from instances. Examples given included organizational charts and genealogies.

6. Identify and define other types of specialized contextual data systems (such as geographic information systems and government locator systems) to determine whether and how they should be linked to a contextual information DTD. In particular, the group was interested in how genealogical information recorded in the GEDCOM data structure might be represented in an SGML/XML DTD and made interoperable with the contextual information DTD.

7. Conduct user studies to determine whether and how contextual information supports the research process of various categories of users.

8. Investigate the elements, structure, and capabilities needed to record descriptions of organizational functions and personal activities. Investigate whether functional descriptions should be accommodated within a contextual information DTD that also includes descriptions of organizations, persons, and families, or whether a separate definition is needed.

9. Develop a standard list of relationship types which would link entities described in a contextual information DTD. Such relationships would include those between similar entities (e.g., relationships between organizations) and those between dissimilar entities (e.g., relationships between persons and organizations).

10. Investigate the concept of organizational change to develop guidelines for determining the conditions under which a change in the characteristics of an entity are sufficient to require a new instance for the changed entity. As an example, should a name change for a government agency that does not involve any substantive changes in its functional responsibilities require a new contextual information instance for that entity? Conversely, would a significant change in function without a name change trigger such an action?

11. Following development and testing of the DTD, develop a Z39.50 attribute set for both the contextual information and the Encoded Archival Description document type definitions." [23]
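The question raised in item 10 is left open by the report. The sketch below does not answer it; it only shows how one candidate rule could be stated explicitly so that it can be tested against real cases. The rule itself, the class and the sample agencies are all hypothetical: a change of function always requires a new instance, while a change of name does so only if the implementer chooses that option.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AgencyState:
    """The characteristics of an agency at one point in time."""
    name: str
    functions: FrozenSet[str]


def needs_new_instance(before: AgencyState, after: AgencyState,
                       name_change_counts: bool = False) -> bool:
    """Hypothetical policy for deciding when a change creates a new entity."""
    if before.functions != after.functions:
        return True   # a substantive change of function
    return name_change_counts and before.name != after.name


# Hypothetical cases: a pure renaming and a pure change of function.
old = AgencyState("Department A", frozenset({"roads", "bridges"}))
renamed = AgencyState("Department B", frozenset({"roads", "bridges"}))
refocused = AgencyState("Department A", frozenset({"roads", "housing"}))
```

Under the default setting, the renamed agency keeps its existing instance and the refocused agency receives a new one; passing name_change_counts=True reverses the first outcome.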

 

Conclusion

Australia is currently placed in a unique position to capitalise on the confluence of the research, development and implementation of a variety of information-based industries, which include the archives, museum and built heritage sectors. Indeed, the computer and information technology industries, and the multi-media learning technologies industries are all involved in significant and indeed revolutionary change at the present time. New visions for the future flourish and metadata is all the rage, but the issues of the documentation, preservation and management of context in a complex communication network such as the World Wide Web remain problematic. However, within the heritage industry a new understanding is emerging that will enable the systematic, structured and distributed management of context to radically improve the management of virtual and physical resources both across space and through time.

The encoding of context entities for Web functionality, based on the documentation metadata standard promulgated by the International Council on Archives, ISAAR(CPF), and coupled with applications of Extensible Markup Language (XML) such as the Resource Description Framework (RDF) [24] developed under the auspices of the World Wide Web Consortium (W3C), [25] provides a means by which we can reach a new level of utility. It is a level that will only be achieved through conscious and deliberate engineering based on collaborative research, development and implementation projects that will lead to common conventions of semantics, syntax and structure.

The Australian archival community has a unique opportunity to take a leading role based on the conceptual understanding established through the development and implementation of the series system. ASAP hopes to be a leading collaborator in these developments and is keen to use Bright Sparcs as a testbed for SGML/XML encoding and functionality. Contact has already been made with software developers to look at the use of Bright Sparcs HTML and ultimately XML encoded context entity data in multi-media, data visualisation and analytical products. ASAP itself is looking to develop the database management tools under the open software development concept with the aim of facilitating the introduction of context-based web spaces across the full spectrum of the heritage industry. What is most important is that we have the opportunity to develop and implement national and global context registration and mapping that will facilitate the publishing and sharing of knowledge by all who wish to engage with the World Wide Web.

 

Acknowledgments

I would like to thank the fantastic staff of the Heritage Centre for Australian Science, Technology and Medicine, University of Melbourne, in particular, Joanne Evans, Robin Stephens, Helen Morgan and Fay Anderson for their input into this paper.

 

References

[1] In 1998 the Australian Archives was renamed the National Archives of Australia.

[2] Scott, P.J., et al, 'Archives and Administrative Change - Some Methods and Approaches', Archives and Manuscripts, vol. 7 no. 1, August 1978, pp. 115-127; vol. 7 no. 2, April 1979, pp. 151-165; vol. 8 no. 1, June 1980, pp. 41-54; vol. 8 no. 2, December 1980, pp. 51-69; and vol. 9 no. 1, September 1981, pp. 3-17.

[3] The 'continuum' concept was articulated in the 1990s by Sue McKemmish and Frank Upward at Monash University following their rejection of the 'life-cycle' metaphor as inappropriate for dealing with the realities of recordkeeping. Information about the 'continuum' metaphor can be found at: <http://www.sims.monash.edu.au/rcrg/index.html [HREF2] >.

[4] Cunningham, Adrian, National Archives of Australia - personal communication on his return from Washington DC, November 1998.

[5] Information about the Records Continuum Research Group at Monash University can be found at: <http://www.sims.monash.edu.au/rcrg/index.html [HREF3] >.

[6] McKemmish, Sue, Records Continuum Research Group, Monash University - personal communication - email, 30 November 1998.

[7] Pitti, Daniel V., 'Encoded Archival Description: The Development of an Encoded Standard for Archival Finding Aids', American Archivist, vol. 60 Summer 1997, pp. 268-283. Online information about this work can be found at: <http://lcweb.loc.gov/ead/ [HREF4] >.

[8] Hurley, Chris, 'The Making and the Keeping of Records: (1) What are Finding Aids For?' Archives and Manuscripts, vol. 26 no. 1, pp. 74 and 75.

[9] The International Council on Archives standard for archival authority records, ISAAR(CPF), can be found on the World Wide Web at: <http://www.archives.ca/ica/cds/isaar_e.html [HREF5] >, the related ISAD(G) document can be located through the same site.

[10] Australian Museums Online (AMOL) is a leading example of the high-quality online services provided in this area. It can be found at <http://amol.org.au [HREF6] >.

[11] Information about the UNESCO University and Heritage Forum No. 3 1998 can be found at: <http://arts.deakin.edu.au/unesco/ [HREF7] >.

[12] McCarthy, Gavan, 'The University as a locus of authority in linking heritage resources', Proceedings of the Third International Forum UNESCO: University and Heritage, Deakin University, Melbourne and Geelong, Australia, 4-8 October 1998 (in press).

[13] Australian Society of Archivists Inc., Directory of Archives in Australia, 1998 Edition, First published online May 1996, ISBN 0 947219 129, see <http://www.asap.unimelb.edu.au/asa/directory/asa_dir.htm [HREF8] >.

[14] Michard, Alain and Giang Pham-Dac, 'Description of Collections and Encyclopaedias on the Web using XML', Archives and Museums Informatics, vol. 12, pp. 39-79, 1998, Kluwer Academic Publishers, Netherlands.

[15] Information about the Australian Science Archives Project, University of Melbourne can be found on the World Wide Web at: <http://www.asap.unimelb.edu.au [HREF9] >.

[16] McCarthy, Gavan, The Engine of Change: the development of RASA, a new archival tool for improving access to the archives of science and technology in Australia, MA thesis, Department of Librarianship, Archives and Records, Monash University, April 1994.

[17] For further information on Excite software and services see the Excite Inc. Site at: <http://www.excite.com/ [HREF10] >.

[18] For further information on Harvest Web Indexing see: <http://www.tardis.ac.uk/harvest/ [HREF11] >.

[19] For further information explore the Museum Victoria, Ed-Online, Bioinformatics web space at: <http://www.mov.vic.gov.au/bioinformatics [HREF12] >.

[20] Bearman, D. et al, 'A Common Model to Support Interoperable Metadata: Progress report on reconciling metadata requirements from Dublin Core and INDECS/DOI Communities', D-Lib Magazine, January 1999, vol. 5 no. 1, ISSN 1082-9873, at <http://www.dlib.org/dlib/january99/bearman/01bearman.html [HREF13] >.

[21] Bearman, D., and R.H. Lytle, 'The Power of the Principle of Provenance', Archivaria, vol. 21, Winter 1985-86, pp. 14-27.

[22] Further information on the USA National Science Foundation's International Digital Libraries Funding Program can be found at: <http://www.nsf.gov/pubs/1999/nsf996/nsf996.htm [HREF14] >.

[23] Szary, Richard, Internal Yale Working Group communication - email, 7 February 1999.

[24] Miller, Eric, 'An Introduction to the Resource Description Framework', D-Lib Magazine, May 1998, ISSN 1082-9873, at: <http://www.dlib.org/dlib/may98/miller/05miller.html [HREF15] >.

[25] Information about the World Wide Web Consortium can be found at: <http://www.w3.org/ [HREF16] >.

Hypertext References

HREF1
http://asap.unimelb.edu.au/staff/gjmbiog.htm
HREF2
http://www.sims.monash.edu.au/rcrg/index.html
HREF3
http://www.sims.monash.edu.au/rcrg/index.html
HREF4
http://lcweb.loc.gov/ead/
HREF5
http://www.archives.ca/ica/cds/isaar_e.html
HREF6
http://amol.org.au
HREF7
http://arts.deakin.edu.au/unesco/
HREF8
http://www.asap.unimelb.edu.au/asa/directory/asa_dir.htm
HREF9
http://www.asap.unimelb.edu.au
HREF10
http://www.excite.com/
HREF11
http://www.tardis.ac.uk/harvest/
HREF12
http://www.mov.vic.gov.au/bioinformatics
HREF13
http://www.dlib.org/dlib/january99/bearman/01bearman.html
HREF14
http://www.nsf.gov/pubs/1999/nsf996/nsf996.htm
HREF15
http://www.dlib.org/dlib/may98/miller/05miller.html
HREF16
http://www.w3.org/


Copyright

Gavan McCarthy, © 1999. The author assigns to Southern Cross University and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The author also grants a non-exclusive licence to Southern Cross University to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers and for the document to be published on mirrors on the World Wide Web.


[ Proceedings ]
AusWeb99, Fifth Australian World Wide Web Conference, Southern Cross University, PO Box 157, Lismore NSW 2480, Australia Email: "AusWeb99@scu.edu.au"