Kent Fitch, Project Computing Pty Ltd. Email: Kent.Fitch@ProjectComputing.com
Keywords: web site archiving, web site harvesting, web server filters, managing electronic records
However, whilst traditional records management, change control and versioning systems potentially address the problem of tracking updates to content, in practice web responses are increasingly being generated dynamically. Pages are constructed on the fly from a combination of sources, including databases, feeds, script output and static content, using dynamically selected templates, stylesheets and output filters, often with per-user "personalisation". Furthermore, the content types being generated are steadily expanding beyond HTML text and images into audio, video and applications.
Under such circumstances, it becomes extremely difficult to state with confidence exactly what a site looked like on a given date, exactly which responses it generated, and how and when those responses changed.
This paper discusses an approach to capturing and archiving all materially distinct responses produced by a web site, regardless of their content type and how they are produced. This approach does not remove the need for traditional records management practices; rather, it augments them by archiving the end results of changes to content and to content generation systems. The paper also considers how this approach applies to capturing web sites with harvesters.
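The core test behind archiving only "materially distinct" responses can be illustrated with a small sketch. It assumes, purely for illustration, that a response is materially distinct when its content type or body bytes differ from the version most recently archived for the same URL; the class `ResponseArchive` and its `observe` method are hypothetical names, not taken from the paper. Comparing digests rather than full bodies keeps the check independent of content type, so HTML, images, audio or video are all handled the same way.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ResponseArchive:
    """Hypothetical in-memory archive of response versions, keyed by URL.

    Assumption (illustrative only): a response is "materially distinct" when
    its content type or body bytes differ from the last archived version.
    """
    latest_digest: dict[str, str] = field(default_factory=dict)
    versions: list[tuple] = field(default_factory=list)

    def observe(self, url: str, content_type: str, body: bytes) -> bool:
        """Record a generated response; return True if archived as new."""
        h = hashlib.sha256()
        h.update(content_type.encode("utf-8"))
        h.update(b"\0")  # separator so the type and body cannot run together
        h.update(body)
        digest = h.hexdigest()
        if self.latest_digest.get(url) == digest:
            return False  # identical to the last archived version: skip it
        # New or changed response: remember its digest and keep a timestamped copy.
        self.latest_digest[url] = digest
        self.versions.append(
            (url, datetime.now(timezone.utc), content_type, digest, body)
        )
        return True


archive = ResponseArchive()
archive.observe("/index.html", "text/html", b"<p>v1</p>")  # archived
archive.observe("/index.html", "text/html", b"<p>v1</p>")  # unchanged, skipped
archive.observe("/index.html", "text/html", b"<p>v2</p>")  # archived
```

Because only the latest digest per URL is compared, a page that changes from A to B and back to A is archived three times, which records when each change occurred. A real system would also have to decide which variations (for example embedded timestamps or per-user personalisation) do not count as material; this sketch makes no attempt at that normalisation.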