Karen Taylor [HREF1], Client Communications Officer, Client Services, Information Technology Services [HREF2], Building 28, Monash University [HREF3], Victoria, 3800. Email: Karen.Taylor@its.monash.edu.au
Within the Information Technology Services Division of Monash University, a content inventory was undertaken in preparation for a move into the University's content management system.
This exhausting task soon took its toll on the staff involved. Surely there had to be a faster way of doing this?
Guess what: there was. It wasn't completely foolproof, but it certainly cut down this time-consuming task while still giving us the data we needed.
In early 2004 we began the preparatory work for moving our website into the Monash University content management system. As part of this process we needed to review our website content. What information was actually there? How well was it organized?
Our website had grown enormously since its launch in 2000, and we had no real idea what information some directories held, as they were produced and maintained by other areas within the department.
To get the real picture of our website content and structure we needed to complete a thorough content inventory.
A content inventory is the process of stepping through each page of your website and recording all the relevant information about every page. We were advised that although the inventory itself is easy to do, for large websites it is a very time-consuming process.
Our website is very large. With over 28,000 web pages to review, we knew this was not going to be the easiest of tasks. We divided our website into sections and began clicking through the site, recording each page and its details in the inventory spreadsheet. We recorded the following details for each page of our website:

(Fig 1. Section of our content inventory spreadsheet)
After a few hours of reviewing our site we began to realise how long this process was actually going to take us. At first we thought it would take about 2-3 weeks, but we soon realised it was going to take much longer.
Before we began the audit we were warned that it was time-consuming. But as well as the time it was taking, we found the process of cutting and pasting information into the spreadsheet very frustrating. The constant flicking between the browser and the spreadsheet did not give us the chance to build up the momentum to just get the job done. It was also very difficult to concentrate on reviewing the content on the page when you were in the midst of a cut and paste frenzy.
We asked ourselves on many occasions, “Is this really worth the trouble?” Not only were we losing our minds, the process itself was making us depressed. After about two weeks of the inventory I was ready to tear my hair out, and I know my colleague felt the same. How could we go on? We were not even one quarter of the way through.
Surely someone else must have had this trouble before? Surely someone had software to do it for us? Unfortunately the answer seemed to be no! We knew that we couldn’t get a computer program to evaluate each page of our website, but we didn’t want or need that. We just needed something that would fill in the basic details for us: the URL, page title and keywords. These cutting and pasting tasks were taking up the majority of our time and making the review unbearable.
Eureka! Finally we stumbled across a small utility called Extract URL. This software was not designed for content inventories, yet we could use it for that purpose. The utility retrieves information about every page in an entire website, including the URL, page title, description and keywords metadata, date modified and page size. It also offers proxy support and can access password-protected sites. Once the data had been gathered from the website it could be saved in numerous formats, including as an Excel file.
The utility worked by following links through an entire website or directory. As our server administrator had set up a virtual copy of our website with directory listing enabled, we were also able to find files that were not linked.
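To make the idea of filling in the basic details concrete, here is a minimal sketch in Python of pulling a page's title, description and keywords out of its HTML and writing the rows to a spreadsheet-friendly CSV file. It is not how Extract URL works internally, which we do not know. The names PageDetails, inventory_row and write_inventory and the example.edu address are all made up for illustration, and date modified is left out because it normally comes from the web server rather than the page itself.

```python
import csv
import io
from html.parser import HTMLParser

FIELDS = ["url", "title", "description", "keywords", "size_bytes"]


class PageDetails(HTMLParser):
    """Collects the <title> text and the description/keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def inventory_row(url, html):
    """Build one inventory row (a dict) from a page's URL and HTML source."""
    parser = PageDetails()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": parser.meta.get("keywords", ""),
        "size_bytes": len(html.encode("utf-8")),
    }


def write_inventory(rows, out):
    """Write rows as CSV to an open text file, ready to load in a spreadsheet."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# A made-up page standing in for one fetched from the website.
SAMPLE = """<html><head><title>Help Desk</title>
<meta name="keywords" content="help, support">
<meta name="description" content="How to contact the help desk.">
</head><body><p>Hello</p></body></html>"""

row = inventory_row("http://www.example.edu/its/help.html", SAMPLE)
print(row["title"])  # Help Desk

buffer = io.StringIO()
write_inventory([row], buffer)
```

Evaluating the content of each page is still a job for a person; a sketch like this only removes the cutting and pasting.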
(Fig 2. Extract URL software package)
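The two ideas described above, following links through a site and comparing the result with a directory listing to find unlinked files, can be sketched roughly as follows. This is our own illustration in Python, not the utility's actual code. The crawl function, the fetch parameter, the in-memory SITE dictionary (standing in for both the web server and its directory listing) and the example.edu addresses are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(start_url, fetch):
    """Follow links breadth-first from start_url, staying in its directory.

    fetch is any callable that returns a page's HTML, or None if the page
    cannot be retrieved. Returns the set of URLs discovered, including any
    that could not be fetched.
    """
    prefix = start_url.rsplit("/", 1)[0] + "/"
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links and drop any #fragment.
            target = urldefrag(urljoin(url, href)).url
            if target.startswith(prefix) and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# A made-up three-page site; the keys double as the directory listing.
SITE = {
    "http://www.example.edu/its/index.html":
        '<a href="help.html">Help</a> <a href="http://other.example.org/">Away</a>',
    "http://www.example.edu/its/help.html":
        '<a href="index.html#top">Home</a>',
    "http://www.example.edu/its/old-notice.html":
        "<p>Nothing links here.</p>",
}

linked = crawl("http://www.example.edu/its/index.html", SITE.get)
unlinked = sorted(set(SITE) - linked)
print(unlinked)  # ['http://www.example.edu/its/old-notice.html']
```

Files that appear in the directory listing but are never reached by following links are the stray files worth a closer look.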
We were thrilled to find something that would help us speed up the process of our inventory. And it really did just that! We continued going through each section of our website, but this time we used the program to extract the data into the spreadsheet first. We were then able to go back and examine the content in more detail.
We now had more time to sit down and go through each page looking at what was important: the content. The ability to evaluate it with a fresh mind was a godsend.
But of course nothing in life is perfect. The software is restricted to text-based files, such as web pages (.html, .htm), and does not display the details of any binary files. It does, however, insert a few strange file names here and there, apparently where the program has encountered something it doesn’t understand. So when going through the results we needed to keep an eye out to make sure we didn’t miss anything of importance.
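Keeping an eye out can be made a little easier by flagging the rows that deserve a manual look. The sketch below, in Python, is one possible approach and not a feature of Extract URL. The two rules it applies (the file is not .html or .htm, or the row has no page title) are our own assumptions about what makes a row suspect, and the review_reasons function and sample rows are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

HTML_SUFFIXES = {".html", ".htm"}


def review_reasons(row):
    """Return a list of reasons why an inventory row needs a manual check."""
    reasons = []
    path = urlparse(row["url"]).path
    suffix = PurePosixPath(path).suffix.lower()
    # URLs ending in "/" are directory index pages, so they are not flagged.
    if path and not path.endswith("/") and suffix not in HTML_SUFFIXES:
        reasons.append("not an .html/.htm file")
    if not row.get("title", "").strip():
        reasons.append("no page title")
    return reasons


# Made-up rows of the kind an extraction run might produce.
rows = [
    {"url": "http://www.example.edu/its/index.html", "title": "ITS Home"},
    {"url": "http://www.example.edu/its/guide.pdf", "title": ""},
    {"url": "http://www.example.edu/its/%7Etmp", "title": ""},
]

flagged = [(r["url"], review_reasons(r)) for r in rows if review_reasons(r)]
for url, reasons in flagged:
    print(url, "->", "; ".join(reasons))
```

Rows that come back with no reasons can be reviewed for content as usual; the flagged ones are where binary files or odd entries are most likely to be hiding.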
Completing our content inventory was a long yet worthwhile journey. We were able to identify many stray files and directories that have now been removed from the website. We now also have a good understanding of the content we have and the work involved in improving it.
Although the software we found to assist with our audit wasn’t perfect, we believe it cut down our inventory time by half.
Fraser, Janice Crotty (2001). “Taking a Content Inventory” in New Architect. Available online at: [HREF4]
Veen, Jeffrey (2002). “Doing a Content Inventory (Or, A Mind-Numbingly Detailed Odyssey Through Your Web Site)” in Adaptive Path. Available online at: [HREF5]