Naomi Norman, Research and Consultancy, Key Centre for Human Factors and Applied Cognitive Psychology [HREF1] , University of Queensland [HREF2], St Lucia, Queensland, 4072. naomi@humanfactors.uq.edu.au
This paper introduces the WorkingWeb [HREF3] suite of usability engineering applications, developed by the Key Centre for Human Factors and Applied Cognitive Psychology at the University of Queensland to assist in the enhancement and evaluation of website usability. There are two products in the suite: the WorkingWeb Usability Testing Tool [HREF4] and the WorkingWeb Information Architecture Tool [HREF5].
The WorkingWeb Usability Testing Tool (previously named PRISM Browser) has been used in a number of commercial developments and research projects comparing the usability of multiple websites (Dennis et al., 2002; Dennis et al., 1998; Bruza & Dennis, 1997; Bruza et al., 1997). The Browser is used both to control the experimental procedure presented to participants and to record the participants' actions as they complete the experiment.
The WorkingWeb Information Architecture Tool is used to measure participants' (typically representative website users) perceptions of the similarity of a site's content and, through cluster analysis of this data, to create an information architecture for the site.
Usability is the extent to which a computer system enables users, within a specified context of use, to achieve specified goals effectively and efficiently while promoting feelings of satisfaction (ISO9241).
Websites are now used for a plethora of activities including:
The most common use of websites is undoubtedly information retrieval. In this context the usability of a website can be defined as the extent to which the site enables users to find information effectively and efficiently.
The costs of poor usability to organisations include lost productivity as employees spend more time searching for information, the cost of providing call centres for users who have given up trying to use the website, the cost of training staff in how to use a site, and the negative image that users take away when a site thwarts their attempts to find information.
The literature is filled with differing descriptions of usability testing, but some basic principles generally apply (adapted from Rubin, 1994).
Recent studies have cast doubt on the reliability of this evaluation technique by showing that different usability analysts produce very different results when testing the same interface. Two comparative usability evaluation (CUE) studies in particular demonstrated the differences between evaluations carried out by independent usability teams (CUE1: Mollich et al., 1998; CUE2: Mollich et al., 1999).
In CUE1, four commercial usability teams evaluated the same calendar program through tests with around five users each and then prepared a report listing the usability problems identified. Only one problem was identified by all teams, and more than ninety per cent of problems were identified by just one team.
CUE2 was a replication of the first study involving seven independent teams evaluating Hotmail (http://www.hotmail.com). Again each team conducted usability tests with around five users and submitted a report of usability problems. Fifty-five per cent of all problems were found by only one team. Both studies demonstrated less than 1% overlap between the problems identified by different teams.
In another follow-up study (Kessner et al., 2001), six independent teams conducted usability tests of the same system. This study attempted to improve the amount of overlap between groups by limiting the issues to be addressed, eliminating non-usability issues (such as those related to marketing), grouping the identified problems into categories rather than considering them individually, and using only professional teams (no students). After categorisation the researchers gathered a list of thirty-six usability problems. Again there was very little overlap, with 44 per cent of problems identified by only one team and no problems identified by all teams.
The findings of these studies show that there is indeed room for improvement in current usability evaluation techniques. Principles of psychological experimentation provide possible explanations for the low reliability of usability evaluation techniques.
As the WorkingWeb Usability Testing Tool was developed for the purposes of psychological research into website usability it incorporates features designed to address the above issues.
The Tool can be programmed to control all aspects of the test procedure. Typically this involves delivery of instructions, situating the participant on the site homepage at the beginning of a task and recording the participant's actions. The Tool records which links are selected, the amount of time spent on each page and the amount of time required to complete the task. The software can either be used in-house or downloaded by participants and run remotely in their own home or office with a results file returned automatically at the end of the test.
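To make the kind of data the Tool records more concrete, the sketch below shows one way a per-action log record and results file could be represented. The field names and CSV format are assumptions made for illustration only; the paper does not describe the Tool's actual results format.

```python
# Illustrative sketch only: the field names and CSV output are assumptions,
# not the WorkingWeb Testing Tool's actual results format.
import csv
from dataclasses import dataclass, asdict, fields

@dataclass
class ActionRecord:
    participant_id: str     # anonymous identifier for the participant
    task_id: str            # the task the participant was attempting
    url: str                # page on which the action occurred
    link_selected: str      # link the participant clicked (or 'Finished')
    seconds_on_page: float  # dwell time on the page before the action

def write_results(records, path="results.csv"):
    """Write collected records to a results file for later collation."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(ActionRecord)])
        writer.writeheader()
        for record in records:
            writer.writerow(asdict(record))
```

Task completion time then falls out of such a log as the sum of the dwell times for the records belonging to a task.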
Figure 1 shows the browser interface which test participants see while completing a test. This is a screenshot taken from a recent usability test of the Melbourne City Council website. The first line of the menu bar contains the standard navigation buttons 'Back', 'Forward', 'Refresh' and 'Home' (the latter is configurable and typically set to the homepage of the test website) as well as two buttons specific to the test: 'Finished' and 'Continue'. The labels and behaviour of these buttons are configurable. 'Finished' is typically used by participants to indicate that they have completed the current task, and 'Continue' is used where the participant needs to move on to the next phase of the test without necessarily completing a task (for example, after reading instructions). The second line of the menu bar can be configured to display the task instructions (as shown).
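Because the button labels, button behaviour, home page and instruction line are all described as configurable, a test setup can be pictured as a small set of settings. The sketch below is purely illustrative; the keys, values and structure are assumptions rather than the Tool's real configuration schema.

```python
# Hypothetical configuration sketch; not the Tool's actual schema.
test_config = {
    "home_url": "http://www.example.org/",  # 'Home' button target (test site homepage)
    "buttons": {
        "finished": {"label": "Finished", "action": "end_current_task"},
        "continue": {"label": "Continue", "action": "advance_to_next_phase"},
    },
    "instruction_bar": "Task 1: find the library opening hours.",  # hypothetical task text
    "log": ["links_selected", "time_per_page", "time_per_task"],
}
```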
The software collates the results of all participants and calculates the following measures:
By way of example, the uncertainty measures from the test mentioned above are shown in Figure 2.
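The paper does not spell out how the uncertainty measures are computed, but given the Shannon (1951) reference one plausible reading is the entropy of the link choices participants make on a page: zero when everyone selects the same link, higher when choices are spread across many links. The sketch below computes that quantity under this assumption; it is not necessarily the formula used by the Tool.

```python
import math
from collections import Counter

def link_choice_entropy(choices):
    """Shannon entropy (in bits) of the links participants selected on one page.

    `choices` holds one link identifier per participant who visited the page.
    0 bits means everyone agreed; higher values mean the page left
    participants less certain about where to go next.
    """
    counts = Counter(choices)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Example: 8 of 10 participants chose the same link on this page.
print(link_choice_entropy(["About us"] * 8 + ["Services", "Contact"]))  # ~0.92 bits
```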

This software addresses the above issues associated with other methods through:
Information architecture is the structural design of an information space to facilitate task completion and intuitive access to content (Rosenfeld, 2002). It combines organisation, labelling and navigation schemes.
The information architecture of a website has a profound impact on a user's ability to navigate the site in order to find the information they seek. The optimal structure for a website would allow all users to effectively and efficiently locate the information they seek, while a less than optimal structure would place obstacles in the users' path leading to inefficiency, a lower chance of success and frustration.
Many websites are categorised in part or whole according to the internal structure of the organisation, grouping documents according to the staff that are responsible for producing them. Unfortunately, most website users are unfamiliar with the inner workings of the organisation, and are unlikely to find such sites easy to navigate. Web developers and information architects are often called upon to produce a more meaningful categorisation and rely on a variety of classification schemes (Gullikson et al., 1999, p. 294; Rosenfeld & Morville, 1998) including:
Information architects come from backgrounds as diverse as graphic design, information and library science, usability engineering, marketing, computer science and technical writing (Rosenfeld, 2002). Professional backgrounds, individual differences and organisational objectives lead to differing perspectives regarding the best structural solution for a given information space, and there is no guarantee that the finished product will be readily comprehensible to users.
In an extensive study into website usability, Spool et al. (1999) found that a major cause of navigation problems was that the structure did not meet users' expectations. User-oriented design approaches aim to satisfy users' expectations by involving users in the design process.
One user-oriented approach to information architecture design involves gathering data on users' perceptions of the information contained within a site and creating a structure that incorporates this data (Nielsen and Sano, 1994). Thus when users visit the site, the actual sitemap closely resembles the map they expect. One method by which this data can be gathered is topic sorting. In a topic sorting exercise participants are presented with a list of topics representing the conceptual chunks of information from the site and asked to sort the topics according to their similarity. (This procedure is often referred to as card sorting because it is most often completed using paper cards bearing the topic names.)
Every user has a differing perspective of the organisation and its information, so each participant in a topic sorting exercise may produce a somewhat different categorisation. These differences can be attributed either to idiosyncratic views of the world or to different systems of categorisation. By way of example, Table 1 shows the way three fictitious participants might sort the topics 'cats', 'lions', 'dogs', 'elephants', 'eagles' and 'budgerigars'. The differences between participants 1 and 2 represent different systems of categorisation: mammal versus bird and domestic versus wild. The difference between participants 2 and 3 represents an idiosyncratic view, i.e. inclusion of 'eagles' in the 'domestic' rather than 'wild' category.
Table 1. Example of Alternate Categorisations
| Topic | Participant 1 | Participant 2 | Participant 3 |
| --- | --- | --- | --- |
| Cats | Mammal | Domestic | Domestic |
| Lions | Mammal | Wild | Wild |
| Dogs | Mammal | Domestic | Domestic |
| Elephants | Mammal | Wild | Wild |
| Eagles | Bird | Wild | Domestic |
| Budgerigar | Bird | Domestic | Domestic |
How should site designers choose between these different options? Some website designers have simply observed the topic sorting results of a small number of test participants (Nielsen and Sano, 1994), and somehow determined what they believed to be the common theme from the competing structures. Since this method relies heavily on designer interpretation it is to some extent subject to the same concerns as information architecture design involving no user participation. It is also unmanageable with more than a handful of participants and therefore vulnerable to sampling bias.
Cluster analysis offers a promising way of making sense of topic sorting data from multiple participants. It is a method for uncovering the pattern or structure contained within proximity data (Everitt & Rabe-Hesketh, 1997, p. 1) and, from this, producing a hierarchical structure. In the case of website design, the proximity data are the strengths of the similarity relationships between topics. Since very similar topics tend to be sorted together by many participants, and dissimilar topics are rarely if ever sorted together, a similarity measure for each pair of topics is calculated by counting the number of participants who sorted the pair into a common group.
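As a concrete illustration of this similarity calculation, the sketch below counts, for every pair of topics, how many participants placed the pair in the same group, using the fictitious sorts from Table 1. The representation of each sort as a list of groups is an assumption made for the example, not the Tool's data format.

```python
from collections import defaultdict
from itertools import combinations

# Each participant's sort is a list of groups (sets of topics), per Table 1.
sorts = [
    [{"cats", "lions", "dogs", "elephants"}, {"eagles", "budgerigars"}],   # participant 1
    [{"cats", "dogs", "budgerigars"}, {"lions", "elephants", "eagles"}],   # participant 2
    [{"cats", "dogs", "eagles", "budgerigars"}, {"lions", "elephants"}],   # participant 3
]

def similarity_matrix(sorts):
    """Similarity of a topic pair = number of participants who grouped the pair together."""
    sim = defaultdict(int)
    for groups in sorts:
        for group in groups:
            for a, b in combinations(sorted(group), 2):
                sim[(a, b)] += 1
    return dict(sim)

print(similarity_matrix(sorts))
# e.g. ('cats', 'dogs') -> 3, ('cats', 'lions') -> 1, ('eagles', 'lions') -> 1
```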
The WorkingWeb Information Architecture Tool combines a 'tester' interface through which users complete a topic sorting activity and a separate 'client' interface through which cluster analysis can be performed on the topic sorting data.
The method of cluster analysis used in the WorkingWeb Information Architecture Tool combines a traditional centroid hierarchical clustering technique with an optimisation algorithm. Centroid hierarchical clustering is an iterative process whereby clusters of items are built up by combining two items or clusters at a time. On the first iteration the two most similar items (the pair with the highest similarity rating) are fused to form a cluster. Once a cluster is formed, its position is represented by the mean of its constituent items (its mean vector), and the individual positions of the constituent items are thenceforth ignored. The distance between clusters is defined as the distance between their mean vectors, and the distance between a cluster and an unclustered item is the distance between the item's position and the cluster's mean vector (Everitt, 1993). On each subsequent iteration the two most similar remaining items or clusters are joined, and so on until a complete hierarchical structure (a dendrogram) is formed.
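The centroid agglomerative step can be sketched with standard tooling. The example below uses SciPy's hierarchical clustering as a stand-in (an assumption; the Tool's own implementation is not described at this level of detail), converting the pairwise similarities into distances and building the dendrogram with the centroid method.

```python
import numpy as np
from itertools import combinations
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

topics = ["cats", "lions", "dogs", "elephants", "eagles", "budgerigars"]
sorts = [
    [{"cats", "lions", "dogs", "elephants"}, {"eagles", "budgerigars"}],
    [{"cats", "dogs", "budgerigars"}, {"lions", "elephants", "eagles"}],
    [{"cats", "dogs", "eagles", "budgerigars"}, {"lions", "elephants"}],
]

# Co-occurrence similarity: how many participants grouped each pair together.
index = {t: i for i, t in enumerate(topics)}
sim = np.zeros((len(topics), len(topics)))
for groups in sorts:
    for group in groups:
        for a, b in combinations(group, 2):
            sim[index[a], index[b]] += 1
            sim[index[b], index[a]] += 1

# Turn similarity into distance (more agreement = closer together).
dist = len(sorts) - sim
np.fill_diagonal(dist, 0)

# Centroid agglomerative clustering. SciPy's centroid method formally assumes
# Euclidean distances, so this is an approximation for illustration only.
Z = linkage(squareform(dist), method="centroid")
print(Z)  # each row records one merge: cluster indices, distance, new cluster size
```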
For practical purposes we are not interested in the complete hierarchy produced by the cluster analysis, as the number of partitions is too great. With only two options at each node, a large website would take hours to navigate, so a subset of partitions must be selected. An appropriate number of clusters for the data is arrived at through an optimisation algorithm. The quantity optimised is the ratio of within-category to between-category inter-item similarities, ensuring that similar items of information are categorised together while dissimilar items are categorised separately.
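To picture the optimisation step, one simple approach (an assumption; the paper does not give the Tool's exact algorithm) is to cut the full hierarchy at each candidate number of clusters and keep the partition that maximises the ratio of mean within-category to mean between-category similarity, as sketched below using the `Z` and `sim` arrays from the previous example.

```python
import numpy as np
from itertools import combinations
from scipy.cluster.hierarchy import fcluster

def partition_score(labels, sim):
    """Ratio of mean within-cluster to mean between-cluster similarity."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(sim[i, j])
    if not within or not between:
        return float("-inf")  # ignore degenerate partitions (all together / all apart)
    return np.mean(within) / max(np.mean(between), 1e-9)

def best_partition(Z, sim, max_clusters):
    """Cut the dendrogram Z at 2..max_clusters clusters and keep the best-scoring cut."""
    cuts = [fcluster(Z, t=k, criterion="maxclust") for k in range(2, max_clusters + 1)]
    return max(cuts, key=lambda labels: partition_score(labels, sim))

# Usage with the previous sketch: best_partition(Z, sim, max_clusters=4)
```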
Bruza, P., McArthur, R., & Dennis, S. (2000). Interactive Internet search: Keyword, directory and query reformulation mechanisms compared. Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2000).
Bruza, P., & Dennis, S. (1997). Query reformulation on the Internet: Empirical data and the hyperindex search engine. Proceedings of the RIAO97 Conference - Computer Assisted Information Searching on the Internet, Centre de Hautes Etudes Internationales d'Informatique Documentaires, 488-499.
Dennis, S., Bruza, P., & McArthur, R. (2002). Web searching: A process oriented experimental study of three interactive search paradigms. Journal of the American Society for Information Science and Technology, 53(2), 120-133.
Dennis, S., McArthur, R., & Bruza, P. D. (1998). Searching the World Wide Web made easy? The cognitive load imposed by query reformulation mechanisms. Proceedings of the Twentieth Annual Conference of the Cognitive Science Society, p 1214, Lawrence Erlbaum Associates.
Everitt, B. (1993). Cluster Analysis. New York: John Wiley & Sons Inc.
Everitt, B., & Rabe-Hesketh, S. (1997). The Analysis of Proximity Data. New York: John Wiley & Sons Inc.
Gullikson, S., Blades, R., Bragdon, M., McKibbon, S., Sparling, M., & Toms, E. (1999). The impact of information architecture on academic website usability. The Electronic Library, 17(5).
Kessner, M., Wood, J., Dillon, R. F., & West, R. L. (2001). On the reliability of usability testing. CHI 2001 Extended Abstracts, ACM Press, 97-98.
Kiger, J.I., (1984). The depth/breadth tradeoff in the design of menu-driven interfaces. International Journal of Man-Machine Studies, 20, 201-213.
Larson, K., & Czerwinski, M. (1998). Web page design: Implications of memory, structure and scent for information retrieval. Proceedings of the Association for Computing Machinery's Computer Human Interaction Conference (CHI '98), 18-23.
Mollich, R., Thomsen, A.D., Karyukina, B., Schmidt, L., Ede, M., Oel, W.V., & Arcuri, M. (1999). Comparative evaluation of usability tests. CHI 99 Extended Abstracts, 83-84.
Mollich, R., Bevan, N., Curson, I., Butler, S., Kindlund, E., Miller, D., & Kirakowski, J. (1998). Comparative evaluation of usability tests. Proceedings of the Usability Professionals Association.
Nielsen, J., & Sano, D. (1994). SunWeb: User interface design for Sun Microsystems' internal web. Proceedings of the 2nd World Wide Web Conference '94: Mosaic and the Web.
Rosenfeld, L.B. (2002). Information Architecture for the World Wide Web (2nd Edition) Cambridge: O'Reilly
Rosenfeld, L. & Morville, P. (1998). Information Architecture for the World Wide Web. Cambridge: O'Reilly
Rubin, J. (1994). Handbook of Usability Testing: How to Plan, Design and Conduct Effective Tests. New York: John Wiley & Sons Inc.
Shannon, C. E. (1951). Prediction and entropy of printed English. Bell System Technical Journal, 30, 50-64.
Spool, J.M., Scanlon, T., Schroeder, W., Snyder, C., & DeAngelo, T. (1999). Web Site Usability: A Designer's Guide. San Francisco: Morgan Kaufmann Publishers Inc.
Wurman, R.S. (1996). Information Architects. Zurich: Graphic Press