WorkingWeb: Website Usability Engineering Software

Naomi Norman, Research and Consultancy, Key Centre for Human Factors and Applied Cognitive Psychology [HREF1] , University of Queensland [HREF2], St Lucia, Queensland, 4072. naomi@humanfactors.uq.edu.au

Abstract

This paper introduces the WorkingWeb [HREF3] suite of usability engineering applications, developed by the Key Centre for Human Factors and Applied Cognitive Psychology at the University of Queensland to assist in the enhancement and evaluation of website usability. There are two products in the suite: the WorkingWeb Usability Testing Tool [HREF4] and the WorkingWeb Information Architecture Tool [HREF5].

Introduction

The WorkingWeb Usability Testing Tool (previously named PRISM Browser) has been used in a number of commercial developments and research projects (Dennis et al, 2002; Dennis et al, 1998; Bruza and Dennis, 1997; Bruza et al, 1997) comparing the usability of multiple websites. The Browser is used both to control the experimental procedure presented to participants and to record the participants' actions as they complete the experiment.

The WorkingWeb Information Architecture Tool is used to measure participants' (typically representative website users) perceptions of the similarity of a site's content and, through cluster analysis of this data, to create an information architecture for the site.

Usability Testing

Usability is the extent to which a computer system enables users, within a specified context of use, to achieve specified goals effectively and efficiently while promoting feelings of satisfaction (ISO9241).

Current State of Usability Testing

Websites are now used for a plethora of activities, but the most common use is undoubtedly information retrieval. In this context the usability of a website can be defined as the extent to which the site enables users to effectively and efficiently find information.

The costs of poor usability to organisations include lost productivity as employees spend more time searching for information, the cost of providing call centres for users who have given up trying to use the website, the cost of training staff to use a site, and the negative image users take away from a site that thwarts their attempts to find information.

The literature is filled with differing descriptions of usability testing, but some basic principles generally apply (adapted from Rubin, 1994):

  1. Specific tasks or site objectives are identified.
  2. A representative sample of website users is gathered.
  3. Ideally the test takes place in an environment similar to the users' actual work environment.
  4. The users use or review the website while a test moderator monitors their performance.
  5. The moderator interrogates the participants' beliefs and impressions of the system.
  6. Quantitative measures (usually number of clicks and time taken) and qualitative measures (user preference ratings) are recorded.
  7. The usability analyst identifies usability problems or recommended improvements to the site.

Recent studies have cast doubt on the reliability of this evaluation technique by showing that different usability analysts produce very different results when testing the same interface. Two comparative usability evaluation (CUE) studies in particular demonstrated the differences between evaluations carried out by independent usability teams (CUE1, Molich et al, 1998, and CUE2, Molich et al, 1999).

In CUE1 four commercial usability teams evaluated the same calendar program through tests with about five users each and then prepared a report listing the usability problems identified. Only one problem was identified by all teams, and more than ninety percent of problems were identified by just one team.

CUE2 was a replication of the first study involving seven independent teams evaluating Hotmail (http://www.hotmail.com). Again each team conducted usability tests with around five users and submitted a report of usability problems. Fifty-five percent of all problems were found by only one team. Both studies demonstrated less than a 1% overlap between problems identified by different teams.

In another follow-up study (Kessner et al, 2001) six independent teams conducted usability tests of the same system. This study attempted to improve the amount of overlap between groups by limiting the issues to be addressed, eliminating non-usability issues (such as those related to marketing), grouping the identified problems into categories rather than considering them individually, and using only professional teams (no students). After categorisation the researchers gathered a list of thirty-six usability problems. Again there was very little overlap, with 44% of problems being identified by only one team and no problems identified by all teams.

The findings of these studies show that there is indeed room for improvement in current usability evaluation techniques. Principles of psychological experimentation suggest several possible explanations for this low reliability:

  1. The small sample sizes used in testing are insufficient for statistically significant results.
  2. No specific measures of usability are identified. While tasks are frequently timed, the final assessment of usability problems is decided upon inferentially, allowing for subjectivity on the part of the usability analyst.
  3. Interrogation of the test participants' impressions of the system requires introspection into the causes behind their actions and decisions. Nisbett and Wilson (1977) identified four limitations of self-report which frequently prevent reliable introspection:
       a. People may be unaware of the existence of a response produced by an experimental manipulation.
       b. People may be unaware that a cognitive process has occurred.
       c. People are often unable to identify the existence of a stimulus which was critical in producing a response.
       d. Even when people are aware of both the stimulus and the response, they may be unable to describe the effect of the stimulus on the response.

WorkingWeb Usability Testing Tool

As the WorkingWeb Usability Testing Tool was developed for the purposes of psychological research into website usability it incorporates features designed to address the above issues.

The Tool can be programmed to control all aspects of the test procedure. Typically this involves delivery of instructions, situating the participant on the site homepage at the beginning of a task and recording the participant's actions. The Tool records which links are selected, the amount of time spent on each page and the amount of time required to complete the task. The software can either be used in-house or downloaded by participants and run remotely in their own home or office with a results file returned automatically at the end of the test.
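The format of the Tool's results file is not described in this paper. Purely as a minimal sketch, assuming a hypothetical comma-separated log with one row per page visit, the recorded actions could be rolled up into clicks and total time per task as follows:

    import csv
    from collections import defaultdict

    # Hypothetical results-file layout (not necessarily the Tool's actual format):
    # participant,task,url,seconds_on_page
    # p01,find_rates,http://example.org/,4.2
    # p01,find_rates,http://example.org/services/,11.7

    def summarise(results_path):
        """Roll page-visit records up into clicks and total time per participant and task."""
        clicks = defaultdict(int)       # (participant, task) -> pages visited
        task_time = defaultdict(float)  # (participant, task) -> seconds on task
        with open(results_path, newline="") as f:
            for row in csv.DictReader(f):
                key = (row["participant"], row["task"])
                clicks[key] += 1
                task_time[key] += float(row["seconds_on_page"])
        return clicks, task_time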

Figure 1 shows the browser interface which test participants see while completing a test. This is a screenshot taken from a recent usability test of the Melbourne City Council website. The first line of the menu bar contains the standard navigation buttons 'Back', 'Forward', 'Refresh' and 'Home' (the latter is configurable and typically set to the homepage of the test website) as well as two buttons specific to the test requirements: 'Finished' and 'Continue'. The labels and behaviour of these buttons are configurable. 'Finished' is typically used by the participant to indicate that they have completed the current task, while 'Continue' is used in other instances where the participant needs to continue on to the next phase of the test but is not necessarily completing a task (for example, after reading instructions). The second line of the menu bar can be configured to display the task instructions (as shown).

The software collates the results of all participants and calculates a number of summary measures.

By way of example, the uncertainty measures from the test mentioned above are shown in Figure 2.
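The measures themselves are not listed in this excerpt. Purely as an illustration, and assuming an entropy-style reading of 'uncertainty' in the spirit of the Shannon (1951) reference in the bibliography, a sketch of one such measure might look like this (the function name and data format are hypothetical):

    import math
    from collections import Counter

    def path_uncertainty(pages_visited):
        """Shannon entropy (in bits) of the distribution of pages visited during a task.

        Illustrative assumption only: higher values mean the participant's path was
        spread over more pages, i.e. greater uncertainty about where the answer lives.
        """
        counts = Counter(pages_visited)
        total = sum(counts.values())
        return -sum((n / total) * math.log2(n / total) for n in counts.values())

    # A participant who wanders over many pages scores higher than one who goes
    # straight to the answer.
    print(path_uncertainty(["home", "a", "b", "a", "c"]))  # ~1.92 bits
    print(path_uncertainty(["home", "answer"]))            # 1.0 bit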

This software addresses the issues outlined above by controlling the test procedure uniformly, recording objective behavioural measures for every participant rather than relying on introspective reports, and making larger, remotely tested samples practical.

Information Architecture

Information architecture is the structural design of an information space to facilitate task completion and intuitive access to content (Rosenfeld, 2002); it combines organisation, labelling and navigation schemes.

The information architecture of a website has a profound impact on a user's ability to navigate the site in order to find the information they seek. The optimal structure for a website would allow all users to effectively and efficiently locate the information they seek, while a less than optimal structure would place obstacles in the users' path leading to inefficiency, a lower chance of success and frustration.

Common Styles of Website Information Architecture

Many websites are categorised in part or whole according to the internal structure of the organisation, grouping documents according to the staff that are responsible for producing them. Unfortunately, most website users are unfamiliar with the inner workings of the organisation, and are unlikely to find such sites easy to navigate. Web developers and information architects are often called upon to produce a more meaningful categorisation and rely on a variety of classification schemes (Gullikson et al., 1999, p. 294; Rosenfeld & Morville, 1998).

Information architects come from backgrounds as diverse as graphic design, information and library science, usability engineering, marketing, computer science and technical writing (Rosenfeld, 2002). Professional backgrounds, individual differences and organisational objectives lead to differing perspectives regarding the best structural solution for a given information space, and there is no guarantee that the finished structure will be easily discernible to users.

User-Oriented Approaches

In an extensive study into website usability Spool et al (1999) found that a major cause of problems in navigation was that the structure did not meet users' expectations. User-oriented design approaches aim to satisfy users' expectations through involving users in the design process.

Topic Sorting

One user-oriented approach to information architecture design involves gathering data on users' perceptions of the information contained within a site and creating a structure that incorporates this data (Nielsen and Sano, 1994). Thus when users visit the site the actual sitemap closely resembles the expected map. One method by which this data can be gathered is topic sorting. In a topic sorting exercise participants are presented with a list of topics representing the conceptual chunks of information from the site and asked to sort the topics according to their similarity. (This procedure is often referred to as card sorting because it is most often completed using paper cards bearing the topic names.)

Interpreting Topic Sorting Results

Every user has a differing perspective of the organisation and its information, and each participant in a topic sorting exercise may produce a somewhat different categorisation. These differences can be attributed either to idiosyncratic views of the world or to different systems of categorisation. By way of example, Table 1 shows how three fictitious participants might sort the topics 'cats', 'lions', 'dogs', 'elephants', 'eagles' and 'budgerigars'. The differences between participants 1 and 2 represent different systems of categorisation: mammal versus bird and domestic versus wild. The difference between participants 2 and 3 represents an idiosyncratic view, i.e. inclusion of 'eagles' in the 'domestic' rather than the 'wild' category.

Table 1. Example of Alternate Categorisations

Topic        Participant 1   Participant 2   Participant 3
Cats         Mammal          Domestic        Domestic
Lions        Mammal          Wild            Wild
Dogs         Mammal          Domestic        Domestic
Elephants    Mammal          Wild            Wild
Eagles       Bird            Wild            Domestic
Budgerigars  Bird            Domestic        Domestic

How should site designers choose between these different options? Some website designers have simply observed the topic sorting results of a small number of test participants (Nielsen and Sano, 1994), and somehow determined what they believed to be the common theme from the competing structures. Since this method relies heavily on designer interpretation it is to some extent subject to the same concerns as information architecture design involving no user participation. It is also unmanageable with more than a handful of participants and therefore vulnerable to sampling bias.

Cluster Analysis

Cluster analysis is a promising method for making sense of topic sorting data from multiple participants. It is used to uncover the pattern or structure contained within proximity data (Everitt & Rabe-Hesketh, 1997, p. 1) and, from this, to produce a hierarchical structure. In the case of website design, the proximity data are the strengths of the similarity relationships between topics. Since very similar topics tend to be sorted together by many participants, and dissimilar topics are rarely if ever sorted together, a similarity measure for each pair of topics is calculated by summing the number of participants who sorted the pair into a common group.
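As a minimal sketch of this calculation, using the fictitious Table 1 sorts, the similarity for every pair of topics is simply the number of participants who placed the pair in the same group:

    from itertools import combinations

    # Table 1 sorts: one dict per participant, mapping topic -> group label.
    sorts = [
        {"Cats": "Mammal", "Lions": "Mammal", "Dogs": "Mammal",
         "Elephants": "Mammal", "Eagles": "Bird", "Budgerigars": "Bird"},
        {"Cats": "Domestic", "Lions": "Wild", "Dogs": "Domestic",
         "Elephants": "Wild", "Eagles": "Wild", "Budgerigars": "Domestic"},
        {"Cats": "Domestic", "Lions": "Wild", "Dogs": "Domestic",
         "Elephants": "Wild", "Eagles": "Domestic", "Budgerigars": "Domestic"},
    ]

    topics = sorted(sorts[0])

    # similarity[a][b] = number of participants who sorted topics a and b together
    similarity = {a: {b: 0 for b in topics} for a in topics}
    for sort in sorts:
        for a, b in combinations(topics, 2):
            if sort[a] == sort[b]:
                similarity[a][b] += 1
                similarity[b][a] += 1

    print(similarity["Cats"]["Dogs"])   # 3 - sorted together by every participant
    print(similarity["Cats"]["Lions"])  # 1 - together only under the 'Mammal' sort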

WorkingWeb Information Architecture Tool

The WorkingWeb Information Architecture Tool combines a 'tester' interface through which users complete a topic sorting activity and a separate 'client' interface through which cluster analysis can be performed on the topic sorting data.

The method of cluster analysis used in the WorkingWeb Information Architecture Tool combines a traditional centroid hierarchical clustering technique with an optimisation algorithm. Centroid hierarchical clustering is an iterative process whereby clusters of items are built up by combining two items or clusters at a time. On the first iteration the two most similar items (the pair with the highest similarity rating) are fused to form a cluster. Once a cluster is formed its position is represented by the mean values of its constituent items (the mean vector), and the individual positions of the constituent items are thenceforth ignored. The distance between clusters is defined as the distance between their mean vectors, and the distance between a cluster and an unclustered item is the distance between the item's position and the cluster's mean vector (Everitt, 1993). On each subsequent iteration the next most similar pair of items or clusters is fused, and so on until a complete hierarchical structure (dendrogram) is formed.
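The Tool's own implementation is not reproduced in this paper. As a rough sketch of centroid clustering in general, the following uses scipy and makes one simplifying assumption purely for illustration: each topic's row of the Table 1 similarity matrix is treated as its coordinate vector, so topics that many participants sorted together lie close together in Euclidean space.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram

    topics = ["Cats", "Lions", "Dogs", "Elephants", "Eagles", "Budgerigars"]

    # Pairwise similarities from the Table 1 sorts (previous sketch); the diagonal
    # is 3 because every topic is trivially sorted with itself by all participants.
    S = np.array([
        [3, 1, 3, 1, 1, 2],   # Cats
        [1, 3, 1, 3, 1, 0],   # Lions
        [3, 1, 3, 1, 1, 2],   # Dogs
        [1, 3, 1, 3, 1, 0],   # Elephants
        [1, 1, 1, 1, 3, 2],   # Eagles
        [2, 0, 2, 0, 2, 3],   # Budgerigars
    ], dtype=float)

    # Centroid agglomerative clustering: at every iteration the two closest
    # items/clusters are fused and the new cluster is represented by the mean
    # vector of its members.
    Z = linkage(S, method="centroid")

    # The linkage matrix Z encodes the full hierarchy (dendrogram).
    tree = dendrogram(Z, labels=topics, no_plot=True)
    print(tree["ivl"])  # leaf order: Cats/Dogs and Lions/Elephants fuse early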

For practical purposes we are not interested in the complete hierarchy produced by the cluster analysis, as the number of partitions is too great: with only two options at each node a large website would take hours to navigate, so a subset of partitions must be selected. An appropriate number of clusters for the data is arrived at through an optimisation algorithm. The quantity optimised is the ratio of within-category to between-category inter-item similarities, ensuring that similar items of information are categorised together while dissimilar items are categorised separately.
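The Tool's optimisation algorithm itself is not spelled out here. The sketch below, continuing from the previous example, illustrates the general idea: cut the dendrogram at each candidate number of clusters and keep the partition with the highest ratio of mean within-cluster to mean between-cluster similarity.

    import numpy as np
    from scipy.cluster.hierarchy import fcluster

    def best_partition(Z, S, max_clusters=6):
        """Choose the cut of dendrogram Z that maximises the ratio of mean
        within-cluster to mean between-cluster similarity (S is the topic-by-topic
        similarity matrix). A sketch of the general idea only."""
        n = S.shape[0]
        pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
        best = None
        for k in range(2, max_clusters + 1):
            labels = fcluster(Z, t=k, criterion="maxclust")
            within = [S[i, j] for i, j in pairs if labels[i] == labels[j]]
            between = [S[i, j] for i, j in pairs if labels[i] != labels[j]]
            if not within or not between:
                continue  # degenerate cut with no within- or between-cluster pairs
            ratio = np.mean(within) / (np.mean(between) + 1e-9)
            if best is None or ratio > best[0]:
                best = (ratio, k, labels)
        return best

    # Continuing the previous sketch: best_partition(Z, S) returns the winning
    # ratio, the chosen number of clusters and a cluster label for each topic.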

References

Bruza, P., McArthur, R., & Dennis, S. (2000). Interactive internet search: Keyword, directory and query reformulation mechanisms compared. Special Interest Group on Information Retrieval (SIGIR).

Bruza, P., & Dennis, S. (1997). Query reformulation on the Internet: Empirical data and the hyperindex search engine. Proceedings of the RIAO97 Conference - Computer Assisted Information Searching on the Internet, Centre de Hautes Etudes Internationales d'Informatique Documentaires, 488-499.

Dennis, S., Bruza, P., & McArthur, R. (2002). Web searching: A process oriented experimental study of three interactive search paradigms. Journal of the American Society for Information Sciences and Technology 53(2): 120-133.

Dennis, S., McArthur, R., & Bruza, P. D. (1998). Searching the World Wide Web made easy? The cognitive load imposed by query reformulation mechanisms. Proceedings of the Twentieth Annual Conference of the Cognitive Science Society, p 1214, Lawrence Erlbaum Associates.

Everitt, B. (1993). Cluster Analysis. New York: John Wiley & Sons Inc.

Everitt, B. and Rabe-Hesketh, S. (1997). The Analysis of Proximity Data. New York: John Wiley & Sons Inc.

Gullikson, S., Blades, R., Bragdon, M., McKibbon, S., Sparling, M. and Toms, E. (1999). The impact of information architecture on academic website usability. The Electronic Library. Vol. 17, No. 5

Kessner, M., Wood, J., Dillon, R. F., & West, R. L. (2001). On the reliability of usability testing. CHI 2001 Extended Abstracts, ACM Press, 97-98.

Kiger, J.I., (1984). The depth/breadth tradeoff in the design of menu-driven interfaces. International Journal of Man-Machine Studies, 20, 201-213.

Larson, K. and Czerwinski, M. (1998). Web page design: Implications of memory, structure and scent from information retrieval. Proceedings of the Association for Computing Machinery's Computer Human Interaction Conference, 18-23.

Molich, R., Thomsen, A.D., Karyukina, B., Schmidt, L., Ede, M., Oel, W.V. and Arcuri, M. (1999). Comparative evaluation of usability tests. CHI 99 Extended Abstracts, 83-84.

Molich, R., Bevan, N., Curson, I., Butler, S., Kindlund, E., Miller, D. and Kirakowski, J. (1998). Comparative evaluation of usability tests. Proceedings of the Usability Professionals Association.

Nielsen, J., and Sano, D. (1994). SunWeb: User interface design for Sun Microsystems' internal web. Proceedings of the 2nd World Wide Web Conference '94: Mosaic and the Web.

Rosenfeld, L.B. (2002). Information Architecture for the World Wide Web (2nd Edition) Cambridge: O'Reilly

Rosenfeld, L. & Morville, P. (1998). Information Architecture for the World Wide Web. Cambridge: O'Reilly

Rubin, J. (1994). Handbook of usability testing: how to plan, design and conduct effective tests. New York: John Wiley & Sons Inc.

Shannon, C. (1951). Prediction and Entropy of Printed English. Bell System Technical Journal, Vol. 30, 50-64.

Spool, J.M., Scanlon, T., Schroeder, W., Snyder, C. and DeAngelo, T. (1999). Web Site Usability: A Designer's Guide. San Francisco: Morgan Kaufmann Publishers Inc.

Wurman, R.S. (1996). Information Architects. Zurich: Graphic Press

Hypertext References

HREF1
http://www.humanfactors.uq.edu.au/
HREF2
http://www.uq.edu.au/
HREF3
http://www.workingweb.com.au
HREF4
http://workingweb.com.au/services/UsingUsabilityTool.php
HREF5
http://workingweb.com.au/services/UsingInfoArchitectureTool.php

Copyright

Naomi Norman, © 2000. The authors assign to Southern Cross University and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive licence to Southern Cross University to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers and for the document to be published on mirrors on the World Wide Web.