Courseware on the Web: An Analysis of Student Use


Jan Newmarch Faculty of Information Science and Engineering, University of Canberra, PO Box 1, Belconnen, ACT 2616, Australia Email:jan@ise.canberra.edu.au. Home Page: http://pandonia.canberra.edu.au/

Keywords

World Wide Web, Education, Courseware, HTTP logfile analysis, Student behaviour

Abstract

There has been a massive rush over the last few years to place courseware on the Web, on the assumption that this will improve student learning. Little work has been done to test this assumption over an extended period, largely because the Web is so young. This paper reports on the use of some courseware over a three-year period, giving a longitudinal analysis of the use and possible value of courseware from both a local and an external student viewpoint. The paper is based on an analysis of server logs over that period and attempts to draw conclusions despite the poor quality of the log data.

Introduction

There has been a massive rush over the last few years to place courseware on the Web, on the assumption that this will improve student learning. A number of projects have reported work and student reaction over a short period, such as [HREF1] [HREF2] [HREF3]. However, very little work has been done to test this assumption over an extended period, largely because the Web is so young. This paper reports on the use of some courseware over a three-year period, giving a longitudinal analysis of the use and possible value of courseware from both a local and an external student viewpoint. The paper is based on an analysis of server logs over that period and attempts to draw conclusions despite the poor quality of the log data.

The subject Operating Systems is a second year unit in the Bachelor of Computing Studies and in the Bachelor of Computer Engineering at the University of Canberra. Various presentation modes have been used in the past for this subject, such as whiteboards, overhead slides prepared manually, and overhead slides prepared using proprietary document processing systems such as Interleaf. Direct computer display mechanisms were not used for many years because of a lack of portable computers and a lack of suitable display systems in lectures.

In 1994, due to a study program overseas, I was given access to a portable computer on a regular basis. Computer display equipment was also installed in a few lecture halls. At this time I also came across the Web. Judging this the most promising system for giving both myself and students on-line access to the lecture material, I converted the lectures to HTML format and placed them on the Web.

The lecture notes, assignments, tutorial and laboratory exercises, and examination papers have been available on the Web since the second half of 1994. This paper reports on the patterns of access to these notes, and draws preliminary conclusions about the value of using the Web for courseware.

The two audiences

The subject is designed for internal access by students enrolled in courses at the University of Canberra. The Web allows access from any location, and I considered this a favourable aspect for my purposes as we have a mixture of Unix workstations, PCs and Macs located over the campus.

Many students at the University are part-time, and even back in 1994 some of them had access to the Internet via work. In the past the University had special systems in place (such as photocopies of lecture notes available in the library) to allow part-time students to juggle work and study requirements more easily. The Web seemed to allow a relatively cost-free way of augmenting and possibly replacing some of these.

The subject has been run in the second half of each year (late July to November) using a formal lecture/tutorial mechanism. I used the Web for lecture delivery. The students had 24-hour access to computer laboratories from which they could access the Web. Because of paper costs they were not allowed to print Web documents from our laboratories.

In addition to this category of local students, the Web also allows anyone to access the subject materials. I looked upon this as another potential benefit: as a teacher I am happy for my materials to benefit any students, no matter their location. However, this group of people have a separate set of requirements which may be impossible to determine. How well does a set of Web pages designed for one purpose satisfy others?

This paper attempts to separate the two groups and to analyse usage for each.

Availability of materials

A practice has grown up within the Faculty of Information Science and Engineering to make copies of the previous year's lecture notes available as booklets sold at the beginning of each semester. This is done to allow students to listen to the lecture and to annotate notes rather than be forced to copy frantically what the lecturer scribbles. This practice evolved from the need to supply part-time students with courseware materials when work exigencies prevented them from attending lectures. Full-time students detected this mechanism and exploited it themselves, leading to the current system.

Lecture notes from the previous year may be purchased from the bookshop for the subject Operating Systems. The majority of students purchase these notes. Any Web accesses are in addition to the use made of these notes. No attempt was made to find how much the printed notes were used. Note that this use of the Web varies from some other reported courseware, where no printed notes were made available, and could not be produced. Patterns of access to such courseware would be very different!

External students do not have access to these bookshop notes. I have recently made my entire Web site accessible for anonymous ftp so that "batch" downloading and local use may be done. This does not affect results here very much since it is recent.

Logging accesses

Since I was the first person in our University to use the Web for any serious purpose and because I have a Unix machine as preferred system, I set up an httpd server on my machine. The server is the NCSA server 1.3, because I use NCSA server side includes on some documents. All my courseware and other materials are available through this server.

The NCSA server generates an access_log which records requesting site and the request, such as

hickory.canberra.edu.au - - [26/Jul/1994:12:45:44 +1000] 
                   "GET /OS/l2_1.html HTTP/1.0" 200 8998

It is important to note what is, and what is not, included in this. What is given is the document accessed, the date and time of access, and the site that requested it. All that is really clear is that an access was made from a particular site at that time. What is not included is the identity of the user requesting access, how long they spent on the page or what they did with it. Such information is not supplied by browsers, for reasons of privacy and security and because of simple deficiencies in HTTP.
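The fields in such an entry can be pulled apart mechanically. The following Python sketch is an illustration only (the analysis reported here used Perl scripts, which are not reproduced in this paper). It assumes each entry occupies one physical line in the log, and the names `Access`, `LOG_LINE` and `parse_line` are invented for the example.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class Access(NamedTuple):
    host: str            # requesting site
    when: datetime       # time of the request, with the server's UTC offset
    method: str
    path: str            # document requested
    status: int
    size: Optional[int]  # bytes sent, or None when logged as "-"


# Layout: host ident authuser [date] "request" status bytes
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: \S+)?" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str) -> Optional[Access]:
    """Return the fields of one log entry, or None if the line does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    when = datetime.strptime(m["when"], "%d/%b/%Y:%H:%M:%S %z")
    size = None if m["size"] == "-" else int(m["size"])
    return Access(m["host"], when, m["method"], m["path"], int(m["status"]), size)


entry = parse_line(
    'hickory.canberra.edu.au - - [26/Jul/1994:12:45:44 +1000] '
    '"GET /OS/l2_1.html HTTP/1.0" 200 8998'
)
print(entry.host, entry.when.date(), entry.path)
```

Nothing in the parsed record identifies a person: the host and the timestamp are the only handles available for the later analyses.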

This leads to a number of problems that make interpretation of Web statistics a little hazardous [HREF4] [HREF5] [HREF6]:

Thus potentially interesting questions such as "how many different students accessed the notes?", "how many times did a particular student access the notes?" or "for how long did a student read a page?" cannot be answered.

In addition, one must be careful about which accesses are interpreted. A document with graphics will consist of multiple URLs, so what the user perceives as a single access will be recorded as multiple accesses. Most of the image accesses should probably be discarded, especially those for navigational image buttons and graphical access count images. Note that we are adopting a user-centric view here: the Web administrator concerned with bandwidth issues would particularly want to know about image accesses.

Caching is employed at all levels of the Web nowadays: browsers have individual caches, organisations have caches on proxy servers, and so do larger geographic areas. The intention is to reduce network traffic, so the more documents a cache supplies, the better it is doing its job. Seen from the originating site, caching reduces the number of accesses to the site, with the better organised caching systems producing the fewest accesses. This has been suggested as the reason so many requesting sites have no symbolic name, only an internet `dotted' address: these are the sites that are too disorganised to even set up name servers, let alone use caching proxies! As a result, server logs will undercount the number of user accesses.

My current access_log is 120 MB in size, containing references to other pages besides the Operating Systems notes. The file was "stream edited" to leave only references to the relevant Web pages. References to GIF images within documents were omitted, as well as accesses from my own machine (I am the heaviest user of my own notes!). In addition, a few error accesses such as asking for OS/html instead of OS.html were eliminated. This reduced the log to 18 MB, which made processing faster. As of March 1997, 101,496 accesses have been made since the notes were first placed on the Web, with 27,718 of these from local machines.
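The filtering rules just described can be sketched as a single predicate. This is a Python illustration rather than the stream-editing commands actually used. It assumes the courseware pages are `/OS.html` and HTML files directly under `/OS/`, and the host name `myhost.canberra.edu.au` is a hypothetical stand-in for the author's own machine.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumptions for illustration only: the courseware is /OS.html plus pages
# directly under /OS/, and OWN_HOST is a made-up name for the author's machine.
COURSEWARE = re.compile(r"^/OS(/[^/]+)?\.html$")
OWN_HOST = "myhost.canberra.edu.au"


def keep(host: str, path: str) -> bool:
    """True for a courseware page request that did not come from OWN_HOST."""
    if host.lower() == OWN_HOST:
        return False                      # the author's own accesses
    if path.lower().endswith(".gif"):
        return False                      # images embedded in pages
    return COURSEWARE.match(path) is not None  # also drops e.g. /OS/html


def filter_accesses(
    records: Iterable[Tuple[str, str]]
) -> Iterator[Tuple[str, str]]:
    """Yield only the (host, path) pairs that count as courseware accesses."""
    for host, path in records:
        if keep(host, path):
            yield host, path


sample = [
    ("hickory.canberra.edu.au", "/OS/l2_1.html"),
    ("hickory.canberra.edu.au", "/OS/fig1.gif"),
    ("www.example.com", "/OS/html"),
]
print(list(filter_accesses(sample)))
```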

Browsers within the Faculty (where most local accesses took place) were configured to not cache documents from my server. While this could be overridden by users it is unlikely that many bothered to do so, since my server and the students' machines are on the same ethernet with minimal delays. So the local figures for access counts are probably fairly accurate. Access counts from external sites will be reduced because of caching, but it is impossible to know how much the server log underestimates the number of accesses made by users.

Structure of the courseware

There is a primary Home page that has a table of contents for the individual lectures. There are pointers to the assignments. Related information such as previous examinations and other courseware is included. This home page for the courseware is the central access point.

Each lecture is structured as a single document, where text is interspersed with sample programs and diagrams. There are very few links from or into the body of these documents. This is largely due to the non-Web origin of the courseware, where a unit of material came in "lecture sizes". Each lecture is certainly not a "multimedia" document, rather a "linear" document that happens to be on the Web.

Had each lecture been designed as a set of Web pages then it would have been possible to track student accesses within each lecture. This would have given much more information about student use of each lecture. Patterns of usage for non-linear documents would be expected to be very different to linear ones.

The Home page is OS.html. Two lectures were given each week, labelled by week and lecture number as l1_1.html, l1_2.html, l2_1.html, l2_2.html, etc. While far from ideal (the names should be content based) this early labelling scheme has been stable enough to be adequate.

Analysis software

There are a number of server log analysis tools available [HREF7] [HREF8]. Some of those could be used to generate some of the following reports, but other reports are too specialised for them. A set of Perl scripts was therefore written to perform the analysis.

Total accesses by month

The first figure measured is the total accesses by month. This is just to give a rough idea of the frequency of accesses, without showing any details. This is shown in Table One [TREF1]: Total accesses by month, where the diamonds are local accesses and the squares are external accesses.
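A monthly tally split into local and external accesses might be computed along the following lines. This is a Python sketch, not the Perl actually used. It assumes that University machines can be recognised by a `.canberra.edu.au` host name, so a local machine logged only by its dotted address would be counted as external.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

LOCAL_SUFFIX = ".canberra.edu.au"  # assumption: how University hosts are named


def is_local(host: str) -> bool:
    """True when the requesting host name ends with the University suffix."""
    return host.lower().endswith(LOCAL_SUFFIX)


def monthly_totals(accesses: Iterable[Tuple[str, datetime]]) -> Counter:
    """Map (year, month, 'local' or 'external') to a count of accesses."""
    totals: Counter = Counter()
    for host, when in accesses:
        group = "local" if is_local(host) else "external"
        totals[(when.year, when.month, group)] += 1
    return totals


sample = [
    ("hickory.canberra.edu.au", datetime(1995, 10, 3, 9, 15)),
    ("www.example.com", datetime(1995, 10, 3, 22, 40)),
]
for key, count in sorted(monthly_totals(sample).items()):
    print(key, count)
```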

Groups of about one hundred and forty local students studied the course in each year from late July to November. A small group of about twenty local students studied the course from March to June in 1996. The local accesses are clearly concentrated in these teaching periods, showing that the students did indeed use the notes during the course.

Because the identity of students is not captured, it is not possible to give an unambiguous interpretation of these figures. Taking October 1995 as typical of semester use, one explanation is that each student accessed the notes once per day. More likely is that a subset of the students accessed them more frequently. This point is revisited later.

Compare the figures for July to November for local students in each year. In 1994, usage was low - the Web was very new then, and few students knew about it. Usage went up by a factor of ten in 1995. In 1996, it dropped by half. Was this caused by reduced interest in the novelty of the Web, or by the slower speed of loading Netscape in the Linux environment?

The external accesses have shown a steady increase over time. The leap in July, 1995 is probably caused by links being added from a well-known site such as the World Lecture Hall [HREF9]. There look to be definite lulls around holiday periods such as Christmas, which would be expected, but this is probably not statistically significant. Accesses seem to have stabilised around 4,000 per month. The lecture material is not particularly time-dependent in interest, so either it has reached a steady-state access rate, or any increases are being hidden by caches elsewhere.

Page accesses by year

The next figure measured from the log is the total accesses to each page. This is divided into three categories: total, the external and the local (i.e. from a University machine) accesses. This is given in Table Two [TREF2]: Total accesses to each page by year

Ignoring accesses to the Home page (OS) for the moment, the most significant feature would appear to be the high number of accesses to the first three lectures, l1_1, l1_2 and l2_1, compared to most of the others. This would tend to indicate that about two-thirds of visitors decide not to explore further after these three pages.

One of the things that I wanted to examine was the amount of "browsing" versus in-depth study that takes place. The primary point of access is the Home page "OS". For each column this has about one third of the total accesses, indicating that on average two other pages are visited besides the Home page.

At first sight this is highly indicative of browsing - people accessing the Home page and then just one or two others - but this is probably just a reflection of the structure of the notes: there are links to and from the Home page for each lecture, but not between. To navigate from one lecture to another means going through the Home page. This structure would lead to an expectation that the Home page would get closer to half of the accesses. That it doesn't is probably due to browser caching and use of the Back button.
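The arithmetic behind these two figures can be made explicit with a toy model. The sketch below is an illustration only: it assumes a visit that reads a given number of lectures, always navigating through the Home page, and the function names are invented for the example.

```python
def other_pages_per_home_access(home_share: float) -> float:
    """Average number of non-Home accesses logged per Home page access."""
    return (1.0 - home_share) / home_share


def expected_home_share(lectures_read: int, home_cached: bool) -> float:
    """Home page share of logged accesses for one visit.

    Without caching, each lecture is preceded by a logged Home page request.
    With browser caching or the Back button, only the first Home page
    request reaches the server.
    """
    home_hits = 1 if home_cached else lectures_read
    return home_hits / (home_hits + lectures_read)


# A Home page share of one third corresponds to about two other pages per
# Home page access.
print(round(other_pages_per_home_access(1 / 3), 6))
# No caching: the share is one half however many lectures are read.
print(expected_home_share(4, home_cached=False))
# Cached Home page and two lectures read: the share falls to one third.
print(expected_home_share(2, home_cached=True))
```

Under this model the observed share of about one third is what would be logged if the Home page were fetched once and two lectures were then read, which is consistent with the caching explanation.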

The dependence on document structure will, I think, be critical in interpreting such results. For example, if each lecture were linked to the next and previous ones, then I would expect far fewer accesses to the Home page. This could warrant further experimentation.

Accesses per session

The raw access figures per page do not give many clues about the amount of browsing. Since the log does not contain the identity of each user it is impossible to get an exact count of how many times each person visited.

To get around this, it was assumed that multiple accesses in the one day from the same machine were likely to be from the same person. This would arise from a "session" where a user would be surfing through a series of pages. So a count of accesses from individual machines within a day would give a rough count of how many accesses each individual performed in a session. This is quite definitely an approximation. Caching will cause this figure to be underestimated. Multiple users from one site will cause it to be overestimated (given the relatively low number of accesses to my site compared to major commercial sites, I would not expect this to be a factor).
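This definition of a session is easy to state in code. The Python sketch below is an illustration, not the Perl script used for the table. It treats each (machine, calendar day) pair as one session, with the day taken from the timestamp as logged, and assumes the input has already been restricted to external courseware accesses.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def session_sizes(accesses: Iterable[Tuple[str, datetime]]) -> Counter:
    """Count accesses per (host, day); each such pair is taken as one session."""
    return Counter((host.lower(), when.date()) for host, when in accesses)


def size_histogram(per_session: Counter) -> Counter:
    """Map session length in accesses to the number of sessions of that length."""
    return Counter(per_session.values())


sample = [
    ("www.example.com", datetime(1996, 5, 1, 10, 0)),
    ("www.example.com", datetime(1996, 5, 1, 10, 5)),
    ("www.example.com", datetime(1996, 5, 2, 9, 0)),
    ("192.0.2.7", datetime(1996, 5, 1, 23, 0)),
]
hist = size_histogram(session_sizes(sample))
for length in sorted(hist):
    print(length, "access(es):", hist[length], "session(s)")
```

A visit that runs past midnight is split into two sessions under this rule, which is one more reason to treat the counts as approximate.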

This count was only performed on external accesses. This was because it is harder to detect repeated use of the same machine by local students. A hand analysis of one day's local accesses showed that while Faculty laboratory machines showed only one session per day, there were five separate sessions from the central information services machine. A finer definition of "session" is needed before local results can be properly analysed.

Local students already in possession of printed materials would not need to browse, searching for interesting material, because they would already have a good idea of the content. One would expect patterns of access to differ markedly between the local and external groups, but this cannot be tested yet.

The result is given in the attached Table Three [TREF3]: Count of number of accesses per day per external machine

Of about 19,000 external sessions, nearly half only looked at the Home page (about 10 came directly to other places than the Home page). However, about one-quarter stayed for four or more accesses. This suggests that while a large amount of browsing is indeed going on, a significant number of visitors were directed enough and found enough of value to stay for a while. There are a total of 36 pages in the courseware, and in about 300 sessions most or all of these pages were accessed.

Total external accesses by machine

The last section reported on the number of accesses within a single session by a user. Users who find a site to be of value often return many times to that site, to get extra material and to see if any changes have been made (I know I do this).

To see if this occurred, a count was made of the total accesses from each individual machine. It is a lot more difficult to be confident in what this is actually measuring. If user information was available then a count on this would give the long term patterns of use. Since it is not, we do the best we can. Again, only external accesses are used for this.

Two factors make the results differ from what we really want to measure. The first is that students rarely have a dedicated machine of their own. So different sessions will probably be from different machines. This would tend to lower the per user access count. The second is that many different students could use the same machine, raising the access count. I feel this is less likely here since the accesses come from all over the world, but a single class running on X terminals from a single application server directed to my pages could overthrow this assumption (and caching could restore it!).
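A per-machine total, and the number of machines falling in a given range of totals, could be obtained as follows. This is a Python sketch under the same caveats: a machine stands in for a user, and the function names are invented for the example.

```python
from collections import Counter
from typing import Iterable


def accesses_per_host(hosts: Iterable[str]) -> Counter:
    """Total number of logged accesses for each requesting machine."""
    return Counter(h.lower() for h in hosts)


def hosts_in_range(totals: Counter, low: int, high: int) -> int:
    """Number of machines whose total lies between low and high inclusive."""
    return sum(1 for n in totals.values() if low <= n <= high)


sample = ["www.example.com"] * 3 + ["192.0.2.7"] + ["lab.example.edu"] * 120
totals = accesses_per_host(sample)
print(totals.most_common(2))
print(hosts_in_range(totals, 100, 400), "machine(s) with 100 to 400 accesses")
```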

The result is given in Table Four [TREF4]: Count of total accesses per external machine

From some sites there were a large number of accesses. For example, there were between 100 and 400 visits from 40 sites. Either a number of people visited from them, or individuals returned to access the notes.

Local access by semester measures

Lectures are delivered to students over a period of time. One would expect the local students' focus of interest to follow the lecture timetable. By measuring time in weeks of semester, we can track the local accesses to each lecture week by week. Weeks 1-15 were lecture weeks with a mid-semester break in weeks 9-10. Examinations were held in the weeks following week 15. For 1994, 1995 and 1996 these are given in Table Five [TREF5]: Weekly accesses to lectures
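Converting an access date into a week of semester, and tallying accesses per lecture per week, can be sketched in Python as below. The start date of week 1 is not stated in this paper, so the date used in the example is a placeholder, and the lecture file names are assumed to follow the l<week>_<lecture>.html pattern described earlier.

```python
import re
from collections import Counter
from datetime import date
from typing import Iterable, Tuple

LECTURE = re.compile(r"/(l\d+_\d+)\.html$")  # e.g. /OS/l2_1.html -> l2_1


def semester_week(day: date, week1_start: date) -> int:
    """Week of semester for `day`; values below 1 fall before the semester."""
    return (day - week1_start).days // 7 + 1


def weekly_lecture_counts(
    accesses: Iterable[Tuple[str, date]], week1_start: date
) -> Counter:
    """Map (lecture name, week of semester) to a count of accesses."""
    counts: Counter = Counter()
    for path, day in accesses:
        m = LECTURE.search(path)
        if m:  # skip the Home page and other non-lecture pages
            counts[(m.group(1), semester_week(day, week1_start))] += 1
    return counts


WEEK1 = date(1995, 7, 24)  # placeholder start date, not taken from the paper
sample = [
    ("/OS/l1_1.html", date(1995, 7, 25)),
    ("/OS/l1_1.html", date(1995, 11, 8)),
    ("/OS.html", date(1995, 7, 25)),
]
for key, n in sorted(weekly_lecture_counts(sample, WEEK1).items()):
    print(key, n)
```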

By and large this shows expected patterns: access during the week of the lecture. There are few accesses during the mid-semester break. However, nearly all accesses show a lull in the weeks following the break - does this confirm suspicions of a low point in student activities at this time?

What surprised me was the upsurge at the end of semester for all lectures. While not startling, it does appear that as exams approach student interest increases in all aspects of the subject ;-)

External accesses by local students

Some students can get at the lectures from their work connections or private connections to the Internet. In fact, one of the earliest accesses was from a student working at ERIN. Without a good knowledge of the nearby sites it is impossible to identify this kind of access. Some simple figures are: there were 34,761 Australian accesses, of which 27,636 were internal. This means that at a maximum, no more than 7,125 were external accesses by local students. A better estimate may be given by the gov.au accesses, since we are in Canberra: these totalled 1,372.
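The bound quoted above is a simple subtraction, and the grouping of hosts by domain behind it can be sketched as follows. This Python fragment is an illustration only. The two totals are the figures given in the text, and the category names and the `.canberra.edu.au` test for internal machines are assumptions made for the example.

```python
def classify(host: str) -> str:
    """Place a requesting host in one of four coarse groups by its domain."""
    h = host.lower().rstrip(".")
    if h.endswith(".canberra.edu.au"):
        return "internal"
    if h.endswith(".gov.au"):
        return "gov.au"
    if h.endswith(".au"):
        return "other .au"
    return "non-Australian or unresolved"  # includes dotted addresses


AUSTRALIAN_ACCESSES = 34_761  # figure from the text
INTERNAL_ACCESSES = 27_636    # figure from the text

# Upper bound on external accesses made by local students.
upper_bound = AUSTRALIAN_ACCESSES - INTERNAL_ACCESSES
print(upper_bound)

for host in ("hickory.canberra.edu.au", "www.example.gov.au", "192.0.2.7"):
    print(host, "->", classify(host))
```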

Improving the courseware

Is it possible to improve the courseware based on these figures? I would argue that the figures can provide only a crude guidance, since it is impossible to tell what users actually do with pages once they have them in their browser.

External access dropped off rapidly after the first few lectures. These are lectures that deal with assessment procedures, the local environment, etc. A pointer that skips past them might improve retention by moving external students on to more relevant material.

A better way of improving external retention rates might be to offer credit in the course. This opens up the problems of external accreditation, fees, etc.

For local students, a major change in 1995 was to allow students to run programs from within the lectures. The course had a major emphasis on programming exercises, and a mechanism was put in place to allow students to run and change example programs without having to worry about the details of compilers and editors. Such details would be needed in independent work, but not in pedagogical material.

The mechanism was to place programs in Forms submitted to CGI scripts. Security reasons dictated a particular solution using proxy servers running on each machine. While many students were observed to be using this mechanism, the local servers did not record accesses in 1995, so the opportunity to gather information that year was missed. Recording is in place for 1996 on each Faculty machine. While messy, it should be possible to relate these local proxy accesses to the accesses on my machine, to give a better idea of what students do with lectures and of whether or not this particular enhancement is used.

Does the Web improve learning?

The Web allows a relatively simple way to give an electronic presentation of course material. It also allows students to use the notes at any time. Does this improve learning? It is hard to say.

In student surveys, there has been a consistent response: the students in this computing course are very glad to see a lecturer using the type of technology they are learning about; they have also said that they value having the notes available from the student laboratories. The reaction to doing this has been positive.

The tables in this paper also show that the students do indeed make use of the notes. Precisely what they do with them online is not known, but they do spend the time to get at them.

A number of email messages from external users have said that they have found the courseware useful.

Do the assessment results show any change? Not anything that could be labelled as unambiguously due to the Web. The presentation and access methods may help, but still ultimately the student has to learn. The presentation was not so radically different as to necessarily result in a change in student learning.

Conclusion

This type of study is unique to the Web versus other courseware materials: patterns of access to printed courseware are impossible to obtain except under controlled conditions, and most CD-ROM software does not record access patterns. The closest most systems come to recording information is by setting tests at the end of certain sections.

The structure of a courseware document can affect the patterns of access and how the data should be interpreted. The particular structure of this courseware led to particular patterns of access. Even minor changes of structure would be expected to lead to different access patterns.

This study was conducted under certain conditions:

The Web courseware was available in printed form to the local students. Thus the Web accesses were in addition to their use of printed versions. It may be useful to compare accesses to other similar courseware that does not make printed copies available.

For internal accesses, the principal conclusions are

For external accesses, the principal conclusions are

Hypertext References

HREF1
http://www.scu.edu.au/sponsored/ausweb96/educn/boalch/ - Gregg Boalch, "WWW as an Educational Support Medium - An Australian Case Study"
HREF2
http://www.scu.edu.au/sponsored/ausweb96/educn/jones/ - David Jones, "Solving some Problems of University Education - a Case Study"
HREF3
http://www.scu.edu.au/sponsored/ausweb96/educn/wise/ - Lisa Wise and Chris Hughes, "Integrating WWW into an On-Campus Laboratory-based Teaching Program"
HREF4
http://www.piperinfo.com/pl01/usage.html - Dana Noonan, "Making Sense of Web Usage Statistics"
HREF5
http://gopher.nara.gov:70/0h/what/stats/webanal.html - Doug Linder, "Interpreting WWW Statistics"
HREF6
http://www.cranfield.ac.uk/stats/ - Jeff Goldberg, "Why Web Usage Statistics are (worse than) Meaningless"
HREF7
http://www.statslab.cam.ac.uk/~sret1/analog/ - Analog Web analysis tool
HREF8
http://www.ics.uci.edu/pub/websoft/wwwstat/ - HTTP Logfile Analysis Software
HREF9
http://www.utexas.edu/world/lecture/ - World Lecture Hall

Table References

The tables are too bulky to be included in the body of the text, so they are referenced from this document:
TREF1
monthly.html - Total accesses by month
TREF2
page_yearly.html - Total accesses to each page by year
TREF3
accesses_daily.html - Count of number of accesses per day per external machine
TREF4
accesses_total_sites.html - Count of total accesses per external machine
TREF5
lecture_weekly.html - Weekly accesses to lectures

Copyright

Jan Newmarch ©, 1997. The author assigns to Southern Cross University and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The author also grants a non-exclusive licence to Southern Cross University to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers, and for the document to be published on mirrors on the World Wide Web. Any other usage is prohibited without the express permission of the author.




AusWeb97 Third Australian World Wide Web Conference, 5-9 July 1997, Southern Cross University, PO Box 157, Lismore NSW 2480, Australia Email: AusWeb97@scu.edu.au