The subject Operating Systems is a second year unit in the Bachelor of Computing Studies and in the Bachelor of Computer Engineering at the University of Canberra. Various presentation modes have been used in the past for this subject, such as whiteboards, overhead slides prepared manually, and overhead slides prepared using proprietary document processing systems such as Interleaf. Direct computer display mechanisms were not used for many years because of a lack of portable computers and a lack of suitable display systems in lectures.
In 1994, due to a study program overseas, I was given access to a portable computer on a regular basis. Computer display equipment was also installed in a few lecture halls. At this time I also came across the Web. Judging this the most promising system for giving both myself and students on-line access to the lecture material, I converted the lectures to HTML format and placed them on the Web.
The lecture notes, assignments, tutorial and laboratory exercises, and examination papers have been available on the Web since the second half of 1994. This paper reports on the patterns of access to these notes, and draws preliminary conclusions about the value of using the Web for courseware.
Many students at the University are part-time, and even back in 1994 some of them had access to the Internet via work. In the past the University had special systems in place (such as photocopies of lecture notes available in the library) to allow part-time students to juggle work and study requirements more easily. The Web seemed to allow a relatively cost-free way of augmenting and possibly replacing some of these.
The subject has been run in the second half of each year (late July to November) using a formal lecture/tutorial mechanism. I used the Web for lecture delivery. The students had 24-hour access to computer laboratories from which they could access the Web. Because of paper costs they were not allowed to print Web documents from our laboratories.
In addition to this category of local students, the Web also allows anyone to access the subject materials. I looked upon this as another potential benefit: as a teacher I am happy for my materials to benefit any students, no matter their location. However, this group of people have a separate set of requirements which may be impossible to determine. How well does a set of Web pages designed for one purpose satisfy others?
This paper attempts to separate the two groups and analyses usage for each separately.
Lecture notes from the previous year may be purchased from the bookshop for the subject Operating Systems. The majority of students purchase these notes. Any Web accesses are in addition to the use made of these notes. No attempt was made to find how much the printed notes were used. Note that this use of the Web varies from some other reported courseware, where no printed notes were made available, and could not be produced. Patterns of access to such courseware would be very different!
External students do not have access to these bookshop notes. I have recently made my entire Web site accessible for anonymous ftp so that "batch" downloading and local use may be done. This does not affect results here very much since it is recent.
The NCSA server generates an access_log which records requesting site and the request, such as
hickory.canberra.edu.au - - [26/Jul/1994:12:45:44 +1000] "GET /OS/l2_1.html HTTP/1.0" 200 8998
It is important to note what is, and what is not, included in this. What is given is the document accessed, the date of access and the site that requested it. All that is really clear is that an access was made from a particular site at that time.
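As a minimal sketch of how these fields can be pulled out of a log entry (this is an illustration, not the processing actually used for this paper), the following assumes every line follows the layout of the example above. The names `LOG_PATTERN` and `parse_entry` are hypothetical.

```python
import re
from datetime import datetime

# Layout assumed from the example entry:
# host ident authuser [timestamp] "request" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: \S+)?" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

def parse_entry(line):
    """Return the fields of one access_log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "host": match["host"],      # requesting site
        "when": datetime.strptime(match["when"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": match["method"],
        "path": match["path"],      # document accessed
        "status": int(match["status"]),
    }

entry = parse_entry(
    'hickory.canberra.edu.au - - [26/Jul/1994:12:45:44 +1000] '
    '"GET /OS/l2_1.html HTTP/1.0" 200 8998'
)
print(entry["host"], entry["when"].date(), entry["path"])
```

Nothing in the parsed entry identifies a person: the host, the time and the path are all there is to work with.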
What is not included is the identity of the user requesting access, how long they spent on the page or what they did with it. Such information is not supplied by browsers for reasons of privacy and security, and because of simple deficiencies in the HTTP protocol.
This leads to a number of problems that make interpretation of Web statistics a little hazardous [HREF4] [HREF5] [HREF6]:
In addition, one must be careful about which accesses are interpreted. A document with graphics will consist of multiple URLs, so what the user perceives as a single access will be recorded as multiple accesses. Most of the image accesses should probably be discarded, especially those for navigational image buttons and graphical access count images. Note that we are adopting a user-centric view here: the Web administrator concerned with bandwidth issues would particularly want to know about image accesses.
Caching is employed at all levels of the Web nowadays: browsers have individual caches, organisations have caches on proxy servers, and so do larger geographic areas. The intention is to reduce network traffic, so the more documents a cache supplies, the better it is doing its job. Seen from the originating site, caching reduces the number of accesses to the site, with the better organised caching systems producing the fewest accesses. This has been suggested as the reason so many requesting sites have no symbolic name, only an internet `dotted' address: these are the sites that are too disorganised to even set up name servers, let alone use caching proxies! As a result, server logs will undercount the number of user accesses.
My current access_log is 120Mb in size, and contains references to other pages besides the Operating Systems notes. The file was "stream edited" to leave only references to the relevant Web pages. References to GIF images within documents were omitted, as were accesses from my own machine (I am the heaviest user of my own notes!). In addition, a few error accesses, such as requests for OS/html instead of OS.html, were eliminated. This reduced the log to 18Mb, which made processing faster.
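The filtering rules just described can be sketched as a single predicate. This is an illustration only, not the stream edit actually used: the host name `OWN_HOST` is a placeholder (the paper does not name the machine), and the pattern `RELEVANT` assumes the pages live at /OS.html and /OS/*.html, as the example log entry suggests.

```python
import re

# Hypothetical values, chosen for illustration only.
OWN_HOST = "author-machine.canberra.edu.au"
RELEVANT = re.compile(r"^/OS(\.html|/[^/]+\.html)$")

def keep(host, path):
    """Return True if an access should stay in the reduced log."""
    if host == OWN_HOST:
        return False  # accesses from the author's own machine
    if path.lower().endswith(".gif"):
        return False  # inline images within documents
    # Anything that is not a courseware page is dropped, which also
    # removes error accesses such as /OS/html.
    return RELEVANT.match(path) is not None

requests = [
    ("host.example.com", "/OS.html"),
    ("host.example.com", "/OS/l2_1.html"),
    ("host.example.com", "/OS/fig1.gif"),
    ("host.example.com", "/OS/html"),
    (OWN_HOST, "/OS/l2_1.html"),
]
kept = [r for r in requests if keep(*r)]
print(kept)
```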
As of March 1997, 101,496 accesses have been made since the notes were first placed on the Web, with 27,718 of these from local machines.
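One way to split accesses into local and external, and to total them by month, is sketched below. It assumes that local machines can be recognised by the domain suffix seen in the example log entry; the paper does not state the rule actually used. Local machines that appear in the log only as dotted addresses would be miscounted as external under this assumption. The sample data are invented.

```python
from collections import Counter
from datetime import date

LOCAL_SUFFIX = ".canberra.edu.au"  # assumption: local machines share this domain

def is_local(host):
    """Classify a requesting site by its symbolic name."""
    return host.lower().endswith(LOCAL_SUFFIX)

def monthly_counts(accesses):
    """Count accesses per (year, month, 'local' or 'external')."""
    counts = Counter()
    for host, day in accesses:
        group = "local" if is_local(host) else "external"
        counts[(day.year, day.month, group)] += 1
    return counts

# Invented sample accesses: (requesting site, date of access)
sample = [
    ("hickory.canberra.edu.au", date(1995, 10, 3)),
    ("hickory.canberra.edu.au", date(1995, 10, 4)),
    ("host.example.com", date(1995, 10, 3)),
    ("192.0.2.17", date(1995, 11, 1)),
]
counts = monthly_counts(sample)
for key in sorted(counts):
    print(key, counts[key])
```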
Browsers within the Faculty (where most local accesses took place) were configured to not cache documents from my server. While this could be overridden by users it is unlikely that many bothered to do so, since my server and the students' machines are on the same Ethernet with minimal delays. So the local figures for access counts are probably fairly accurate. Access counts from external sites will be reduced because of caching, but it is impossible to know how much the server log underestimates the number of accesses made by users.
Each lecture is structured as a single document, where text is interspersed with sample programs and diagrams. There are very few links from or into the body of these documents. This is largely due to the non-Web origin of the courseware, where a unit of material came in "lecture sizes". Each lecture is certainly not a "multimedia" document, rather a "linear" document that happens to be on the Web.
Had each lecture been designed as a set of Web pages then it would have been possible to track student accesses within each lecture. This would have given much more information about student use of each lecture. Patterns of usage for non-linear documents would be expected to be very different to linear ones.
The Home page is OS.html. Two lectures were given each week, labelled by week and lecture number as l1_1.html, l1_2.html, l2_1.html, l2_2.html, etc. While far from ideal (the names should be content based) this early labelling scheme has been stable enough to be adequate.
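Because the file names encode week and lecture number, accesses can be related back to the teaching week with a small helper. The following is a sketch under the naming scheme just described; the function name `week_and_lecture` is hypothetical.

```python
import re

# l<week>_<lecture>.html, e.g. l2_1.html is week 2, lecture 1
LECTURE_NAME = re.compile(r"^l(\d+)_(\d+)\.html$")

def week_and_lecture(filename):
    """Return (week, lecture) for a lecture file, or None for other pages."""
    match = LECTURE_NAME.match(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

print(week_and_lecture("l2_1.html"))
print(week_and_lecture("OS.html"))
```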
In the accompanying graph, the diamonds are local accesses and the squares are external accesses.
Groups of about one hundred and forty local students studied the course in each year from late July to November. A small group of about twenty local students studied the course from March to June in 1996. The local accesses are clearly skewed towards these teaching periods, showing that the students did indeed use the notes during the course.
Because the identity of students is not captured, it is not possible to give an unambiguous interpretation of these figures. Taking October 1995 as typical of semester use, one explanation is that each student accessed the notes once per day. More likely is that a subset of the students accessed them more frequently. This point is revisited later.
Compare the figures for July to November for local students in each year. In 1994, usage was low - the Web was very new then, and few students knew about it. Usage went up by a factor of ten in 1995. In 1996, it dropped by half. Was this caused by reduced interest in the novelty of the Web, or by the slower speed of loading Netscape in the Linux environment?
The external accesses have shown a steady increase over time. The leap in July, 1995 is probably caused by links being added from a well-known site such as the World Teaching Hall [HREF9]. There look to be definite lulls around holiday periods such as Christmas which would be expected, but this is probably not statistically significant. Accesses seem to have stabilised around 4,000 per month. The lecture material is not particularly time-dependent in interest, so either it would appear to have reached a steady-state access rate, or any increases are being hidden by caches elsewhere.
Ignoring accesses to the Home page (OS) for the moment, the most significant feature would appear to be the high number of accesses to the first three lectures, l1_1, l1_2 and l2_1, compared to most of the others. This would tend to indicate that about two-thirds of visitors decide not to explore further after these three pages.
One of the things that I wanted to examine was the amount of "browsing" versus in-depth study that takes place. The primary point of access is the Home page "OS". For each column this has about one third of the total accesses, indicating that on average two other pages are visited besides the Home page.
At first sight this is highly indicative of browsing - people accessing the Home page and then just one or two others - but this is probably just a reflection of the structure of the notes: there are links to and from the Home page for each lecture, but not between. To navigate from one lecture to another means going through the Home page. This structure would lead to an expectation that the Home page would get closer to half of the accesses. That it doesn't is probably due to browser caching and use of the Back button.
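The arithmetic behind this expectation can be made explicit with a small model. It is a simplification for illustration, not a fit to the logs: a visit reads some number of lectures, and either every move to a lecture fetches the Home page from the server, or the Home page is fetched once and later returns to it are served from the browser cache. The function name `home_share` is hypothetical.

```python
def home_share(lectures_read, via_home=True):
    """Fraction of logged accesses in one visit that go to the Home page.

    via_home=True:  the Home page is fetched from the server before each lecture.
    via_home=False: the Home page is fetched once; later returns to it
                    (for example with the Back button) come from the cache.
    """
    if lectures_read == 0:
        return 1.0  # only the Home page was seen
    home_accesses = lectures_read if via_home else 1
    return home_accesses / (home_accesses + lectures_read)

print(home_share(3))                  # Home page fetched before every lecture
print(home_share(2, via_home=False))  # Home page fetched once, two lectures read
```

Under the first assumption the Home page receives half of the accesses however many lectures are read. Under the second, a visit of two lectures gives the one-third share observed.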
The dependence on document structure will, I think, be critical in interpreting such results. For example, if each lecture were linked to the next and previous ones, then I would expect far fewer accesses to the Home page. This could warrant further experimentation.
To get around this, it was assumed that multiple accesses in the one day from the same machine were likely to be from the same person. This would arise from a "session" where a user would be surfing through a series of pages. So a count of accesses from individual machines within a day would give a rough count of how many accesses each individual performed in a session. This is quite definitely an approximation. Caching will cause this figure to be underestimated. Multiple users from one site will cause it to be overestimated (given the relatively low number of accesses to my site compared to major commercial sites, I would not expect this to be a factor).
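This definition of a session, all accesses from one machine on one day, can be sketched as follows. The code illustrates the counting rule only; it is not the script used for the paper, the names `session_sizes` and `size_histogram` are hypothetical, and the sample data are invented.

```python
from collections import Counter
from datetime import date

def session_sizes(accesses):
    """Map each (machine, day) pair to its number of accesses."""
    return Counter((host, day) for host, day in accesses)

def size_histogram(sessions):
    """Count how many sessions consisted of 1, 2, 3, ... accesses."""
    return Counter(sessions.values())

# Invented sample accesses: (requesting machine, date of access)
sample = [
    ("host.example.com", date(1996, 5, 1)),
    ("host.example.com", date(1996, 5, 1)),
    ("host.example.com", date(1996, 5, 1)),
    ("host.example.com", date(1996, 5, 2)),
    ("other.example.org", date(1996, 5, 1)),
]
sessions = session_sizes(sample)
histogram = size_histogram(sessions)
for size in sorted(histogram):
    print(size, "access(es):", histogram[size], "session(s)")
```

Note that a visit spanning midnight is split into two sessions under this rule, and two people sharing a machine on one day are merged into one.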
This count was only performed on external accesses. This was because it is harder to detect repeated use of the same machine by local students. A hand analysis of one day's local accesses showed that while Faculty laboratory machines showed only one session per day, there were five separate sessions from the central information services machine. A finer definition of "session" is needed before local results can be properly analysed.
Local students already in possession of printed materials would not need to browse, searching for interesting material, because they would already have a good idea of the content. One would expect patterns of access to differ markedly between the local and external groups, but this cannot be tested yet.
The result is given in the attached Table Three [TREF3]: Count of number of accesses per day per external machine
Of about 19,000 external sessions, nearly half only looked at the Home page (about 10 came directly to other places than the Home page). However, about one-quarter stayed for four or more accesses. This suggests that while a large amount of browsing is indeed going on, a significant number of visitors were directed enough and found enough of value to stay for a while. There are a total of 36 pages in the courseware, and in about 300 sessions most or all of these pages were accessed.
To see if this occurred, a count was made of the total accesses from each individual machine. It is a lot more difficult to be confident in what this is actually measuring. If user information was available then a count on this would give the long term patterns of use. Since it is not, we do the best we can. Again, only external accesses are used for this.
Two factors make the results an inaccurate measure of what we really want to know. The first is that students rarely have a dedicated machine of their own. So different sessions will probably be from different machines. This would tend to lower the per user access count. The second is that many different students could use the same machine, raising the access count. I feel this is less likely here since the accesses come from all over the world, but a single class running on X terminals from a single application server directed to my pages could overthrow this assumption (and caching could restore it!).
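A count of total accesses per machine, grouped into ranges, might be computed as in the sketch below. The range boundaries in `BOUNDS` and `LABELS` are assumptions made for illustration and are not the ones used in the paper's table; the function names are hypothetical and the sample data are invented.

```python
from bisect import bisect_right
from collections import Counter

# Assumed ranges: lower edge of each range, with a matching label.
BOUNDS = [1, 2, 10, 100, 401]
LABELS = ["1", "2-9", "10-99", "100-400", "over 400"]

def bucket(total):
    """Return the label of the range containing a per-machine total (>= 1)."""
    return LABELS[bisect_right(BOUNDS, total) - 1]

def totals_table(hosts):
    """Count machines per range, given one host name per logged access."""
    per_machine = Counter(hosts)
    return Counter(bucket(total) for total in per_machine.values())

# Invented sample: one machine with 150 accesses, one with 3, one with 1.
hosts = ["a.example.com"] * 150 + ["b.example.org"] * 3 + ["192.0.2.17"]
table = totals_table(hosts)
for label in LABELS:
    print(label, table[label])
```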
This count is given in Table Four [TREF4]: Count of total accesses per external machine
From some sites there were a large number of accesses. For example, there were between 100 and 400 visits from 40 sites. Either a number of people visited from them, or individuals returned to access the notes.
By and large this shows expected patterns: access during the week of the lecture. There are few accesses during the mid-semester break. However, nearly all accesses show a lull in the weeks following the break - does this confirm suspicions of a low point in student activities at this time?
What surprised me was the upsurge at the end of semester for all lectures. While not startling, it does appear that as exams approach student interest increases in all aspects of the subject ;-)
External access dropped off rapidly after the first few lectures. These are lectures that deal with assessment procedures, local environment, etc. A pointer to skip past these may improve retention by moving external students to more relevant material.
A better way of improving external retention rates might be to offer credit in the course. This opens up the problems of external accreditation, fees, etc.
For local students, a major change in 1995 was to allow students to run programs from within the lectures. The course had a major emphasis on programming exercises, and a mechanism was put in place to allow students to run and change example programs without having to worry about the details of compilers and editors. Such details would be needed in independent work, but not in pedagogical material.
The mechanism was to place programs in Forms submitted by CGI scripts. Security reasons dictated a particular solution using proxy servers running on each machine. While many students were observed to be using this mechanism, the local servers did not record accesses, so the opportunity to gather information in 1995 was missed. Recording is in place for 1996 for each Faculty machine. While messy, it should be possible to relate these local proxy accesses to my machine accesses, to give a better idea of what students do with lectures, and of whether or not this particular enhancement is used.
In student surveys, there has been a consistent response: the students in this computing course are very glad to see a lecturer using the type of technology they are learning about; they also value the notes being available from the student laboratories. The reaction to doing this has been positive.
The tables in this paper also show that the students do indeed make use of the notes. Precisely what they do with them online is not known, but they do spend the time to get at them.
A number of email messages from external users have said that they have found the courseware useful.
Do the assessment results show any change? Not anything that could be labelled as unambiguously due to the Web. The presentation and access methods may help, but still ultimately the student has to learn. The presentation was not so radically different as to necessarily result in a change in student learning.
The structure of a courseware document can affect the patterns of access and how the data should be interpreted. The particular structure of this courseware led to particular patterns of access. Even minor changes of structure would be expected to lead to different access patterns.
This study was conducted under certain conditions:
The Web courseware was available in printed form to the local students. Thus the Web accesses were in addition to their use of printed versions. It may be useful to compare accesses to other similar courseware that does not make printed copies available.
For internal accesses, the principal conclusions are
For external accesses, the principal conclusions are
Jan Newmarch ©, 1997. The author assigns to Southern Cross University and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The author also grants a non-exclusive licence to Southern Cross University to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers, and for the document to be published on mirrors on the World Wide Web. Any other usage is prohibited without the express permission of the author.