Personalisation issues in the enterprise learning portal.

Ron Sawyer [HREF1], Principal Project Manager, Flexible Learning and Teaching Program, Information Technology Services [HREF2], PO Box 3A, Monash University [HREF3], Victoria, 3800. Ron.Sawyer@its.monash.edu

Nathan Bailey [HREF4], Manager, Development+Integration, Flexible Learning and Teaching Program, Information Technology Services [HREF2], PO Box 3A, Monash University [HREF3], Victoria, 3800. Nathan.Bailey@its.monash.edu

Abstract

With hundreds of millions of online resources of many types available over the web, the average web user wastes valuable time and energy sorting through irrelevant references and refining search vocabularies in order to find what is relevant to them. In many cases, users don't know that there are resources that are relevant to them and don't even bother to look.

At Monash, the portal team is developing an Enterprise Learning Portal which through profile based personalisation can push references to potentially relevant resources into the portal desktops of all individuals of the University community. The system is based on a set of profile attributes for each community member. The same profile attributes are used in the resource catalogue to define specific, tight target audiences for each resource. A relevance engine matches the resources to each member's profile attributes. The Enterprise Learning Portal learns profile attribute values for individuals and resources and refines the rules of relevance appropriately.

Overview

This paper outlines a range of issues related to personalisation of an Enterprise Portal. Specifically:

Scope

This paper relates to Enterprise Portals which attempt to address the needs of members of a community. The Enterprise Portal provides fast and easy access to all the online resources that each individual within the community needs in their role(s) within the community.

Many other types of portals exist and address different issues (eg. search engines, applications and shopping malls).

Throughout this paper we draw heavily on the experience and planning of the my.monash portal, in service at Monash University since July 1999.

The value proposition of an Enterprise Portal

The vision of an Enterprise Portal is to provide members of the enterprise or community with a single starting point that can take them to everything they need with a couple of mouse clicks. Community members can do this from any Internet-connected device in the world, at a time that is convenient for them.

The Enterprise Portal is expected to:

In short the Enterprise Portal is expected to be a globally accessible personalised desktop that understands your needs and adapts to them.

People access the web through a portal for one reason alone -- to access online resources and services. The Enterprise Portal value proposition is simply to provide better access to this content.

Individually personalised content management is the foundation upon which the portal must function. Other issues may be showstoppers if not done adequately, or nice features to have, but the content is what the users need.

Customisation and personalisation

Portals allow users to tailor their own content and presentation with all sorts of bells and whistles. Being able to do this is important and a fundamental feature of any good portal. Strauss [ HREF5 ] outlines many of the things that users must be able to do, with an emphasis on being able to find items of interest and add them to their portals. Early portals placed a lot of emphasis on people being able to change the look and feel of their portal, but recent evidence such as Manber, Patel and Robison [ HREF6 ] suggests that the vast majority of users don't customise their portals at all.

At Monash we have observed the same thing with the customisation facilities only being used by a minority and these are generally the power users. Average users like the portal because of what it brings together for them by itself and use traditional means (including not at all) of finding and tracking of other online resources.

Profile-oriented Personalisation (PoP) aims to eliminate the need for most individuals to customise their portals in order to access the resources of most relevance to them. And when they find their portals are lacking, they check their profile to make sure it is up to date. Adjusting the profile should permit the portal's relevance engine to find the missing resources.

Content management

The Concise Oxford Dictionary [3] defines a portal as:
a doorway or gate etc., especially a large and elaborate one.
Based on this definition, an Enterprise Portal should focus on facilitating the delivery of information rather than providing information and services. From the user's point of view, the Enterprise Portal provides an integrated environment with all the required services and systems in a unified interface. But the portal is acting as a proxy: it is the focal point from which everything else is managed.

The model for an Enterprise Portal described in this paper has no end user accessible content of its own. Instead, the focus is on managing metadata that describes resources, access methods and potentially users, although user metadata may be better stored in a directory service (eg. LDAP [ HREF7 ]).

An Enterprise Portal service that attempts to be "all things to all people" risks being nothing to anyone, delivering a mediocre bit of everything rather than excellence in one thing -- a "one stop information shop" that allows seamless management of multiple and diverse systems.

The Information Architecture section later in this paper discusses a view of the online resources needed by enterprise community members, and how personalised views of the resources can be managed.

Portal internal content

Some portals store a lot of information within them. Some of these are portals that have been specifically developed to front-end applications or to embed applications. Examples of these are mySAP.com [ HREF8 ] and the Blackboard [ HREF9 ] Level 3 offering.

When developing a portal it is tempting to host new resources that are not yet online. As described above, this temptation needs to be avoided to ensure independence from any one system or suite of systems. Although new resources may physically reside on the same hardware they need to be sufficiently separate so that they can easily be moved to a different system.

Syndicated feeds

Content can be syndicated from other sources, with a content provider publishing the same content to many 'portals' for distribution to their respective community members. These are generally news types of feeds and are modelled on the lines of the traditional newswire services.

This is expected to be a growing area with more sources becoming available providing "up to the minute" information in specific areas.

Within my.monash a variety of these are used with a preference for the RDF format [ HREF10 ]. Monash also incorporates summaries from standard mailing list archives (MHonArc [ HREF11 ] and MailMan [ HREF12 ]) and screen scrapes some published material and republishes it as resources for inclusion in the portal.

The Information Architecture

It is important for organisations to have a holistic view of their information flows, needs and storage, indexing and retrieval systems. Such an "Information Architecture" ensures efficiencies of scale and management through a "big picture" perspective, in a similar fashion to architectural documents for large buildings.

Figure 1 provides a view of an Information Architecture used to show the information classes and their interrelationships. It also shows how an enterprise portal can provide an overarching access mechanism for the collation and dissemination of information, resources and services.

Figure 1 - Information Architecture
[Diagram of an enterprise information architecture]

Web-based information

The Webspace (Web based material used by members of the enterprise community in their roles within the organisation) covers a vast range of information. It can be broken into three categories: material hosted within Monash, material hosted by partner organisations, and material hosted elsewhere.

At Monash the internal web sites are hosted on over 550 web servers across the campuses, faculties, departments and divisions containing more than 300,000 documents mainly in HTML but also other formats, particularly PDF, MS Word and PowerPoint. There is a growing demand to use multimedia in teaching, research and support services with animations, simulations, and audio and video streaming starting to become common. The majority of documents contain intellectual property created and owned by Monash and many have access restricted to particular groups of the Monash community.

Monash has many partner organisations with web content of significant value to Monash. These include:

Some of this content may also contain information that is restricted to the Monash community, or parts of it.

There is also a lot of material which is hosted outside Monash and partners related to Monash activities to which staff and students need access, including:

Search tools provide a fallback mechanism for members of the community trying to find documents that the portal has not identified as being relevant to them. The standard content/author focused techniques are used with indices relating to internal material being updated daily.

Resources managed by applications

Much of Monash's official information is stored in relational databases. Commercial or Monash developed applications manage the information based on business rules. Examples of this class of information include:

There is also material managed by applications and stored in other ways, including a variety of plain text formats. These have been collectively referred to as non-traditional databases (NTDBMS). Many of these applications have been developed within Monash to address specific business needs. Examples of this type of information include:

Online services and publishing agents

Information managed by applications can be extracted, processed and published as static web pages by publishing agents.

Some applications provide a Web accessible online presence but others require additional modules to provide this service. Monash has a vision that all business services provided by these applications will be accessible online and is investing significant resources in achieving this through the acquisition and development of web-enabled capability. The resulting online services each have their own URL and appropriate security.

Online services also provide means for community members to enter or modify information such as enrolling in subjects, changing residential addresses, sending email and submitting documents to discussion forums or repositories.

Components of the portal

The my.monash portal is an enterprise portal with a vision of providing every member of the Monash community with a private and personal entry point to all of the online resources that they need as part of the Monash community.

Establishing user profiles

A set of profile attributes, which is used to identify individual needs, is established for each member of the community. These attributes include:

The profile attributes are drawn from existing information where possible and stored in a secure directory service.

Resource catalogue and management

Metadata about resources is stored in the resource catalogue, specifically, to which profile attributes or combination of profile attributes the resource relates.

Three classes of access can be defined:

The resource catalogue may simply provide a link to one of the existing web pages or online services, or it may point to an intelligent interfacing agent, which enriches the service such as:

Multiple resources may originate from the same site or system by deep linking to different parts. The resource may also move. For example, during busy periods, exam results are drawn from a data warehouse facility rather than the live student system used for low transaction level periods.

The relevance engine

The relevance engine is the system which uses rules associated with resources to match the profile attributes for each individual in the directory.

The rules may also contain timing metadata relating to when a resource is valid. For example, the reenrolment service is only valid for part of the year. Rules may also be associated with different presentation agents.
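The matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the my.monash implementation: the `Resource` fields, the attribute names ("role", "faculty") and the re-enrolment dates are all hypothetical. A resource is treated as relevant when every attribute/value pair in its target audience appears in the member's profile and the current date falls inside its validity window.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """A catalogue entry; all field names here are illustrative assumptions."""
    title: str
    audience: frozenset                 # (attribute, value) pairs that must all match
    valid_from: Optional[date] = None   # timing metadata: first valid day, if any
    valid_to: Optional[date] = None     # timing metadata: last valid day, if any


def is_relevant(resource: Resource, profile: dict, today: date) -> bool:
    """True if the date is inside the validity window and every audience pair matches."""
    if resource.valid_from and today < resource.valid_from:
        return False
    if resource.valid_to and today > resource.valid_to:
        return False
    return all(value in profile.get(attr, set()) for attr, value in resource.audience)


def relevant_resources(catalogue, profile, today):
    """Filter the catalogue down to the titles relevant to one profile."""
    return [r.title for r in catalogue if is_relevant(r, profile, today)]


# Hypothetical catalogue and profile; the re-enrolment dates are invented.
catalogue = [
    Resource("Re-enrolment", frozenset({("role", "student")}),
             valid_from=date(2001, 10, 1), valid_to=date(2001, 11, 30)),
    Resource("Arts faculty news", frozenset({("role", "student"), ("faculty", "arts")})),
    Resource("Staff payroll", frozenset({("role", "staff")})),
]
profile = {"role": {"student"}, "faculty": {"arts"}}

print(relevant_resources(catalogue, profile, date(2001, 10, 15)))
# ['Re-enrolment', 'Arts faculty news']
print(relevant_resources(catalogue, profile, date(2001, 6, 1)))
# ['Arts faculty news']
```

Requiring all audience pairs to match is one way to express a "specific, tight" target audience; a real rule language would likely also need alternatives and exclusions.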

Presentation agents

By separating the presentation function into a separate presentation agent, alternate presentation agents can easily be created to provide all of the services which are suitable via different access methods.

Currently only the web is supported but the architecture permits presentation agents for alternate methods of access such as WAP on mobile phones, IVR over the phone and hand held devices such as palm pilots.

Presentation agents for non-web oriented mechanisms would provide access methods to the same resources with the appropriate form of presentation. Resources based on streaming video or PDF files may be considered unsuitable over WAP but streaming audio may integrate well into IVR.
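One way to picture this separation is a small class hierarchy in which each presentation agent declares the media types that suit its access method and formats only those. The sketch below is hypothetical throughout: the class names, the media type labels and the output formats are assumptions made for illustration. The only suitability rules taken from the text are that video and PDF are left out of WAP and that audio is offered over IVR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    media_type: str   # illustrative labels: "text", "pdf", "audio", "video"


class PresentationAgent:
    """Base class: a subclass lists the media types that suit its access method."""
    supported = frozenset()

    def render(self, items):
        """Drop unsuitable items, then format the rest for this access method."""
        return [self.format(i) for i in items if i.media_type in self.supported]

    def format(self, item):
        raise NotImplementedError


class WebAgent(PresentationAgent):
    supported = frozenset({"text", "pdf", "audio", "video"})

    def format(self, item):
        return f"<li>{item.title}</li>"


class WapAgent(PresentationAgent):
    supported = frozenset({"text"})        # assumption: only plain text suits WAP

    def format(self, item):
        return f"<p>{item.title}</p>"


class IvrAgent(PresentationAgent):
    supported = frozenset({"audio"})       # assumption: IVR plays audio only

    def format(self, item):
        return f"Play: {item.title}"


items = [
    Item("Exam timetable", "text"),
    Item("Lecture recording", "audio"),
    Item("Course guide", "pdf"),
    Item("Welcome video", "video"),
]

print(WapAgent().render(items))   # ['<p>Exam timetable</p>']
print(IvrAgent().render(items))   # ['Play: Lecture recording']
print(len(WebAgent().render(items)))   # 4
```

Because the relevance decision is made before any agent runs, adding a new access method in this sketch means adding one subclass rather than touching the catalogue or the rules.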

Teaching the portal to learn

The volume of material needed by the enterprise portal in user profiles and the resource catalogue dictates that automated means are required to ensure that the data is up to date and accurate. A variety of strategies are being adopted at Monash to achieve this.

Knowing who the users are

Some system somewhere needs to be the authoritative source of who is a member of the community. There may be several systems involved and their data must be combined.

At Monash, the HR system is used for staff and the student system for students. The data is combined into an LDAP directory. As the portal expands to include the broader community such as alumni and prospective students, new means are required to be the definitive sources of data.

Some portal systems permit and encourage people to update their records in the portal or directory. Where a definitive source for records is available, it is critical that changes are propagated to the definitive source rather than applied only to a local copy.

Learning about new resources

Webcrawlers provide a good start to finding new resources and this is quite easy to implement. The difficulty then is determining what to do with the resources when they are discovered.

Making value judgements on content

Some search engine suppliers put a lot of effort into value judging the sites they reference.

Some search engines such as Looksmart and Yahoo have people looking at the site to determine its potential value to target audiences. The problem here is there is only scope for a single value judgement from a general perspective. Value judgements for use in an enterprise portal must be made from a multitude of perspectives once the baseline of some value to somebody has been achieved.

Search engines such as Google perform an automated value judgement based on the number and prominence of sites that link to a page. This saves a lot of the leg work but often the most valuable of material is the most recent and its value declines with age. It takes time for other web page authors to find and link to good material by which time the value may be waning.

Determining target audience for content

The key issue for pushing resources to people is based on determining specific, tight target audiences for those resources.

The mechanism proposed at Monash is to define and publish a profile attribute vocabulary which is referenced in target audience metadata of each resource to be catalogued by the resource catalogue webcrawler.

Other strategies include:

Enhancing relevance

The Learning Enterprise Portal must explain the rules used to determine why it pushed or did not push a particular resource to a particular user when the user asks for the justification. This justification will be based on the user's profile attributes, the resource attributes and the rules. The user may suggest that one or more of these elements is inaccurate and suggest a refinement.
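A justification of this kind could be as simple as listing each audience rule beside the relevant part of the user's profile. The function below is a minimal sketch under assumptions: the dictionary shapes, the attribute names and the wording of the messages are all hypothetical.

```python
def explain(profile: dict, audience: dict) -> list:
    """Return one readable line per audience rule, marking it matched or not."""
    lines = []
    for attr, wanted in sorted(audience.items()):
        have = profile.get(attr, set())
        status = "matched" if wanted in have else "not matched"
        lines.append(f"{attr}={wanted}: {status} (your profile has {sorted(have)})")
    return lines


# Hypothetical profile and resource audience.
profile = {"role": {"student"}, "faculty": {"science"}}
audience = {"role": "student", "faculty": "arts"}

for line in explain(profile, audience):
    print(line)
# faculty=arts: not matched (your profile has ['science'])
# role=student: matched (your profile has ['student'])
```

Each "not matched" line points at something the user could dispute: the profile value, the resource's audience, or the rule that connects them.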

User-driven relevance refinement

This form of adjustment is similar to the Yahoo! and Google models, where human input and/or peer review influence the worth of rating recommendations.

When users suggest refinements in profile attributes and relevance rules, a value judgement must be made on believing the user. Applying the suggestion to this user's portal is fine, but propagating the refinement to all other users needs further evaluation.

Stage one will require manual intervention, where someone checks the suggestion to ensure that the benefits are likely to outweigh any detriment and filter out any practical jokes. However, this is not sustainable in the long term.

Stage two provides Slashdot-style "karma" [ HREF13 ] [ HREF14 ] [ HREF15 ] (credibility ratings) for each community member which determines if the suggestion requires manual validation before being propagated to the rest of the community.

Stage three permits limited propagation to a section of the community close to the user who made the suggestion and seeks peer review prior to full propagation.
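The three stages can be read as a routing decision for each suggestion. The sketch below is one possible reading, not a description of a deployed system: the karma scale, the `trusted_at` threshold of 50 and the choice to send low-karma suggestions to peer review in stage three are all assumptions.

```python
from enum import Enum


class Route(Enum):
    MANUAL_REVIEW = "manual review"
    PEER_REVIEW = "limited propagation with peer review"
    PROPAGATE = "propagate to all"


def route_suggestion(karma: int, stage: int, trusted_at: int = 50) -> Route:
    """Decide how far a suggested refinement may travel.

    stage 1: every suggestion is checked by hand.
    stage 2: karma at or above `trusted_at` skips the manual check.
    stage 3: low-karma suggestions go to nearby peers instead of a human checker.
    """
    if stage == 1:
        return Route.MANUAL_REVIEW
    if karma >= trusted_at:
        return Route.PROPAGATE
    if stage >= 3:
        return Route.PEER_REVIEW
    return Route.MANUAL_REVIEW


print(route_suggestion(karma=90, stage=1).value)   # manual review
print(route_suggestion(karma=90, stage=2).value)   # propagate to all
print(route_suggestion(karma=10, stage=2).value)   # manual review
print(route_suggestion(karma=10, stage=3).value)   # limited propagation with peer review
```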

User/usage-influenced relevance refinement

A less explicit form of user-driven refinement occurs as the portal learns about a user's behaviour and frequent activities. If Melanie always begins her emails with "Dear <name>", then the portal should include this text by default when she composes and replies to messages.

If Jason always tends to open new browser windows for links then his portal should add the appropriate HTML to launch a new window automatically.

If Alla tends to read world news articles before reading articles about sport, then her local news component should filter the world news upwards and the sports news downwards.

If most first-year Arts students view a particular web site (as summarised from aggregate proxy data), then this resource should become a default for Arts students.

If I regularly send mail to a couple of friends, then their addresses should be available in a drop list when I compose a new message.

This style of refinement is very similar to the MS Office "AutoCorrect" functionality, which does spelling, capitalisation and layout corrections on the fly. If Sam always misspells 'privilege' then the portal could automatically fix this as he sends his message. A list of such corrections can be built up over time, and modified by the user to refine the almost correct or remove the incorrect.

Association-driven relevance refinement

This form of adjustment is more like the Amazon.com model, where the relevance engine examines past behaviour and customises presentation based on decisions made by classes of similar users.

Association-driven relevance allows the portal to automatically learn better associations for you, based on what other people do. People who are members of the darts club are also members of the soccer club -- so when we give a list of clubs for you to join, we highlight the soccer club as especially interesting for you.

People who are postgraduates are most likely interested in the postgraduate research centre and other postgraduate resources, so when they go to the "My.community" channel, they get a postgraduate-specific theme which caters to their specialist needs.

People who are recently new to the University (eg. first year students) probably need a bit of extra help, so the portal has a bit more descriptive information around each component for them, and includes information about orientation (staff or student view) and transition (for recent school leavers).

People who chose these three resources generally also choose that one, so next time we get someone who chooses those three, we'll recommend the fourth. Similarly, people who chose those three resources are generally second-year chemical engineers, so let's make them default resources on the home page of second-year chemical engineers (automatically).

People who use the ''My community'' channel frequently also tend to use email and chat frequently, so let's put email and chat on to the My community channel by default (automatically), so it's all in one place.

People who are in South Africa tend not to use the portal during Australian mornings, so let's increase the frequency of updates for their information at other times, but decrease it during the Aussie morning.

People who've been at Monash for many years don't tend to read the Monash corporate information (ie. where our campuses and centres are, and maps to get to them), so let's remove it from their defaults.
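The "people who chose these three also choose that one" case from this section can be sketched with simple co-occurrence counting. This is an illustrative stand-in rather than Amazon's or Monash's method; the resource names and the `min_support` threshold of 0.5 are assumptions.

```python
from collections import Counter


def recommend(chosen: set, histories: list, min_support: float = 0.5) -> list:
    """Suggest resources that users who share `chosen` usually also pick.

    `histories` holds one set of chosen resources per user. A resource is
    suggested when at least `min_support` of the matching users have it.
    """
    peers = [h for h in histories if chosen <= h]   # users who chose all of `chosen`
    if not peers:
        return []
    counts = Counter(r for h in peers for r in h - chosen)
    return sorted(r for r, n in counts.items() if n / len(peers) >= min_support)


# Hypothetical usage data, one set per user.
histories = [
    {"timetable", "library", "lab bookings", "safety manual"},
    {"timetable", "library", "lab bookings", "safety manual"},
    {"timetable", "library", "lab bookings", "chess club"},
    {"timetable", "careers"},
]

print(recommend({"timetable", "library", "lab bookings"}, histories))
# ['safety manual']
```

Only aggregate counts leave the function, never the identity of the users who contributed them, which fits the aggregate-only handling argued for in the Privacy section.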

Time-driven relevance refinement

Work tends to be seasonal (ie. adjusts to different phases of the business or academic year) and time-dependent (eg. different work is completed in the morning from that done in the afternoon). The enterprise portal should apply the information it knows or learns to refine the relevance of a user's profile.

For example, John tends to read his email every morning, first thing, but never reads it in the afternoon. When John loads his portal in the morning, email should be at the top, but in the afternoon, it shouldn't even appear on that page (maybe just a link instead).

Hong can't enrol at the moment, so there's no point in making the re-enrolment link prominent for her -- let's drop it to the bottom of her administration links.

Ooh! Enrolment for Angela's faculty closes today, and she hasn't enrolled! Make sure that the link appears at the top of her portal page, and send her an SMS on her mobile phone "Urgent, you must enrol today!!!"
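John's case can be sketched by counting how often each component was opened in the same part of the day as the current request and ordering the page by that count. The component names, the usage log and the simple morning/afternoon split at noon are assumptions; this sketch only reorders components and does not hide them or send alerts.

```python
from collections import Counter
from datetime import datetime


def is_morning(moment: datetime) -> bool:
    """Assumption: the day is split into two periods at noon."""
    return moment.hour < 12


def order_components(components, usage_log, now):
    """Put the components most used in the current part of the day first.

    `usage_log` is a list of (component_name, opened_at) pairs. Python's sort
    is stable, so components with equal counts keep their existing order.
    """
    same_period = Counter(
        name for name, opened in usage_log if is_morning(opened) == is_morning(now)
    )
    return sorted(components, key=lambda name: -same_period[name])


# Hypothetical log: email in the mornings, news in the afternoons.
usage_log = [
    ("email", datetime(2001, 3, 5, 8, 30)),
    ("email", datetime(2001, 3, 6, 8, 45)),
    ("email", datetime(2001, 3, 7, 9, 0)),
    ("news", datetime(2001, 3, 5, 14, 0)),
    ("news", datetime(2001, 3, 6, 15, 30)),
]
components = ["news", "email", "calendar"]

print(order_components(components, usage_log, datetime(2001, 3, 8, 8, 15)))
# ['email', 'news', 'calendar']
print(order_components(components, usage_log, datetime(2001, 3, 8, 14, 15)))
# ['news', 'email', 'calendar']
```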

Reliability

In order to be perceived as a useful and readily available service, a portal needs to be reliable and highly available. It is easy enough to make the portal server itself highly available but the multitude of online services upon which the portal relies make the problem much more complex.

The portal must cope with inherently unreliable subservient systems. For example, some systems still need to have users kicked off during backups. Other systems cannot easily be replicated out of single-point of failure mode (eg. duplicating mail spools could be quite expensive). It is likely to be several generations of thinking in the broader IT community before the majority of systems used by community members are highly available.

Monash makes substantial use of caching. This borders on "data warehousing" in some situations where the main system can't cope with the transaction level generated by the number of users.

Short timeouts must be set on requests to subservient systems. When these timeouts are reached, the portal must proceed with whatever data is available. If the subservient system later comes back with the requested data, it needs to be cached for when the user next wants that data (likely to be within a couple of minutes).

Infrastructure reliability is only part of reliability of enterprise portals. It is equally important that the information delivered by the portal and upon which it makes its decisions is well understood.

If someone has the "Breaking news" resource on their portal, the news should be up to date or indicate that it is not (and possibly why not -- eg. the remote site is down). If the portal chooses not to make a resource available to a group of people, this may need to be communicated, lest they think their view of the portal is broken.

Hierarchy of portals

Many applications are being front-ended by products that are being called portals. Some of these "portals" are true portals with Profile-oriented Personalisation capability, others are just web front ends.

To get many of the necessary applications web accessible, the simplest way is often to buy the vendor's portal product. This will lead to large organisations like universities trying to manage many portals. Currently a number of systems have been identified (eg. Student information, Finance/HR, Library, Research, Teaching materials, Community service) and this number is likely to increase with many application vendors assuming they are providing the only portal.

The integration of these portals into a single service for the enterprise may be more difficult than the underlying application integration issues that the initial use of portals evaded. One portal needs to be the master or umbrella portal and interface to the other portals. This will be a major nightmare for portals that use client side computation such as Java applets.

Interchangeability of profile data

Profile data needs to be sourced from somewhere. Much of the profile data needed is or should be on one of the many systems that the portal integrates with. For example, the system that manages competency attainments should have portal access so community members can see their attainments.

Two options exist for interchange of profile data:

With the directory model all the systems feed the directory with profile attribute values which are then accessed by all the systems which use those fields and values. The eduPerson 1.0 specification [ HREF16 ] has recently been released and provides a framework for profile data to be stored in a directory. There is still a lot of work to be done as vocabularies evolve within this specification.

The peer to peer model is where individual systems pass information between them in a network of peer to peer relationships. The peer to peer model has been the focus of IMS [ HREF18 ] [HREF19] although most of the work can be directly applied to a directory model as well.

Lessons from application integration architecture suggest that the multitude of peer to peer relationships is likely to pose significant overheads particularly with multiple relationship management. Other problems are likely to include many systems working with subsets of the full profile, providing some values while using others.
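The overhead argument can be made concrete by counting relationships. Assuming, for illustration, that in the peer to peer model every pair of systems may need its own relationship, while in the directory model each system needs only one link to the directory, the counts diverge quickly as systems are added.

```python
def peer_links(n: int) -> int:
    """Relationships needed if every pair of n systems exchanges data directly."""
    return n * (n - 1) // 2


def directory_links(n: int) -> int:
    """Relationships needed if each of n systems talks only to the directory."""
    return n


for n in (3, 6, 12):
    print(n, peer_links(n), directory_links(n))
# 3 3 3
# 6 15 6
# 12 66 12
```

For the six system areas listed under "Hierarchy of portals", a full mesh would mean up to 15 relationships to manage against 6 for a directory. In practice not every pair of systems needs to exchange profile data, so the peer to peer figure is an upper bound.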

A difficulty with industry specific specifications such as IMS is that many vendors sell into several different industries and do not provide as comprehensive support as those vendors that work within a single industry. For example, difficulties are likely to be experienced sharing profile information between financial systems, student systems and library systems.

Single sign-on and security

Single sign on is the urban myth of the early 21st Century. It has become a goal of many organisations. The goal is to eliminate all the problems experienced with managing many usernames and passwords. The enterprise portal is often seen as the tool which will deliver in this area.

Single sign on may be fine as a goal for a range of low security issues. However with capabilities expanding, a need for security vigilance is critical. Reauthentication potentially with different keys should be enforced whenever a significant change in context or security level is requested by the user. Traditionally this has required the user to type in their old password before being able to change it. With access to an increasing range of resources from multiple organisations available through a single portal, single sign on as a goal needs to be changed. The issues are:

Using the same authentication system for access to most services makes a lot of sense. A single username and password may be fine for accessing all the systems within an organisation but enterprise portals offer capability to interact with systems and services outside the organisation such as Internet banking. A portal channel providing all financial information amalgamated from several different external financial organisations is a relatively straightforward task if users identify the institutions they wish to include and supply the security information required to access their data.

The portal could potentially then manage the security keys for all of those financial institutions with the user no longer knowing their values. This could be of great benefit to some users but with a parallel increase in risk.

The portal provider has also potentially accepted significant risk in offering this service. By intervening in the relationship between the financial institution and the end user, the portal provider becomes a party to any dispute in which the portal played a part.

Privacy

The learning portal seeks to know more and more about you, both through your explicit input and through its observation of your activities. This can be very threatening to people and their acceptance of such "smart technologies."

For example, consider the Amazon-style relevance matching for clubs. Members of religious or political clubs may feel quite violated if their membership is indicated with a statement like "Bill has three interests in common with you, and here's a fourth -- the Satanist club, do you want to join?".

It is imperative that portals always deal with information in aggregate, and that personal usage and profile information only ever be available to the individual. In the increasingly strict legislative environment, many organisations are developing the required privacy policies for their websites. [ HREF17 ] It is quite likely, however, that you will need to make much stronger statements regarding privacy for your portal, to ensure that people feel they can trust that they aren't going to be sold out, either to publish potentially embarrassing information, or to receive junk mail from organisations.

These concerns lead to another personalisation aspect -- a paranoia level, where users can indicate their privacy concern from "free and easy" through to "CIA operative". The free and easy may wish for all their friends to be able to see their personal schedule, whilst CIA operatives would prefer not to even appear in directory listings.

An enterprise portal should never leverage individual data for marketing purposes. In fact, it is the authors' conviction that enterprise portals should carefully moderate any advertising at all. The role of an enterprise portal is to protect and serve the community members. Where users select to receive information, such information can be pushed, but if users ever feel like the portal is providing more junk than needed resources, they will no longer feel "served" and leave in droves.

Conclusion

There are a lot of dreams between today's reality and tomorrow's. This paper has raised a variety of issues in relation to the profile oriented personalisation required to deliver the vision of the perfect enterprise portal.

Although the field is highly dynamic with new products, concepts and strategies continually being developed, and easy quick fixes available to most issues, the Enterprise Portal used by an organisation to deliver online productivity must be reliable and based on sound infrastructure, architecture and standards.

Standards and specifications are only starting to emerge on fundamental issues such as interoperability of profiles, and a lot of work needs to be done on the vocabularies to be used within these standards and specifications.

Organisations working in this field need to be prepared for substantial rework on their systems as the standards emerge if they do not participate in the evolution of the standards.

Acknowledgements

Thanks to the many people in the Monash community who continue to work to make my.monash an award-winning success. Special thanks to the Flexible Learning and Teaching Program team, whose efforts ensure that my.monash is well-supported and that associated projects are well-managed to ensure continuing success!

References

  1. Internet2/Educause eduPerson Working Group, "eduPerson 1.0 Specification". February 2001 [HREF16]
  2. Manber, U., Patel, A., Robison, J., "Experience on Personalization on Yahoo!", Communications of the ACM August 2000/vol. 43, No.8 [HREF6]
  3. The Concise Oxford Dictionary of Current English, ninth edition 1995, Thompson, D. ed. page 1064.
  4. Smythe, Colin, Frank Tansey and Robby Robson, "IMS Learner Information Packaging Best Practice and Implementation Guide". December 2000, [HREF18]
  5. Smythe, Colin, Frank Tansey and Robby Robson, "IMS Learner Information Packaging Information Model Specification". December 2000, [HREF19]
  6. Strauss, Howard "How Do You Get Started Building a University Web Portal?", presentation to Educause 2000, October, 2000. [HREF5]

Hypertext References

HREF1
http://www-personal.monash.edu.au/~rons/
HREF2
http://www.its.monash.edu/
HREF3
http://www.monash.edu/
HREF4
http://polynate.net/
HREF5
http://www.princeton.edu/~howard/slides/portals_files/frame.htm
HREF6
http://www.acm.org/pubs/citations/journals/cacm/2000-43-8/p35-manber/
HREF7
http://www.openldap.org/
HREF8
http://mysap.com/solutions/
HREF9
http://company.blackboard.com/products/infrastructure/
HREF10
http://www.w3.org/RDF/
HREF11
http://www.mhonarc.org/
HREF12
http://www.list.org/
HREF13
http://slashdot.org/moderation.shtml
HREF14
http://www.advogato.org/article/38.html
HREF15
http://www.advogato.org/trust-metric.html
HREF16
http://www.educause.edu/eduperson/
HREF17
http://www.law.gov.au/
HREF18
http://www.imsproject.org/profiles/lipbest01.html
HREF19
http://www.imsproject.org/profiles/lipinfo01.html

Copyright

Ron Sawyer and Nathan Bailey, © 2001. The authors assign to Southern Cross University and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive licence to Southern Cross University to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers and for the document to be published on mirrors on the World Wide Web.