Dr Joanna Richardson [HREF1], ePrints Project Manager [HREF2], Division of Information Services, Griffith University [HREF3], Queensland, 4111. j.richardson@griffith.edu.au
This paper provides an overview of the impetus towards open access to scholarly outputs in institutional repositories. It contrasts this with two major research assessment initiatives, the Research Assessment Exercise (RAE) in the United Kingdom and the Research Quality Framework (RQF) in Australia, which focus on creating repositories restricted to assessors / expert panels for viewing purposes. The paper examines how these latter activities could be leveraged to increase content in eprint repositories. In addition, the traditional self-archiving approach for populating eprint repositories is contrasted with a model based on mediated submission.
Open access to publicly funded research has been a worldwide topic of debate for a number of years. The logical consequence has been the development of a model to facilitate such access, in full text, via the Internet. This model is based on the Open Archives Initiative (OAI) [HREF4], which "develops and promotes interoperability solutions that aim to facilitate the efficient dissemination of content" [HREF5].
The first widely publicised implementation of this model was EPrints [HREF6]. The open source EPrints software, developed by the EPrints Services team, provides a web-based institutional repository and has a large and growing installed base around the world.
As this model and others evolved as mechanisms for communicating the results of ongoing scholarly research, certain leaders in these developments came to see interoperability as an increasingly important issue to be addressed. Two key interoperability problems were identified as impairing the impact of these newly developed archives: end users were faced with multiple search interfaces, making resource discovery harder, and there was no machine-based way of sharing the metadata. Solutions being explored included cross-searching of archives on the one hand and, on the other, harvesting archive metadata in order to provide centralised search services [HREF7]. The so-called Santa Fe Meeting in 1999 generated substantial support for open access. It also led to the promotion of interoperability through the development of OAI-PMH (Open Archives Initiative - Protocol for Metadata Harvesting) [HREF8] as an open standard for harvesting metadata records from one system to another.
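To illustrate the mechanics of the protocol, the sketch below shows a minimal metadata harvest over OAI-PMH using the standard ListRecords verb and the mandatory oai_dc (unqualified Dublin Core) format. The repository base URL is hypothetical, and a production harvester would also follow resumption tokens and handle error responses.

```python
# Minimal OAI-PMH harvesting sketch: fetch one page of Dublin Core records
# from a (hypothetical) repository endpoint and print identifier and title.
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://eprints.example.edu/cgi/oai2"   # illustrative endpoint only
OAI = "{http://www.openarchives.org/OAI/2.0/}"      # OAI-PMH response namespace
DC = "{http://purl.org/dc/elements/1.1/}"           # Dublin Core element namespace

def harvest_titles(base_url):
    """Yield (identifier, title) pairs from the first page of ListRecords."""
    url = base_url + "?verb=ListRecords&metadataPrefix=oai_dc"
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)
    for record in tree.iter(OAI + "record"):
        header = record.find(OAI + "header")
        identifier = header.findtext(OAI + "identifier") if header is not None else None
        title = record.findtext(".//" + DC + "title")
        yield identifier, title

if __name__ == "__main__":
    for identifier, title in harvest_titles(BASE_URL):
        print(identifier, "-", title)
```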
As a result of these and related developments, universities and other research entities began to create institutional collections, initially known in many cases as 'eprint repositories', for storing research content and facilitating resource discovery. The basis for populating these collections was self-archiving, i.e. the practice in which authors deposit their own work into the repository. Depositing involves a simple web interface through which the depositor keys or pastes in the metadata (descriptive data) and then attaches the full-text document.
The concept of self-archiving has not, however, fulfilled initial expectations. As was reported, for example, in 2005 as part of the DAEDALUS Project [HREF9], "The majority of academic staff felt that they did not have the time to self-deposit, and were particularly unwilling to do this where they had already provided publication details to a departmental administrator, for example [HREF10]."
In 2004 the UK Parliament's Select Committee on Science and Technology reported [HREF11]:
ALPSP told us that "although more than 50% of publishers permit authors to self-archive their own articles in either preprint or publishing form, an extremely small proportion of authors are actually doing so". SHERPA posited an explanation: "the main challenge at the moment is not setting up the repositories per se but populating them. Academics do not currently have many major incentives to archive their material (or at least they are unaware of the benefits of repositories)". We found this to be true. It was clear to us that the main focus of academics was on the initial publication of their articles in a recognised journal and that subsequent self-archiving was relatively low on their list of priorities. We found it worrying that academics did not take an interest in what happens to their research after it has been published.
We will return to this critical success factor later in this paper.
On what basis, then, is research conducted, some of which subsequently finds its way into an eprint repository? Unquestionably the policies established by national funding agencies are a major driver. These in turn are derived from strategic priorities set by their respective governments, which have recognised the importance of research for future economic growth and social transformation. The identification of key areas for collaboration, both between research organisations and with industry, is one means of attempting to cope with the increasing competition for research funds. Etzkowitz and Leydesdorff discuss some of the inherent dynamics in their examination of the current research system in its social contexts in a number of countries [HREF12].
This raises the obvious question: how does a government assess the quality of the research it is funding? A useful precedent for a proposed Australian initiative, which we will discuss shortly, is the Research Assessment Exercise (RAE) [HREF13] as it has evolved over the past two decades in the United Kingdom. The RAE is an activity undertaken approximately every five years by the UK higher education funding councils to evaluate the quality of research undertaken by British higher education institutions. RAE submissions from each subject area (or unit of assessment) are given a ranking by a subject specialist peer review panel. The rankings are then used to inform the allocation of quality-weighted research funding (QR) that each higher education institution receives from its national funding council.
Previous RAEs took place in 1986, 1989, 1992, 1996 and 2001. The next exercise is scheduled for 2008. However, it was announced in the 2006 Budget that after this exercise the current model would be abandoned for a simpler, less burdensome one [HREF14]. Instead, a system of metrics may be developed to inform future allocations of QR funding. Gordon Brown, Chancellor of the Exchequer, has called on universities in the UK to agree upon a replacement, indicating that such an outcome might preclude any necessity for the 2008 RAE activity.
The last statement notwithstanding, some UK universities are clearly readying themselves for 2008 by making use of newly released software plugins designed to adapt two of the major institutional / eprint repository software systems to accommodate RAE requirements. As an Information Systems Developer involved in this project wrote in an informal email to the author of this paper, "We are envisaging that when a replacement has been created for the RAE exercise, it will still require the collection of publications, although there may be additional requirements for metrics to be integrated into the application." We will expand our discussion on this topic shortly when we look at similar developments in Australia.
In May 2004 the Australian Government announced that it would establish Quality and Accessibility Frameworks for Publicly Funded Research [HREF15] as part of the Backing Australia's Ability – Building our Future through Science and Innovation package [HREF16]. The principal objective of the Research Quality Framework (RQF) initiative is to develop the basis for an improved assessment of the quality and impact of publicly funded research as well as an effective process to achieve this.
The proposed Australian model, not surprisingly, has strong parallels with its British counterpart. As it happened, however, feedback and submissions from the sector in response to various issues took place prior to the announcement in March 2006 that the UK government was changing its RAE model. Moreover, in January 2006 there was a change in the Australian Cabinet, with Julie Bishop named as the new Minister for Education, Science and Training. This has delayed the announcement of the preferred RQF model.
At the time of writing, the Minister has announced the establishment of an RQF Development Advisory Group (RQFDAG) to provide advice on the next phase of the RQF process and especially on how to implement the model, in whatever form, most effectively. The Final Advice on the Preferred RQF Model paper from Professor Sir Gareth Roberts [HREF17] is to form the basis for future discussion and activity.
Since mid to late 2005, Australian universities have been grappling with the ramifications of implementing the RQF. Angst has been high at all levels, exacerbated by the uncertainty surrounding what will constitute the final version of the "official" model. Along with identifying their top researchers, some universities have been quickly implementing IT systems for capturing these researchers' output, along with associated descriptive details (metadata), ostensibly for future submission to expert panels for assessment. Given the complexity of the data gathering exercise, these universities are having to second-guess the model so as not to be caught out when a government announcement is ultimately forthcoming.
In early 2006 Treloar [HREF18] outlined the response by ARROW (Australian Research Repositories Online to the World) to the RQF regarding a possible repository solution.
Institutions with an existing eprint repository and / or data already captured in electronic format as part of the Department of Education, Science and Training's annual Higher Education Research Data Collection (DEST HERDC) exercise have an obvious advantage. The University of Queensland reported in early 2006 [HREF19]:
We are now using the Fez-based UQ eSpace repository [HREF20] to manage a local research quality assessment exercise. While the RQF will not start until 2007, UQ is having a trial run to troubleshoot the delivery mechanism and submission and dissemination procedures. Materials included in the assessment comprise journal articles, book chapters, books, conference papers and patents. Much of the material to be assessed is already online, so much of the metadata to be entered includes a DOI pointing directly to the object. Where DOIs are not available, material is being locally added to the UQ eSpace repository for assessors to download and read. The material is restricted to assessors for viewing purposes, so will not appear in the public UQ eSpace view. As yet, the repository software does not allow the addition of assessors’ comments to repository objects, but the comment / annotation feature will be added to Fez before the end of 2006, making Fez a useful tool for delivery of RQF materials to international panels of assessors. We aim to document the process fully for other institutions that might want to make use of Fez to manage the RQF.
If we return to the UK project mentioned above, we learn that the IRRA (Institutional Repositories and Research Assessment) Project [HREF21] has been funded specifically to design add-on RAE modules for both the DSpace and EPrints repository software. The DSpace plugin, for example, facilitates the storage of metadata-only items (by default, DSpace requires a record to be linked to an object) and of those publications which are not suitable for open access. The University of Edinburgh envisages using the RAE module to create a publications repository which can import from and export to its open access institutional repository. This software is currently restricted to the UK academic community; however, for obvious reasons, a number of Australian universities are extremely interested in the results of this project.
It is quite ironic, nevertheless, that so much effort and so many resources in both countries are being committed to electronic versions of research destined primarily for select panels rather than for worldwide accessibility. There is, however, considerable opportunity to be gained from both assessment exercises.
Not only does society as a whole benefit from open access through more effective access to information and an expanded and accelerated research cycle, but the visibility, usage and impact of individual researchers' work also increase [HREF22]. This in turn benefits universities, particularly given the emphasis on impact in the RQF. As Harnad observes, a research assessment model helps drive eprint repositories since the latter increase impact, i.e. downloads / hits. Writing specifically about the UK RAE exercise, he says: "Not only the citation impact but the 'hit' impact, for both the papers and the authors at UK universities and abroad, will be not only accessible but assessable continuously online by anyone who is interested, any time, instead of just in a quadrennial RAE exercise [HREF23]."
Harnad has further suggested that "The RAE ranking will not come from one variable, but from a multiple regression equation, with many weighted predictor metrics in an Open Access world, in which research full-texts in their own authors' Institutional Repositories are citation-linked, download-monitored and otherwise scientometrically assessed and analysed continuously [HREF24]."
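Purely as an illustration of the shape of the equation Harnad describes, such a ranking might be expressed as a regression over weighted scientometric predictors, with the weights calibrated against panel judgements; the particular metrics listed here are examples only:

```latex
\hat{R} \;=\; \beta_0 \;+\; \beta_1\,\text{citations} \;+\; \beta_2\,\text{downloads} \;+\; \beta_3\,\text{co-citations} \;+\; \cdots \;+\; \varepsilon
```

Here $\hat{R}$ is the predicted quality ranking, the $\beta_i$ are the weights assigned to each predictor metric, and $\varepsilon$ is the residual error.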
One of the more interesting deliverables from OpCit (the Open Citation Project - Reference Linking and Citation Analysis for Open Archives) [HREF25] has been the development of a prototype for citation impact analysis and reference linking in large-scale OAI open-access archives. Citebase Search [HREF26] harvests pre- and post-prints from OAI-PMH compliant archives, parses and links their references, and indexes the metadata in a search engine. While still at an experimental stage, it has the potential to lessen the current skewing created by the importance placed on high-impact journals, and thus to raise the profile of open access institutional repositories.
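To make the reference-linking step concrete, the sketch below matches parsed reference strings against harvested metadata so that a citation can be resolved to an open-access record. The matching rule (normalised title containment) is deliberately naive and purely illustrative; it is not a description of Citebase's actual algorithm, and the record data are invented.

```python
# Illustrative reference linking: resolve raw reference strings to the OAI
# identifiers of harvested records by loose title matching.
import re

def normalise(text):
    """Lower-case and strip punctuation so titles can be compared loosely."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def link_references(references, harvested_records):
    """Map each raw reference string to the OAI identifier it appears to cite."""
    index = {normalise(rec["title"]): rec["identifier"] for rec in harvested_records}
    links = {}
    for ref in references:
        ref_norm = normalise(ref)
        for title, identifier in index.items():
            if title and title in ref_norm:
                links[ref] = identifier
                break
    return links

# Invented sample data for demonstration only.
records = [{"identifier": "oai:eprints.example.edu:1234",
            "title": "Open Access and Citation Impact"}]
refs = ["Smith, J. (2004) Open access and citation impact. Journal of X, 1(2)."]
print(link_references(refs, records))
```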
In discussing the relative lack of success to date in actually populating institutional eprint repositories, we have highlighted reliance on self-archiving, i.e. expecting academics either to deposit their own works themselves or to allocate the task to someone else such as a research assistant, as a major impediment.
The DAEDALUS Project reported that the content submission model developed by the project has been one of its key outcomes:
These experiences have led to an expectation that the future of the ePrints Service lies in a centralised model whereby the repository is populated via regular imports either from a central internal publications database or from departmental or faculty databases. It is interesting to note that other institutions, e.g. the Universities of Southampton and Durham, have also adopted a centralised approach. The adoption of a centralised model is also important for the relationship between repositories and the RAE 2008 [HREF10].
Nixon and Greig [HREF27], DAEDALUS Project Managers, have documented the mediated model and workflow used to populate the University of Glasgow's ePrints Service. In Australia, Woodland and Ng [HREF28] have reported on a project at Curtin University to design and implement an "integrator system" to share data between an institutional eprint repository and a University publications management system. While not intended to eliminate self-archiving, it builds on the fact that data entered by an academic into Curtin's PUB system, as part of the DEST HERDC exercise mentioned previously, can be used to automatically populate the espace@Curtin eprint repository.
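As a hedged sketch of the kind of field mapping such an integrator performs, the example below transforms a record exported from a publications management system into the metadata an eprint repository might expect for bulk import. The field names are hypothetical, not Curtin's actual schema; the category codes follow the standard HERDC publication categories.

```python
# Illustrative mapping from a publications-system export record (hypothetical
# field names) to a repository import record.
def publication_to_eprint(pub):
    """Map one publications-system record to an eprint import record."""
    category_map = {
        "C1": "article",            # HERDC refereed journal article
        "B1": "book_section",       # HERDC book chapter
        "E1": "conference_item",    # HERDC refereed conference paper
    }
    return {
        "type": category_map.get(pub["herdc_category"], "other"),
        "title": pub["title"],
        "creators": [author.strip() for author in pub["authors"].split(";")],
        "date": pub["year"],
        "publication": pub.get("journal_or_proceedings", ""),
        # The full text is attached later, subject to publisher permission.
        "full_text_status": "none",
    }

# Invented sample record for demonstration only.
sample = {
    "herdc_category": "C1",
    "title": "A sample journal article",
    "authors": "Smith, A; Jones, B",
    "year": "2005",
    "journal_or_proceedings": "Journal of Examples",
}
print(publication_to_eprint(sample))
```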
An institution therefore has the ability either to capture existing electronic research data, whether it already exists from previous research assessment exercises or as data in an eprint repository, or to create electronic versions of data held in print format, and then to use that core data to begin to build content to be viewed in a closed environment by RAE / RQF assessment panels. This naturally leads to the corollary: data captured as part of the RAE / RQF exercises, and not previously uploaded to a university's eprint repository, could be used to enhance the latter's collection.
There is, unfortunately, a catch. The version of an academic's publication required by a research assessment panel (and also as part of the DEST HERDC exercise) is the publisher's version, which generally may not be deposited in an eprint archive because of publisher copyright restrictions. Supplying a particular version for one purpose, e.g. open access, therefore does not meet the requirements of the other purpose, e.g. closed access for assessment, and vice versa.
The author, as part of a team implementing an institutional repository, realised in 2004 that descriptive data being captured electronically in the PeopleSoft Research Module for the annual DEST HERDC exercise should logically be used to create the core metadata content for an eprints collection. Preliminary discussions were initiated between PeopleSoft systems analysts and the repository team. However, higher priorities for both workgroups precluded the development of an actual prototype. Of course an eprint repository could have been created without any integration with PeopleSoft. However, it was felt that there would be more return on investment in designing and implementing such a method for data extraction than in relying on the more traditional self-archiving process for populating the service.
Enter the RQF. At the beginning of 2006, as University Senior Management was focusing on the impending RQF, the ePrints Project Team was meeting with the Deans of all 10 Faculties plus the Directors of several key Research Centres to gain their support for an institutional eprint repository. The potential spinoffs for future RQF exercises in terms of increased citation impact were promoted, among other benefits.
All were unanimous in their support but, more importantly, their enthusiasm filtered up the organisational tree. In March the Pro Vice Chancellor (Information Services) was asked to present a paper to the University Executive on the benefits of Open Access and how it could be implemented at Griffith. Undoubtedly one of the "selling" features of the proposal was the population of the repository with data already held in the PeopleSoft system, as outlined above. A direct relationship with the RQF was established by identifying the initial priority for content as the "Top 4" publications for eligible researchers, as defined in RQF documentation released to date. As a result, the implementation of an eprints service has been allocated a high priority and is to be fast-tracked.
At the time of writing, the ePrints Project Team is investigating a number of systems for delivering the service. At the same time it is working with the PeopleSoft staff to specify enhancements to the Research Module, plus the new workflows required to support integration with an eprint repository. Unquestionably the fact that several of the main contenders for the eprint software system are open source is helping to streamline this process. These systems have been downloaded for preliminary testing.
Of course one of the major challenges is the necessity to request two different versions of the same publication from an academic: a preprint or postprint for the eprint repository versus the publisher's version for DEST HERDC and the RQF. While this can be handled through a single web interface, which will eliminate rekeying of data, there will still be an imposition on the academic. However, we are hoping to leverage the fact that academics already "have" to enter data for DEST HERDC (and the RQF, if they meet the criteria); we anticipate that they will view the task of uploading an additional file, either a postprint or preprint depending upon the relevant publisher permission, as worth the time investment given the potential returns to them.
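As an illustrative sketch, assuming hypothetical field names rather than any particular repository platform, a single record could carry the descriptive metadata keyed in once, together with both attached versions, each with its own access level: the publisher's PDF restricted to assessment use, and the author's postprint or preprint exposed for open access where the publisher permits.

```python
# One record, one set of metadata, two document versions with different access
# levels (field names and values are invented for illustration).
record = {
    "metadata": {
        "title": "A sample journal article",
        "creators": ["Smith, A", "Jones, B"],
        "date": "2005",
        "publication": "Journal of Examples",
    },
    "documents": [
        {"file": "publisher_version.pdf",
         "version": "published",
         "access": "assessors_only"},      # DEST HERDC / RQF assessment use
        {"file": "author_postprint.pdf",
         "version": "postprint",
         "access": "public"},              # open access, publisher permitting
    ],
}

# The open access view of the repository would expose only the public documents.
public_files = [doc["file"] for doc in record["documents"] if doc["access"] == "public"]
print(public_files)
```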
Despite the generally low uptake worldwide by academics of self-depositing their scholarly and research output in an institutional eprint repository, many support the concept of open access in principle and would be prepared to contribute if required by their institution. For most academics, it is a question of balancing time investment against perceived benefit or the lack thereof.
Research assessment exercises linked to funding, despite the onerous workload they impose, have a considerably higher priority. With the impending Australian RQF, for example, university administrators are working very closely with Faculties and Research Centres to collect all the data necessary to ensure at least a continued (or, it is hoped, increased) level of research funding for their respective institutions. A university's reputation can rise or fall depending on the outcome.
Notwithstanding the burdensome aspects of the RQF, universities have an opportunity to leverage the exercise: they can build a repository of research output to be assessed as a "closed" process and then use the associated descriptive metadata either to create the foundations of an open access eprint repository (if one does not already exist) or, more likely, to boost content in an existing eprint service. Promoting the eprint repository as a mechanism for potentially increasing "impact" (downloads, hits, citation counts) through open access, in preparation for the subsequent RQF, should offer some enticement to academics to engage actively in the process.