Steven D. Tripp, Center for Language Research, The University of Aizu, Aizu-Wakamatsu, Japan. Phone +81 242 37 2584 Fax: +81 242 37 2599 tripp@u-aizu.ac.jp Home Page [HREF1]
World Wide Web, ESL, EFL, English, networks, database, education
This article describes the goals, software development activities and accomplishments of the Rosetta Project. This project is a modification of the UNITE Explorer for the purpose of creating an English-as-a-Foreign/Second-Language database. The UNITE Explorer software tools were used to establish some of the first educational resource databases on the World Wide Web, including the Explorer [HREF2] and the South Central Regional Technology Center (SCRTEC) [HREF3], which is supported by the United States Department of Education to serve educators in the states of Kansas, Missouri, Nebraska, Oklahoma and Texas. The UNITE group is currently working with the Rosetta Project to establish a WWW interface to a large database of language learning resources that will be available on the Internet. A distributed network system for contributing, reviewing, monitoring and mirroring large collections of digital resources is also described.
The purpose of the Rosetta Project is to modify an existing WWW database system in order to archive information and materials for English-as-a-Second/Foreign-Language (ESL/EFL) professionals around the world. The UNITE Explorer system was an early WWW application which provided information to educators. The original intent of UNITE was to develop a system contributing to K-9 mathematics and science education by allowing educators and students to remotely contribute and access multimedia educational resources. This article describes the current state of the Rosetta project and how UNITE will be modified to support EFL/ESL applications.
The UNITE project (Aust, 1994; Aust, Newberry & Resta, 1996 [HREF 4]) developed a WWW server and client which provide access to a multimedia educational database. The target audience included teachers in 52 schools in the northeastern United States that are part of the Great Lakes Telecommunications Collaborative. UNITE provided an archive and a search mechanism for educational resource materials. The database can be accessed via a hierarchically structured curriculum taxonomy, or a graphical search window can be used to specify Boolean queries. The current UNITE project has expanded far beyond its original goals. It now encompasses schools throughout the USA and is developing tools for Native American education.
In the UNITE Explorer, the available resources are organized in databases. Figure 1 shows the general structure of the original UNITE Explorer. Essentially, information processing in the Rosetta system will follow the same structure, although the details, icons, and appearance will differ. Also, the search engines are constantly being upgraded. In each database the structure, format, and treatment of the database records is described in a configuration file. A database configuration language is used to specify record structure, and defines four basic objects: TABLE, ENUMERATION, RECORD, and DATABASE OBJECT.

Figure 1. Database and related processes
In these systems, TABLEs keep information about icons used by the client. ENUMERATIONs list the possible instantiations for a descriptive category. RECORDs keep information about the filename and size. DATABASE OBJECTs keep information about classification categories among other things. In the next section we will be concerned with the ENUMERATION of descriptive categories.
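The four object types can be pictured with a small Python sketch. This is not the UNITE configuration language itself, whose syntax is not reproduced here. The class names, field names, and the `invalid_categories` helper are assumptions introduced only to illustrate the role each object plays.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Maps a value (e.g. a resource type) to an icon used by the client."""
    name: str
    icons: dict = field(default_factory=dict)


@dataclass
class Enumeration:
    """Lists the possible instantiations of one descriptive category."""
    name: str
    values: tuple

    def validate(self, value):
        return value in self.values


@dataclass
class Record:
    """File-level information about one stored resource."""
    filename: str
    size_bytes: int


@dataclass
class DatabaseObject:
    """Classification categories (among other things) for one record."""
    record: Record
    categories: dict = field(default_factory=dict)

    def invalid_categories(self, enumerations):
        """Return category names whose value is not allowed by its enumeration."""
        return [
            name
            for name, value in self.categories.items()
            if name in enumerations and not enumerations[name].validate(value)
        ]
```

An object classified with a value outside its enumeration would be reported by `invalid_categories`, which is the kind of check a controlled vocabulary makes possible.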
The original ENUMERATIONs were designed with North American mathematics and science teaching in mind. Inevitably, certain modifications were necessary for foreign language teaching. Beyond that, recent developments in making web-based information more uniformly accessible influenced our design. We were familiar with such proposals as HotSauce [HREF 5] and the Dublin Core [HREF 6]. The Dublin Core suggested standards for metatags which resembled the original resource type enumeration for UNITE. We decided to use the Dublin Core and expand it where needed.
The Dublin Core, as of late 1996, consisted of 15 meta-categories called elements:
| 1. Title | The name given to the resource by the Author. |
| 2. Author or Creator | The person(s) or organization(s) primarily responsible for the intellectual content of the resource. |
| 3. Subject and Keywords | The topic of the resource, or keywords or phrases that describe the subject or content of the resource. |
| 4. Description | A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources. |
| 5. Publisher | The entity responsible for making the resource available in its present form, such as a publisher, a university department, or a corporate entity. |
| 6. Other Contributors | Person(s) or organization(s) in addition to those specified in the CREATOR element who have made significant intellectual contributions to the resource. |
| 7. Date | The date the resource was made available in its present form. |
| 8. Resource Type | The category of the resource, such as home page, novel, poem, working paper, preprint, technical report, essay, dictionary. |
| 9. Format | The data representation of the resource, such as text/html, ASCII, Postscript file, executable application, or JPEG image. |
| 10. Resource Identifier | String or number used to uniquely identify the resource. |
| 11. Source | The work, either print or electronic, from which this resource is derived, if applicable. |
| 12. Language | Language(s) of the intellectual content of the resource. |
| 13. Relation | Relationship to other resources. |
| 14. Coverage | The spatial locations and temporal durations characteristic of the resource. |
| 15. Rights Management | The content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a server that would provide such information in a dynamic way. |
To this list we added several elements: MediaType, TargetLanguageType, EdLevelType, CurriculumType. Certain elements such as Coverage and Relation were not well-defined in the Dublin Core documentation so we defined them to fit our purposes. In addition, we extended several of the above elements, such as Resource Type.
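As a rough illustration, a record combining the 15 Dublin Core elements with the four added elements might be handled as below. The short element keys, the `DC.` and `Rosetta.` prefixes, and both function names are assumptions made for this sketch, not the Rosetta implementation.

```python
from html import escape

# Short keys for the 15 elements listed above (the key spellings are assumed).
DUBLIN_CORE = (
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
)

# The four elements added for the Rosetta database.
ROSETTA_EXTENSIONS = (
    "MediaType", "TargetLanguageType", "EdLevelType", "CurriculumType",
)


def make_record(**fields):
    """Build a metadata record, rejecting element names outside both lists."""
    allowed = set(DUBLIN_CORE) | set(ROSETTA_EXTENSIONS)
    unknown = sorted(set(fields) - allowed)
    if unknown:
        raise ValueError(f"unknown elements: {unknown}")
    return dict(fields)


def to_meta_tags(record):
    """Render a record as HTML meta tags (the prefixes are hypothetical)."""
    lines = []
    for name, value in record.items():
        prefix = "DC" if name in DUBLIN_CORE else "Rosetta"
        lines.append(
            f'<meta name="{prefix}.{name}" content="{escape(str(value))}">'
        )
    return "\n".join(lines)
```

Keeping the added elements in a separate list makes it easy to see which parts of a record any Dublin Core reader could interpret and which are specific to this database.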
CurriculumType is one of the controlled vocabularies used in indexing the educational resources. This consists of a hierarchical structure with more than 150 subcategories. Construction of this curriculum hierarchy was problematic, since there appears to be no international standard for classifying ESL/EFL curriculum materials. The major subcategories for CurriculumType were established as: Communication, Errors and Mistakes, Grammar, Historical Perspectives, Language and Psychology, Language and Society, Research, Student-made Materials, Teaching, and Testing.
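A hierarchy of this kind can be sketched as a nested dictionary. The ten top-level names below come from the list above. The subcategories under Grammar are hypothetical placeholders, since the full set of more than 150 subcategories is not given here, and `find_path` is an illustrative helper.

```python
# Top-level categories are from the text; nested entries are hypothetical.
CURRICULUM = {
    "Communication": {},
    "Errors and Mistakes": {},
    "Grammar": {"Verbs": {"Tense": {}}, "Nouns": {}},  # hypothetical children
    "Historical Perspectives": {},
    "Language and Psychology": {},
    "Language and Society": {},
    "Research": {},
    "Student-made Materials": {},
    "Teaching": {},
    "Testing": {},
}


def find_path(tree, target, prefix=()):
    """Return the list of names from the root down to target, or None."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == target:
            return list(path)
        found = find_path(children, target, path)
        if found is not None:
            return found
    return None
```

Indexing a resource by its full path rather than by a single label lets a browser display it under each ancestor category.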
Although the Dublin Core has a Format element, this is intended to conform to MIME type classification so we needed to include a MediaType to classify the physical medium on which the materials are stored, as not all materials will be digital.
The Dublin Core element for language indicates that the language of the resource should be represented by the NISO Language Codes [HREF 7] standard. We are using this element to represent L1 (the medium of instruction). This list of more than 360 languages and their codes is too long to include directly in a user-friendly interface, so we created a hierarchy consisting of English, Major Asian Languages, Major African Languages and so on, together with a category called Other Languages for those languages unlikely to be used as a language of instruction.
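The grouping can be pictured as a lookup from language code to interface category. The group names follow the text. The three-letter codes and their assignment to groups are illustrative assumptions and should be checked against the actual standard.

```python
# Group names are from the text; the codes listed per group are illustrative.
LANGUAGE_GROUPS = {
    "English": ["eng"],
    "Major Asian Languages": ["chi", "jpn", "kor"],
    "Major African Languages": ["hau", "swa"],
}
OTHER = "Other Languages"


def group_for(code):
    """Return the interface category for a language code."""
    for group, codes in LANGUAGE_GROUPS.items():
        if code in codes:
            return group
    # Anything not listed falls into the catch-all category.
    return OTHER
```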
We then created a new element, TargetLanguageType, to specify L2, the dialect of English which is being taught. We created five categories: General English, American English, Australian English, British English, and Canadian English. Other dialects will be added if there is a need.
Our new element, EdLevelType, was an attempt to classify materials by target age, keeping in mind that educational systems around the world differ considerably. We are using: Pre-Kindergarten, Kindergarten, Lower Primary, Upper Primary, Lower Secondary, Upper Secondary, Undergraduate, Graduate, and Adult.
Coverage was an element of the Dublin Core, but we specified it to refer to geographic location and time duration. Geographic location will be specified by latitude and longitude, and time by beginning and ending dates. This is necessary because some curriculum materials refer to actual physical locations and field trips for students, and thus should not be displayed to people outside a certain narrow geographic radius. Such locations may also be things like exhibitions, which have a limited duration.
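A minimal sketch of such a filter follows, assuming coverage is stored as one latitude/longitude point plus a start and end date. The `Coverage` class, the function names, and the 50 km default radius are assumptions for illustration. The distance uses the standard haversine formula with a mean Earth radius of 6371 km.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Coverage:
    latitude: float   # degrees north
    longitude: float  # degrees east
    start: date       # first day the resource is relevant
    end: date         # last day the resource is relevant


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    radius = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def is_relevant(cov, user_lat, user_lon, today, radius_km=50.0):
    """True if the user is within radius_km and today is inside the period."""
    in_time = cov.start <= today <= cov.end
    in_range = distance_km(cov.latitude, cov.longitude,
                           user_lat, user_lon) <= radius_km
    return in_time and in_range
```

A resource describing a local exhibition would then be shown only to nearby users while the exhibition is open.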
The Dublin Core proposal for metadata contains a Resource Type element. This element allows description of the genre of the resource for which the Dublin Core metadata was generated. The following enumerates the resource types anticipated.
Although this classification covered many types of materials which were in our pre-existing resource type enumerations, we found it necessary to add:
One purpose of creating authority lists such as these is to assist in the retrieval of data from the database. Naturally, free text searches are always available, but such searches do not always produce the expected results, as any user of an Internet search engine has experienced. The careful structuring of the database allows several modes of access for busy teachers searching for needed materials. In addition to free text searching, a Boolean search engine is part of the UNITE system. With this engine, a teacher can specify precise descriptions of the kinds of materials he or she is looking for. This Boolean search engine and its interface are currently being modified and improved.
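To illustrate how a Boolean query over controlled vocabularies differs from free text search, here is a small sketch. It is not the UNITE search engine. The tuple-based query form and the `matches` function are assumptions chosen to keep the example short.

```python
def matches(record, query):
    """Evaluate a nested Boolean query against one record (a dict).

    Query forms (all hypothetical):
      ("IS", field, value)   field equals value
      ("AND", q1, q2, ...)   every sub-query matches
      ("OR", q1, q2, ...)    at least one sub-query matches
      ("NOT", q)             the sub-query does not match
    """
    op = query[0]
    if op == "IS":
        _, field, value = query
        return record.get(field) == value
    if op == "AND":
        return all(matches(record, q) for q in query[1:])
    if op == "OR":
        return any(matches(record, q) for q in query[1:])
    if op == "NOT":
        return not matches(record, query[1])
    raise ValueError(f"unknown operator: {op}")


def search(records, query):
    """Return the records that satisfy the query, in their original order."""
    return [r for r in records if matches(r, query)]
```

Because each field draws its values from an authority list, a query such as "Grammar materials for Adult learners that are not British English" has an unambiguous answer, which free text matching cannot guarantee.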
Another mode of access to the database is in its Browsing interface. Since an elaborate Curriculum Hierarchy has been constructed, individual items in the database can be displayed according to their place in the hierarchy. This can be represented graphically as illustrated in Figure 2. One advantage of this kind of graphical display is that users may see quickly what kinds of categories are instantiated and that may lead to more informed searching.

Figure 2. Curriculum Hierarchy Browser
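One way a browser could show which categories are instantiated is to count the items at or below each node of the hierarchy. The sketch below assumes each item carries its curriculum path as a tuple of names. The function name and the sample paths used with it are hypothetical.

```python
from collections import Counter


def count_by_category(paths):
    """Count items at or below every node that appears in some item's path."""
    counts = Counter()
    for path in paths:
        # An item counts toward its own category and each ancestor.
        for depth in range(1, len(path) + 1):
            counts[tuple(path[:depth])] += 1
    return counts
```

Nodes that never appear as a key have no items beneath them, so a display could dim or hide them.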
A final mode of access is the Folders interface. This is similar to the familiar desktop metaphor employed by the Windows or Macintosh operating systems. An example of this kind of display is seen in Figure 3.

Figure 3. Folder Browser
At the present time the Rosetta graphics and icons are being designed so the above examples are from the UNITE Explorer system, but the Rosetta system will operate analogously. By clicking on folder icons, deeper levels of the database are revealed.
Although it is implicit in Figure 1, the issue of populating the database has not yet been discussed. A distinctive aspect of the UNITE Explorer system is that it encourages Community Published Resources. Specifically, all objects in the database come from the community of educators. For that reason, the UNITE Explorer system incorporates a Contributor interface (Figure 4) which allows users to enter objects into the database. The system then engages a series of human reviewers who check the entry for accuracy, redundancy, etc. There is a tracking system, invisible to the user, which monitors the progress of contributions through reviewers and allows administration of this work.

Figure 4. Contributor Interface
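The tracking of contributions through reviewers might be modeled roughly as follows. The two checks are taken from the text (accuracy and redundancy). The status names, the `Contribution` class, and the `pending` helper are assumptions, and the real system may use different stages.

```python
from dataclasses import dataclass, field

# Checks named in the text; a real system may apply more.
CHECKS = ("accuracy", "redundancy")


@dataclass
class Contribution:
    title: str
    # Maps a check name to a (reviewer, passed) pair.
    reviews: dict = field(default_factory=dict)

    def record_review(self, check, reviewer, passed):
        """Store the outcome of one reviewer's check."""
        if check not in CHECKS:
            raise ValueError(f"unknown check: {check}")
        self.reviews[check] = (reviewer, passed)

    @property
    def status(self):
        """Derive the current status from the reviews recorded so far."""
        if any(not passed for _, passed in self.reviews.values()):
            return "rejected"
        if all(check in self.reviews for check in CHECKS):
            return "accepted"
        return "in review" if self.reviews else "submitted"


def pending(contributions):
    """Titles of contributions an administrator still needs to follow up."""
    return [c.title for c in contributions
            if c.status in ("submitted", "in review")]
```

Deriving the status from the recorded reviews, rather than storing it separately, keeps the administrator's view consistent with what reviewers have actually done.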
The UNITE Explorer client-server system has many features which make it attractive as a database browsing/query system for ESL/EFL purposes. The WWW-compatible structure of its output makes it accessible from a wide variety of platforms in many countries. The power of its multiple search methods can make accessing large information databases more productive than pure text-based searches. It is highly suited to adaptation to the Rosetta database. However, making information usable, as well as available, requires that it be classified in standard ways. We expect that the future will introduce more powerful information searching and browsing interfaces. In order to prepare for that, we have tried to conform to the Dublin Core in our database design. The ESL/EFL community is worldwide and extremely diverse. Designing a system which meets the needs of such a disparate audience will continue to be a challenge.
Aust, R. (1994). "Designing network information services for educators." Machine-Mediated Learning, 4, 251-267.
Steven D. Tripp ©, 1997. The authors assigns to Southern Cross University and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The authors also grants a non-exclusive licence to Southern Cross University to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers, and for the document to be published on mirrors on the World Wide Web. Any other usage is prohibited without the express permission of the authors.