Georgios D. Styliaras, High Performance Information Systems Laboratory, University of Patras, Patras 26500, GREECE gds@hpclab.ceid.upatras.gr
Theodore S. Papatheodorou, High Performance Information Systems Laboratory, University of Patras, Patras 26500, GREECE tsp@hpclab.ceid.upatras.gr
We present the sequence of steps needed to develop a thematic-based collection on the World Wide Web. Such a collection presents a certain thematic topic, analyzed into a set of objects that are hierarchically organized and classified under various criteria. Every object is presented by a collection page, which becomes the core of the collection's analysis. Every object may also be accompanied by multimedia data and embedded configurable behavior. Furthermore, we describe an efficient administration subsystem, which content providers can use to maintain the collection content. The methodology is based on our experience in developing thematic-based collections for the Hellenic Ministry of Culture (HMC) web site. Unlike other methodologies, which emphasize the navigation layer, we suggest that a data-centric approach is more appropriate for developing structured hypermedia content, as it enables customization, flexible presentation, and reusability.
In this paper, we deal with a specialized kind of content on the Web, which we call thematic-based collections. Such collections consist of a set of hierarchically structured objects, such as the exhibits of a museum, which are interrelated and classified under various criteria. As discussed throughout the paper, most work on hypermedia development addresses either specialized navigation and linking tools or methodologies for developing generic applications. However, thematic-based collections share some special properties and requirements. Practical experience has shown that such collections can be developed more efficiently by taking these properties and requirements into account. In this work, based on our experience in developing collections for the HMC web site and on related work, we contribute in this direction by formally analyzing the notion of thematic-based collections and by providing a methodology for developing such collections on the Web. Namely, we describe a methodology for the import, classification and internal structure of the collection pages, which allows flexible content representation, and a data-centric administration framework, which facilitates the efficient manipulation and reuse of content and the handling of exceptional content instances. We have implemented most of the methodology as independent modules during the development of the previously mentioned collections and other hypermedia applications, some of which are presented in the last section.
We define a thematic-based collection as the deployment on the Web of a certain topic that can be analyzed into a hierarchy of objects. Such a topic can be a museum, an exhibition, a library, a structured product catalog, educational material, scientific or encyclopedic data, an event, or, generally, a well-organized knowledge area. Correspondingly, an object may describe a museum exhibit, a book, a film, a music record, a sports event, a news story, or a certain product. A similar issue is discussed in Elo et al. 1998, which presents a tool for navigating information hierarchies, although with an emphasis on web resources.
From an architectural point of view, every level of the collection hierarchy contains a set of chapters, which represent the categories into which the collection is structured. Additionally, every chapter at a certain level may point to another set of chapters at a lower level, which further analyze its meaning, thus building a hierarchy of chapters. A set of collection pages is associated with every chapter. Every such page analyzes a self-contained object that belongs to the collection. Every level of the hierarchy also has a set of introductory pages containing text and images, which briefly present the collection objects of that level and refer to their pages through hyperlinks.
A collection page has a title and a basic text organized in structuring blocks, as analyzed later. It is also accompanied by some characteristics: a textual description and some discrete attributes that compose a set of annotating metadata. Each collection page is distinguished by a unique ID, which is used for defining hyperlinks to the page from other pages, introductory texts and images. The same ID denotes the position of the page in the collection hierarchy. Additionally, the page's characteristics and relationships may be retrieved based on the ID, and access structures that include the page may be implemented. We also build a uniform pool of multimedia content consisting of images, slideshows, sound and video, which can be used for enriching the presentation of the main collection pages. Moreover, extra indexes can be created to provide alternative access routes to the same collection pages, apart from the main navigation option based on the chapters' structure. Figure 1 gives an overview of the collection structure at a certain Level I.

Figure 1: Structure of a thematic-based collection
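The page structure described above can be sketched as a small data type. This is only an illustration in Python, not the implementation used in the collections described here: the class and field names are hypothetical, and the dotted ID format (for example "2.1.3"), in which each component names one level of the hierarchy, is an assumption made so that the ID can denote the page's position.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionPage:
    """One collection page; all names here are illustrative assumptions."""
    page_id: str                      # assumed format, e.g. "2.1.3"
    title: str
    blocks: dict[str, str] = field(default_factory=dict)      # structuring blocks
    description: str = ""                                      # textual description
    attributes: dict[str, str] = field(default_factory=dict)  # discrete metadata

    def chapter_path(self) -> list[str]:
        """IDs of the chapters that enclose this page, outermost first."""
        parts = self.page_id.split(".")
        return [".".join(parts[:i]) for i in range(1, len(parts))]

    def level(self) -> int:
        """Depth of the page below the top of the hierarchy."""
        return len(self.page_id.split(".")) - 1
```

With such an encoding, a page with ID "2.1.3" lies in chapter "2.1", which in turn lies in chapter "2", so hyperlinks and visual cues about the page's position can be derived from the ID alone.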
In the remainder of this section, we describe a methodology for developing thematic-based collections, addressing the needs of content providers and web developers. The methodology consists of a series of steps. For every step, apart from the methodology, we also provide some general requirements and recommendations resulting from our implementation experience. Although most of the functionality described has been implemented and tested in various collections, sometimes in isolation, we do not attempt to present an automatic tool for constructing collections without any user intervention, but rather a flexible and feasible methodology that may be adapted to specific cases and demands. We have implemented the methodology to serve the needs of thematic-based collections, but it is flexible enough to be applied to structured content presentation on the Web in general. This is the main contribution of our proposal. We also describe, and have implemented, a data-centric administration subsystem, which is the basis for content maintenance and reuse in other circumstances. Proposals and implementations concerning issues related to thematic-based collections, such as content importing (Hartley and Medley 1996; Balasubramanian et al. 1997; Rockley 1993), maintenance (Christodoulou et al. 1998; Rockley 1993) and flexible document synthesis based on structured data (Paradis and Vercoustre 1998; Active Server Pages [HREF6]), have appeared many times in the literature and in commercial tools. With this proposal we try to extend that work by combining the strength of automatic hypertext generation with the expressiveness of manually constructed content, while ensuring consistency at all times. Furthermore, we try to support flexibility by defining standard content and behaviors on objects and by allowing these, when possible, to be overridden in specific instances.
In most cases, the content used as a basis for a thematic-based collection already exists in some other format, and reusing it efficiently requires a certain procedure. Concerning text, the critical issue is to extract it in plain format and, optionally, to maintain most of the common formatting (such as bold, italics and bulleted lists) that can be exploited during the generation process. The most difficult task is to identify, in a semi-automatic way, the collection's structure in chapters and, possibly, alternative indexes. If the text exists in a database or in XML files, some simple transformation scripts are enough to handle these issues. Otherwise, if the text is unstructured (e.g. from a word processor or HTML), the ability to import it correctly depends on its accuracy and regularity. In this case, an importing script should be given some rules for extracting the plain text and, if possible, identifying the points that reveal its structure. Such rules include successive paragraph changes and the use of numbers, formatting (e.g. font, bold, italics), headers and some descriptive words. Furthermore, the importing script should be flexible enough to cope with irregular instances of the source text. Whether the text is structured or not, a library of conversion functions should be built gradually for handling different formats of the source data. Kommers et al. 1998 present several hints about the importing process. We have implemented various content importing schemes based on given requirements. For example, Figure 2 shows an input file that has been used in Treasures of Mount Athos [HREF1], together with the mapping of the input data to database fields.

Figure 2: A sample input file combined with the respective database table
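To make the idea of rule-based importing more concrete, the following Python sketch applies two of the rules mentioned above: paragraph changes (blank lines) and leading numbers that reveal chapter headings. It is a hypothetical illustration, not one of the importing schemes we implemented; the heading pattern, the 80-character limit and the record layout are all assumptions. Text that appears before the first heading is treated as an irregular instance and kept in an untitled record instead of being dropped.

```python
import re

# Assumed rule: a short paragraph starting with a number such as "2.1"
# is a chapter heading, e.g. "2.1 Portable icons".
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$")
MAX_HEADING_LENGTH = 80


def import_text(raw: str) -> list[dict]:
    """Split plain text into records of chapter number, title and paragraphs."""
    records: list[dict] = []
    current = None
    for paragraph in re.split(r"\n\s*\n", raw.strip()):
        paragraph = " ".join(paragraph.split())   # normalize whitespace
        if not paragraph:
            continue
        match = HEADING.match(paragraph)
        is_heading = match is not None and len(paragraph) <= MAX_HEADING_LENGTH
        if is_heading:
            current = {"number": match.group(1), "title": match.group(2),
                       "paragraphs": []}
            records.append(current)
        elif current is None:
            # Irregular instance: text before any heading.
            current = {"number": "", "title": "", "paragraphs": [paragraph]}
            records.append(current)
        else:
            current["paragraphs"].append(paragraph)
    return records
```

Each record could then be mapped to a database row; sources that follow other conventions would need their own rules, which is why a library of conversion functions grows over time.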
Concerning images, we should extract them from the source documents and save them as separate files. The images should then be properly named and associated with the right collection pages. The basis for an image's association with a page is the page's ID. Other kinds of multimedia data, such as video and sound, are dealt with similarly.
We now explain how collection content can be properly structured and stored. We use a relational database for this purpose. XML could be used as an alternative format, but a database still seems to provide stronger query capabilities and facilities than XML. However, as an experiment, we have managed to model the described structuring in XML, using the museum structure of Figure 3 as a test case. The database stores the textual data of the collection, its structure in chapters and indexes, and information about multimedia content. The textual data consists of the main content of the pages describing the collection's objects and the introductory pages at every level of the hierarchy. The storage of the pages' main content is discussed later. The way content is structured in the database enables its efficient retrieval and reuse. As we will see in the administration subsystem, queries may be formulated against the database in order to retrieve the desired content.
During data import, the images and other multimedia content collected compose the uniform pool of multimedia data that is used to enrich the appearance of the collection pages. Content providers may use the administration subsystem to augment the pool with additional content. Every multimedia piece of the pool can be used in more than one collection page, and every page can use more than one piece. A separate table represents each kind of multimedia. For every multimedia piece, a standard caption and description, which may be overridden in a specific appearance, are stored in the database, along with behaviors that, as examined later, may be attached to multimedia pieces. Moreover, supplementary metadata are defined per multimedia kind, such as width and height for images, and duration and format for video and sound clips. The latter are needed for properly activating and using the clips. A derived multimedia kind is the slideshow, which consists of a set of images selected from the existing pool of images and displayed together in a certain page. The use of slideshows is discussed later. For a given slideshow, the data stored in the database include the relationships to the images that compose the slideshow and their presentation order. We do not embed the multimedia content in the database; we only define a virtual relationship between the multimedia tables and the actual files, because we do not plan to use specific features of the raw multimedia data, but only its metadata and the ability to activate multimedia pieces and create hyperlinks to them. Validation scripts ensure that there are no broken links between database entries and actual files.
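A validation script of the kind mentioned above could, in its simplest form, look like the following Python sketch. It is an assumption-laden illustration: the database entries are represented as a plain dictionary from a hypothetical multimedia ID to a file path relative to a media directory, and the function and file names are invented for the example.

```python
import tempfile
from pathlib import Path


def find_broken_links(entries: dict[str, str], media_root: Path) -> list[str]:
    """Return the IDs of database entries whose file is missing on disk."""
    return sorted(
        media_id for media_id, relative_path in entries.items()
        if not (media_root / relative_path).is_file()
    )


def demo() -> list[str]:
    """Build a throwaway media directory and validate two entries against it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "icon.jpg").write_bytes(b"")          # the only existing file
        entries = {"img1": "icon.jpg", "img2": "missing.jpg"}
        return find_broken_links(entries, root)
```

Here `demo()` reports `img2` as the entry without a file. A fuller script would presumably also check the opposite direction, that is, files on disk that no database entry refers to.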
Separate tables store the pages' main content and internal structure. For every page, its ID, title and basic text are stored. The basic text is organized as a set of structuring blocks, each of which corresponds to a different database field. The decision about the basic text's structure is usually based on the desired appearance. Generally, we have to define a separate field for every portion of the basic text that we want to manipulate and present separately. Different sets of fields may be defined for different sets of pages that we want to manipulate separately. The semantic meaning of the defined fields is stored in descriptive database tables, which makes the content understandable by humans and convertible to other formats, such as XML. A sample field structure for a museum is shown below:

Figure 3: Sample structure of a page's basic text
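The role of the descriptive tables can be illustrated with a small, self-contained Python sketch that uses an in-memory SQLite database. Everything in it is hypothetical: the table names, the generic column names `f1` and `f2`, their meanings and the sample row are assumptions and do not reproduce the structure of Figure 3. The point is only that, once the meaning of each field is stored as data, a conversion to XML can be written without hard-coding the field names.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE museum_page (id TEXT PRIMARY KEY, title TEXT,
                              f1 TEXT, f2 TEXT);
    -- Descriptive table: what each generic field means.
    CREATE TABLE field_meaning (table_name TEXT, field TEXT, meaning TEXT);
    INSERT INTO field_meaning VALUES ('museum_page', 'f1', 'history'),
                                     ('museum_page', 'f2', 'exhibits');
    INSERT INTO museum_page VALUES ('1.2', 'Sample Museum',
                                    'Founded as an example.',
                                    'Icons and coins.');
""")


def page_to_xml(page_id: str) -> str:
    """Serialize one page, naming each element after its field's meaning."""
    meanings = dict(conn.execute(
        "SELECT field, meaning FROM field_meaning "
        "WHERE table_name = 'museum_page'"))
    cursor = conn.execute("SELECT * FROM museum_page WHERE id = ?", (page_id,))
    row = cursor.fetchone()
    columns = [column[0] for column in cursor.description]
    page = ET.Element("page", id=row[0])
    for column, value in zip(columns[1:], row[1:]):
        ET.SubElement(page, meanings.get(column, column)).text = value
    return ET.tostring(page, encoding="unicode")
```

Calling `page_to_xml("1.2")` yields a `page` element whose children are named `title`, `history` and `exhibits`.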
The page's characteristics are also stored in the database. Apart from the textual description, the characteristics include a set of discrete attributes, such as technical details (size, content source) and user and editing information, which are exploited by the administration subsystem (in order to support multi-user editing) and by the generation process. Moreover, an extra set of custom attributes can be defined in the context of a collection. Concerning multimedia, separate tables store the relationships between collection pages and multimedia pieces and, possibly, captions that override the default ones. For example, the number N of images associated with a page is stored in the page's metadata. The logical names of the images in the page's context are then Image{ID}{X}, X=1 to N. The same naming and association procedure is used for sound, video and slideshows. The above are summarized in Figure 4. In our collections, we have mainly used images to enrich the collection's pages.

Figure 4: Page-based structure and metadata
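The naming scheme Image{ID}{X} and the caption override can be expressed in a few lines of Python. This is a sketch under assumptions: the function names, the sample page ID "0312" and the pool key "athos_01" are invented, and the ID and the counter are concatenated without a separator exactly as the pattern is written, which presumes that IDs have a fixed length or a form that keeps the names unambiguous.

```python
def logical_image_names(page_id: str, image_count: int) -> list[str]:
    """Names Image{ID}{X} for X = 1..N, following the pattern in the text."""
    return [f"Image{page_id}{x}" for x in range(1, image_count + 1)]


def caption_for(page_id: str, image_key: str,
                default_captions: dict[str, str],
                overrides: dict[tuple[str, str], str]) -> str:
    """Use the caption stored for this page if there is one, else the default."""
    return overrides.get((page_id, image_key), default_captions[image_key])
```

For a page with ID "0312" and N = 3, the logical names are Image03121, Image03122 and Image03123. A caption stored for the pair (page, image) takes precedence over the image's standard caption, which remains in use on every other page.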
The collection's overall structure in chapters is stored in properly defined database tables. The relationships among chapters and pages are expressed through pages' IDs. As mentioned before, alternative indexes additional to main chapters may be defined. An index consists of a set of objects each of which points to collection pages. The structure of indexes and the relationships among indexes and pages are also stored in the database. Extra attributes may be defined on relationships revealing the kind of relation between a page and an index object. For example, in a library collection, a persons' index may be defined that allows multiple persons to be assigned to a book. A descriptive attribute can reveal a person's role in the book e.g. author, translator.
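The library example can be modeled with three tables, as in the following Python sketch with an in-memory SQLite database. The schema, table names and sample rows are assumptions made for illustration; the essential point is that the descriptive attribute (`role`) is stored on the relationship between a page and an index object, not on either of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    -- Relationship between pages and index objects, with its own attribute.
    CREATE TABLE page_person (page_id TEXT REFERENCES page(id),
                              person_id INTEGER REFERENCES person(id),
                              role TEXT);
    INSERT INTO page VALUES ('3.1', 'Book One'), ('3.2', 'Book Two');
    INSERT INTO person VALUES (1, 'Person A'), (2, 'Person B');
    INSERT INTO page_person VALUES ('3.1', 1, 'author'),
                                   ('3.1', 2, 'translator'),
                                   ('3.2', 2, 'author');
""")


def persons_of(page_id: str) -> list[tuple[str, str]]:
    """Index objects related to a page, with the kind of relation."""
    return conn.execute("""
        SELECT person.name, page_person.role
        FROM page_person JOIN person ON person.id = page_person.person_id
        WHERE page_person.page_id = ? ORDER BY person.name""",
        (page_id,)).fetchall()


def pages_of(person_name: str) -> list[tuple[str, str]]:
    """Pages related to an index object, with the kind of relation."""
    return conn.execute("""
        SELECT page.title, page_person.role
        FROM page_person
        JOIN page ON page.id = page_person.page_id
        JOIN person ON person.id = page_person.person_id
        WHERE person.name = ? ORDER BY page.id""",
        (person_name,)).fetchall()
```

The same relationship table answers both questions, which persons belong to a book and which books belong to a person, so the index can be traversed in either direction.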
The way data is organized in the database allows the definition of several navigation paths that lead to the same collection pages. In this way, users may find their own way through the collection's pages based on their particular interests. Additionally, we allow visual cues (Kesseler 1995; Garzotto et al. 1995) to be embedded in the collection's pages, thus making navigation among different categorizations easier. The basic navigation scheme follows the collection's structure in chapters. Therefore, every chapter has links to its lower-level chapters and every page offers these navigation options: Up (in the collection hierarchy), Previous and Next (among pages belonging to the same level).
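As an illustration of how the Up, Previous and Next options can be derived from the stored structure, consider the following Python sketch. It assumes dotted page IDs such as "1.2", where the part before the last dot identifies the enclosing chapter, and it interprets "pages belonging to the same level" as pages of the same chapter; both are assumptions, as are the function names.

```python
from typing import Optional


def parent_of(page_id: str) -> Optional[str]:
    """ID of the enclosing chapter, or None at the top of the hierarchy."""
    return page_id.rsplit(".", 1)[0] if "." in page_id else None


def navigation_links(page_id: str,
                     ordered_ids: list[str]) -> dict[str, Optional[str]]:
    """Up, Previous and Next targets for one page, in presentation order."""
    siblings = [pid for pid in ordered_ids
                if parent_of(pid) == parent_of(page_id)]
    position = siblings.index(page_id)
    return {
        "up": parent_of(page_id),
        "previous": siblings[position - 1] if position > 0 else None,
        "next": siblings[position + 1] if position + 1 < len(siblings) else None,
    }
```

For the pages "1.1", "1.2", "1.3" and "2.1", the page "1.2" gets Up = "1", Previous = "1.1" and Next = "1.3", while "2.1" has neither a previous nor a next page.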
The use of indexes provides an extra navigation capability. More than one index object may be associated with a certain collection page and, conversely, an index object may be associated with more than one page, which makes two-way navigation possible: from an index object it is possible to navigate to the related collection pages and, in the other direction, from a page it is possible to navigate to the related index objects. Using the library example again, the user might navigate among books' pages and the corresponding authors. The importing process must support the association between pages and indexes. For example, for the indexes we have implemented in New Greek Cinema [HREF2], the association between films and directors was retrieved automatically by string comparison between the text extracted from the film posters and the texts storing the directors' CVs. Figure 5 summarizes the above.

Figure 5: Navigation structures
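An association by string comparison, in the spirit of the films and directors example, might be sketched in Python as follows. This is not the procedure used in New Greek Cinema: it is simplified to look for each director's name inside the poster text, and the normalization (case folding, removal of accents, collapsing of whitespace), the function names and the sample data are all assumptions.

```python
import unicodedata


def normalize(text: str) -> str:
    """Case-fold, strip accents and collapse whitespace before comparing."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def associate(poster_texts: dict[str, str],
              director_names: dict[str, str]) -> dict[str, list[str]]:
    """Map each film ID to the IDs of directors named in its poster text."""
    links = {}
    for film_id, poster in poster_texts.items():
        haystack = normalize(poster)
        links[film_id] = sorted(
            director_id for director_id, name in director_names.items()
            if normalize(name) in haystack
        )
    return links
```

Films for which no director is found receive an empty list, so that such irregular cases can be located and completed by hand through the administration subsystem.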
Finally, we define two further navigation options. In-page navigation is used to activate multimedia data that is associated with a certain collection page. A similar approach appears in Lyadret 1998. Additionally, as examined later, hand-made access structures may be used for categorizing and presenting the same collection content in other contexts.
An automated generation process is used for producing the HTML version of the collection pages based on their internal structure and accompanying metadata. Similarly, pages for the introductory texts and navigation structures are also generated. We will examine in the administration subsystem how exceptional instances of objects may be located and handled.
For every collection page, the generation process produces its basic text, based on its internal structure according to fields. Moreover, hyperlinks will be generated pointing to other collection pages, navigation structures and multimedia content, by exploiting the pages' IDs and relationships to multimedia content. Broken links are avoided by using validation scripts. Some fixed parts are also generated for collection pages including a logo, header, footer and visual cues, such as the page's position in the hierarchy. Finally, templates and behaviors can be used for enhancing the pages' appearance and interactivity respectively.
A template can be used in the generation process for configuring the appearance of a page's basic text. More specifically, ordinary HTML code is mixed with specialized tags that represent the page's fields, so that the field structure determines the desired appearance. Fixed parts may be configured in a similar way. Furthermore, additional global properties may be defined in the form of a stylesheet, which may differ at every level of the collection hierarchy. A template we have implemented for configuring the appearance of VHF's Subject Catalog (Styliaras et al. 1999) is shown in Figure 6, where the extra tags that are mapped to fields are highlighted.

Figure 6: Using a template
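The substitution of field tags during generation can be sketched in Python as follows. The tag syntax `<field name="..."/>` is purely an assumption for this example and is not the syntax of the template in Figure 6; the function name and the sample fields are likewise hypothetical.

```python
import html
import re

# Assumed syntax for a specialized tag that stands for a page field.
FIELD_TAG = re.compile(r'<field\s+name="([^"]+)"\s*/>')


def render(template: str, fields: dict[str, str]) -> str:
    """Replace every field tag with the escaped value of that field."""
    def substitute(match: re.Match) -> str:
        # Unknown fields render as empty text instead of failing.
        return html.escape(fields.get(match.group(1), ""))
    return FIELD_TAG.sub(substitute, template)
```

For instance, the template `<h1><field name="title"/></h1>` combined with a title of "Icons & Relics" renders as `<h1>Icons &amp; Relics</h1>`. Escaping the field values is a design choice of this sketch; if fields are allowed to carry formatting preserved from the import step, such as bold or italics, the values would have to be inserted without escaping or converted separately.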
Furthermore, using behaviors, we try to enhance user interaction with the content by using and extending functionality that already exists in popular multimedia authoring tools, such as Macromedia Director [HREF4]. We define a behavior as a specialized and configurable routine that is attached to an object and executed regardless of where the object is used. This standard behavior may be overridden if it is not desired in a certain object instance. In this way, we extend and generalize the simple scripts that are implemented using JavaScript and Dynamic HTML and inserted in the objects' tags. This extension is made possible by the way objects are structured in the database. Behaviors are stored in the database and are associated with objects via relationships, on which the parameters needed for their instantiation and configuration are also stored. The generation process takes these parameters into account and converts the configurable behavior into a working HTML fragment that is inserted in the object's tag. Behaviors may be applied to images, navigation elements or simple hyperlinks and can be activated by clicking, passing over or leaving an object.
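The conversion of a stored behavior into an HTML fragment can be sketched in Python as follows. The representation of a behavior as a dictionary with an event, a routine name and parameters is an assumption, and `followLink` is a hypothetical JavaScript routine that the generated pages would have to provide. The three events are mapped to the standard HTML attributes `onclick`, `onmouseover` and `onmouseout`.

```python
import html
import json
from typing import Optional

EVENT_ATTRIBUTES = {"click": "onclick", "over": "onmouseover",
                    "leave": "onmouseout"}
INHERIT = object()   # marker: this instance keeps the standard behavior


def behavior_attribute(behavior: dict) -> str:
    """Turn a stored behavior and its parameters into an event attribute."""
    call = f'{behavior["routine"]}({json.dumps(behavior["params"])})'
    attribute = EVENT_ATTRIBUTES[behavior["event"]]
    return f'{attribute}="{html.escape(call, quote=True)}"'


def image_tag(src: str, standard: Optional[dict], instance=INHERIT) -> str:
    """Generate an image tag; `instance` may override or remove the behavior."""
    behavior = standard if instance is INHERIT else instance
    parts = [f'<img src="{html.escape(src, quote=True)}"']
    if behavior is not None:
        parts.append(behavior_attribute(behavior))
    return " ".join(parts) + ">"
```

Passing `instance=None` switches the standard behavior off for one appearance of the image, while passing another dictionary replaces it; every other appearance keeps the behavior attached to the object.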
As an example, a simple behavior may be assigned to a certain image so that clicking the image activates a hyperlink, wherever the image appears. Such a behavior has been used in Styliaras et al. 1999 and Papaterpos et al. 1999. A more complicated and generic behavior could be to disconnect the destination of a hyperlink from the URL (as also proposed in Hartman et al. 1997; Lewis et al. 1996), while allowing a set of pages to be candidate destinations of the hyperlink. The IDs of the candidate pages are in this case the behavior's parameters. During the generation process, the set of candidate pages is stored in a Dynamic HTML layer that appears when the user passes over the hyperlink to which the behavior is attached. Furthermore, instead of the URL, the title and possibly the description of the candidate pages appear. This capability may enhance adaptive and link preview systems such as the ones described in Perkowitz et al. 1999; Kopetzky et al. 1999. Examples of desired behaviors include:
We have extensively used behaviors for implementing thumbnail-based slideshows in the various hypermedia applications we have developed. An example slideshow is shown in Figure 7.

Figure 7: Slideshow
The basis of our methodology is the administration subsystem, which supports content providers in maintaining, enriching and further exploiting the accumulated content. Various hypermedia design methodologies (Schwabe et al. 1996; Isakowitz et al. 1995) treat the navigation layer as the critical step, thereby disregarding the importance of content design and manipulation. Furthermore, in other tools, such as LivePage [HREF5], users may define simple access structures, such as tables of contents, guided tours or indexes, which most of the time end up as simple lists of the objects that constitute the structure. The main content, in turn, appears as a simple enumeration of the attributes characterizing an object, followed by a portion of non-analyzable text, as in Ardissono et al. 1999. In order to cope with these drawbacks and support the development of a collection, we construct a web-based, data-centric administration subsystem, in which both page-based and structure-based administration options stem from the collection's content and structure.
Firstly, using page-based administration, the content provider may change and reuse the collection pages' content, classification and internal structure. Concerning the latter, the fields that constitute the pages' internal structure may be reorganized (merged, split or deleted) based on some rules. In any case, the user is asked to carry out the updates that the fields' reorganization makes necessary, such as the page appearance in terms of fields. The first step of this kind of administration is to determine the set of collection pages to be edited, based on some criteria, such as:
Pages matching the criteria appear as a list of hyperlinks pointing to their editable versions. Once a page is selected for editing, it becomes unavailable to other users; many users can thus work simultaneously in different areas of the collection, according to their expertise. A form is displayed for the selected page. Using the form, the editor may modify the:

Figure 8: Sample administration form
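The rule that a page selected for editing becomes unavailable to other users amounts to a per-page lock. A minimal Python sketch follows; it keeps the locks in memory, whereas a web-based subsystem would presumably keep them with the page's editing information in the database, and the class, method and user names are hypothetical.

```python
class PageLocks:
    """Tracks which user is currently editing which page."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}

    def acquire(self, page_id: str, user: str) -> bool:
        """Lock the page for `user`; fail if another user is editing it."""
        owner = self._owners.setdefault(page_id, user)
        return owner == user

    def release(self, page_id: str, user: str) -> None:
        """Release the page, but only if `user` is the one who holds it."""
        if self._owners.get(page_id) == user:
            del self._owners[page_id]
```

Two editors can hold locks on different pages at the same time, which is what allows simultaneous work in different areas of the collection. A practical version would also need a timeout, so that a page does not stay locked when an editor abandons the form.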
Using the same administration interface, users may also add new pages to the collection, delete others, or isolate some collection objects in order to present them exceptionally, in cooperation with the generation process. For example, in a collection describing a museum's exhibits, important exhibits should be specially presented by showing more details. This can be achieved by defining extra attributes and developing additional hand-made content for the specially handled objects, and by seamlessly integrating this extra content with the standard content already defined. The extra content must also be correctly maintained and synchronized when edited. This feature helped us deal with irregular object instances in the hypermedia applications we have built. Furthermore, the page-based administration and generation processes enable the content's presentation in multiple formats, by including or omitting information on a field basis, according to the targeted user groups. The difference from similar approaches (Rousseau et al. 1999) is that, in our case, flexible presentation is supported by the initial structure design. Moreover, part of the collection's content, including attributes and behaviors, may be selected and stored in a clipboard-like structure, from which all or part of the content (e.g. selected fields) may be retrieved for composing new documents.
Using structure-based administration, the content editor may restructure the collection in terms of chapters and additional indexes. Any change in structure, such as chapter movement, results in appropriate recalculations of the collection objects' relationships to navigation structures. A treelike interface can be used for structure-based administration, where chapters appear as nodes and collection objects as leaves. The implementation of this administration scheme appears in Styliaras et al. 1999. This kind of restructuring may be used instead of or in cooperation with page-based administration.
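The recalculation caused by moving a chapter can be illustrated with the following Python sketch. It assumes, hypothetically, that positions in the hierarchy are encoded as dotted IDs, so that moving a chapter means replacing a prefix in the IDs of the chapter and of everything below it; the function names are invented, and the scheme described in Styliaras et al. 1999 may work differently.

```python
def move_chapter(ids: list[str], old_prefix: str,
                 new_prefix: str) -> dict[str, str]:
    """Map every ID at or below `old_prefix` to its ID after the move."""
    mapping = {}
    for item_id in ids:
        if item_id == old_prefix or item_id.startswith(old_prefix + "."):
            mapping[item_id] = new_prefix + item_id[len(old_prefix):]
    return mapping


def relink(links: list[tuple[str, str]],
           mapping: dict[str, str]) -> list[tuple[str, str]]:
    """Rewrite stored (source, target) relationships after a move."""
    return [(mapping.get(source, source), mapping.get(target, target))
            for source, target in links]
```

Matching on the prefix followed by a dot prevents a chapter such as "1.20" from being caught when chapter "1.2" is moved. Applying the resulting mapping to every table that stores IDs keeps hyperlinks and index relationships consistent after the restructuring.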
Finally, the collection's content may be presented more precisely and interactively by using hand-made access structures whose function is integrated in the collection's presentation. Examples include map-based navigation systems, tree-like structures with expanding and collapsing nodes, imagemap-like navigation and the automatic display of a sequence of pages. The steps for implementing such a structure are:

Figure 9: Administration options
We present some annotated screenshots from three collections that we have implemented for HMC's web site. In these collections, different parts of the described functionality have been implemented, ranging from content importing to access structures, page generation and administration. As the presented collections show, we have managed to structure content with various representation needs, and we have implemented the presented methodology in independent modules depending on the desired functionality. We have also been inspired by ISTOPOLIS (Papaterpos et al. 1999), a web-based educational application we have developed, and VHF's Subject Catalog (Styliaras et al. 1999), in which a customized administration subsystem has been implemented for gathering multilingual content from several users. Finally, ideas regarding the use of multimedia content have been gathered and implemented in a multimedia application we have built presenting the life of Melina Merkouri. A slideshow from this application appears in Figure 7.

Figure 10: Treasures of Mount Athos [HREF1]

Figure 11: New Greek Cinema [HREF2]

Figure 11a: Time-based navigation in New Greek Cinema

Figure 12: Hellenic Jewelry [HREF3]
We have presented a methodology for creating thematic-based collections. At its core is a data-centric interface, with the aid of which part of the collection's content, enriched with its supplementary data, may be edited, restructured and transformed according to the desired functionality. In this way, a pool of content is also created, containing structured text, hyperlinks and information about multimedia data, which can be reused in other circumstances, such as composing new documents, organizing access structures and search services over the content, or simply creating hyperlinks. In all cases, integrity and defined behaviors are maintained.
Hartman, J., Proebsting, T., & Sundaram, R. (1997). Index-Based Hyperlinks. Sixth WWW Conference.
Schwabe, D., Rossi, G., & Barbosa, S. (1996). Systematic Hypermedia Application Design with OOHDM. ACM Hypertext 1996.
Isakowitz, T., Stohr, E., & Balasubramanian, P. (1995). RMM: A Methodology for Structured Hypermedia Design. Communications of the ACM, August 1995, pp. 34-44.
Kesseler, R. (1995). A Schema based approach to HTML Authoring. Fourth WWW Conference.
Styliaras, G., Christodoulou, S., & Papatheodorou, T. (1999). Extending and interacting with the ITC system. AACE WebNet 1999.
Papaterpos, C., Styliaras, G., Tsolis, G., & Papatheodorou, T. (1999). Architecture and implementation of a network based educational hypermedia system. IEEE ICMCS 1999.
Christodoulou, S., Styliaras, G., & Papatheodorou, T. (1998). Evaluation of Hypermedia Application Development and Management Systems. ACM Hypertext 1998.
Garzotto, F., Mainetti, L., & Paolini, P. (1995). Hypermedia design, analysis, and evaluation issues. Communications of the ACM, August 1995, pp. 74-86.
Hartley, S., & Medley, M. (1996). Enhancing teaching using the Internet. ACM Integrating Tech in C.S.E.
Balasubramanian, V., Bashian, A., & Porcher, D. (1997). A large-scale hypermedia application using document management and web applications. ACM Hypertext 1997.
Lewis, P., Davis, H., Griffiths, S., Hall, W., & Wilkins, R. (1996). Media-based navigation with generic links. ACM Hypertext 1996.
Rockley, A. (1993). Putting large documents online. ACM SIGDOC 1993.
Paradis, F., & Vercoustre, A. (1998). A Language for Publishing Virtual Documents on the Web. ACM WebDB 1998.
Lyadret, F., Rossi, G., & Schwabe, D. (1998). Using design patterns in educational multimedia applications. AACE Ed-Media 1998.
Perkowitz, M., & Etzioni, O. (1999). Towards adaptive web sites: conceptual framework and case study. Eighth WWW Conference.
Rousseau, F., Garcia-Macias, A., Valdeni de Lima, J., & Duda, A. (1999). User adaptable multimedia presentations for the WWW. Eighth WWW Conference.
Kopetzky, T., & Muhlhauser, M. (1999). Visual preview for link traversal on the WWW. Eighth WWW Conference.
Elo, S., Weitzman, L., Fry, C., & Milton, J. (1998). Virtual URLs for browsing and searching large information spaces. AACE WebNet 1998.
Kommers, P., Fereira, A., & Kwak, A. (1998). Document Management for Hypermedia Design. Springer.
Ardissono, L., Console, L., & Torre, I. (1999). Exploiting user models for personalizing news presentations. Second Workshop on Adaptive Systems and User Modelling on the WWW.
Georgios D. Styliaras and Theodore S. Papatheodorou, © 2000. The authors assign to Southern Cross University and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive licence to Southern Cross University to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers and for the document to be published on mirrors on the World Wide Web.