Automating Online Delivery of Database Content
- an open-source XML-based alternative

Monica Berko, Director, IT Applications, National Library of Australia, ACT, 2600. Email: mberko@nla.gov.au

David Wong, Java and XML Web Developer, IT Applications, National Library of Australia. Email: dwong@nla.gov.au

Table of Contents

Background
Search and Browse
Scenarios
Product Requirements
Budget
Product Research
NLA Application Generator for NLA Framework
Tamino X-Application Generator
Cocoon
eXist
Xinq Solution
Example
Authoring the application specification file
Xinq Architecture
Database Selection
Online Update
Scenarios
Product Requirements
Product Research
Altova XMLSpy Authentic XML Content Editor
Tamino Database with configurable web update interface
Web Application Development Frameworks
Xedit Solution
Xedit Architecture
Conclusion
References
 
 

Background

At the National Library of Australia, we have been developing public web-based information systems, content creation and management systems and internal workflow systems for some years using a locally developed Java web application framework.

After developing a few applications - for example the Australian Libraries Gateway21, the Oral History Collections Directory22, the Indexes and Databases service23 - we observed that we would be able to speed up the development process by building an application generator that could interpret a data model description conforming to a locally designed XML schema and generate (using XSLT16) a fully operational web application with search and browse capability and online content creation and editing.

This very basic system could be used for prototyping and assist in the business analysis process as clients could better describe and communicate both their information content and functional requirements with a working model. It was also useful for bootstrapping the development process as developers had a skeleton system which they could build on, particularly if they were new to our framework.

The first version of the application generator, with search and browse capability, was completed in March 2003 and presented at the Open Publish conference in July 2003. However, the generated applications still lacked online update capability. Staff resource constraints relegated the completion of this tool to the back-burner: it is hard to give infrastructure improvements priority over business-driven projects when they deliver no obvious immediate benefit. Their value lies in reducing the development effort for later projects - "suffer to gain".

In 2003, the National Library of Australia joined the International Internet Preservation Consortium (IIPC)26. Its Deep Web working group aims to identify strategies and produce tools for archiving web content which is inaccessible to crawlers. One of the deliverables for this working group was a web-based access tool to search and navigate structured data archives. These archives are stored as XML and are the long-term preservation copies of deposited databases and document archives.

A search was conducted for existing products and/or research projects which could be used for this purpose. Although some commercial vendors had similar tools, adopting them would require lock-in to their underlying product and the IIPC26 objective is the provision of tools to support web archiving which do not depend on commercial products.

Here was an opportunity to make use of the work we had done earlier with the application generator. However, as this tool was to be used outside our own environment, it required a complete rethink of the underlying engine. As the solution was required to be deployable using purely open-source technologies, a decision was made that the generated application would use only XSLT16, XQuery17, JSP12 and Java Servlet13 technologies and use as infrastructure any XML database server product that supports XQuery and the XML:DB19 API. The open source XML database server eXist1 and the commercial product Tamino2 have both been used successfully.

The resulting tool is called Xinq (XML-inquire) and it creates a web-application completely from an application-specification file which is written in XML. Xinq currently provides only search and browse functionality. It has been released with an Apache Software Licence and is distributed by SourceForge27.

Although originally developed to meet the need for building accessible archives of deep web content, this tool has more general applicability. Anyone who has a back-end database which they would like to publish on the web with dynamic search and browse capability may find this tool useful for cutting down the time it takes to specify and implement a service. Content held in a relational database can be prepared for this purpose, as there are a number of sophisticated tools already in the market-place for mapping relational databases to XML schema and extracting the content as XML.

However, to be genuinely useful as a prototyping tool for online information systems development, we required online update capability in the generated prototype. We felt that with a moderate development effort we could extend the principles of Xinq to build a generic update capability. However, as the cost of this development effort required justification, we first looked at other alternatives, both commercial and open-source.

Search and Browse

Scenarios

A university facilities management section has a venue booking system based on a SQL-Server database and a desktop client application. This system contains information about the various available venues and equipment items as well as bookings for these venues and associated equipment. Some information about the events which require the venue is also stored. They would like this information to be available on the web so that both internal and external customers can check availability before sending through a booking request. This interface should allow the user to browse through the venues and equipment that are available for hire, and be able to view all current bookings. They should also be able to search for a venue based on certain criteria, or even list all the events being held on a particular day. An online booking request form already exists and is used to pass requests through to the venue administrators. This form should be linked to from the search interface.

A large company has just switched over to a new Financial Management system. Although the essential data was migrated from the old system, much of the historical data was not migrated and some items of information kept in the old system have not been migrated to the new system. For record-keeping purposes the legacy database from the old system has been kept but as the system itself has been mothballed there is no longer an easy way to check old data. They would like to make this data accessible via a secure web interface on their intranet so that old transactions, suppliers and customers can be searched for.

The Pandora24 (Australian Web Archive) curators have selected a web site for archiving and the crawler was not able to capture all of the content because it was a dynamic web site with search forms for users to locate the information of interest (commonly referred to as deep web content). The publisher of the web site is keen for it to be archived for posterity because their funding is drying up and the web hosting company that designed and supports the web site is going out of business. They are happy to supply the database that drives the dynamic content of the site. The curators use a commercial XML tool to extract content from various database formats into an XML archive. They need to provide public access to this content in a similar manner to the original web site but no programming resource is available and they have many such archived databases which require their own access interface. A tool to generate online search and browse access individually for each database is required.

Product Requirements

Budget

As the IIPC26 were willing to provide some funds for a solution to the problem outlined in the third scenario above (the Pandora archive), and they required a solution which was freely available, it was necessary to build a solution based on existing open-source initiatives. Our budget constraint was that we needed to achieve a solution using 6 person-months of development effort.

Product Research

An internet scan was conducted, searching for toolkits or research projects that we could make use of. We searched software directories in Yahoo and Google, tool directories on XML web sites such as xml.com, and software foundries such as SourceForge and the Apache Foundation web sites.

NLA Application Generator for NLA Framework

An alternative we considered was our own Application Generator used for prototyping our own web applications. This generates Java servlet code, Java data mapping classes, SQL code for creating tables and querying, WebMacro8 user interface templates, and Ant configuration files for deployment. Although this product provided us with a useful bootstrapping process for the development of new online systems, it was felt that the generated application was not standard enough to be easily customised by developers not familiar with our framework, which, although similar to Struts, was developed before Struts existed. If rewritten as an XML application, it would be more accessible to other developers wishing to customise either the generator or individual generated applications to specific requirements, and would allow the generator to have a developer community adding new features.

Tamino X-Application Generator

Software AG's Tamino2 XML Server product comes with a free add-on application framework service called Tamino X-Application3. X-Application provides JSP12 tags to embed access to the Tamino XML Server in HTML pages and is implemented as generic, ready-to-run Java modules. It includes an X-Application Generator which generates an application based on a Tamino Schema, which is based on the W3C XSD15 standard with some modifications and add-ons. It produces a very basic application which has search, browse and update functionality. The Generator has an easy-to-use wizard, and is a good way to quickly create a functioning application.

The X-Application JSP tag library provides enough functionality to enable developers to build useful applications. However the wizard is not configurable and relies on a developer to modify the generated code. Also X-Application works only with Tamino databases. As the requirement from the IIPC was for open-source solutions this product could not be used because it would require the purchase of a commercial XML database product. However for others who don't have this constraint this product is worth evaluating.

Cocoon

The eXist1 distribution comes packaged with Cocoon7 which is an XML-based web publishing framework. It allows one to build web applications using only XQuery17 scripts which are processed by either their XQueryGenerator or XQuery servlet, both also provided with the distribution. It was possible to build applications without writing any Java code.

We considered using this set of technologies for Xinq but decided against it for various reasons. In the example applications provided with the distribution, all application logic and display information was embedded in the XQuery scripts. This would have been appropriate only for small applications. Even so, it would have been difficult to build XSL scripts to create XQuery scripts that embedded both display and business logic. We would be building an application based on external XQuery generators and the Cocoon framework, which are suitable for developing application instances using these technologies, but not for a prototyping tool. Building on these components introduced extra dependencies which would have added to complexity and risk.

eXist

We chose eXist1 as the XML database server product as the back-end for the web application because it supports XQuery17, XSLT16 and the XML:DB19 API. eXist is deployable using either the Apache Tomcat5 or Jetty4 application servers. We found no other projects which were suitable for our purposes and decided to use the concepts already developed in our earlier application generator tool as the basis for the development of the XML schema for the configuration file and the Use Cases for the generic user interface and build our own generator using XSLT and a limited amount of Java.

Xinq Solution

Xinq was required to generate an interface for an arbitrary data collection. The complexity of this application lies in the fact that code is not being developed for any particular data structure or for particular search, browse and display requirements. Both are arbitrary and of varying complexity. Xinq must be flexible enough to deal with all the various configurations that are possible within the constraints of the application specification schema.

The most critical decision made was to allow the expression of multi-object data models in the data model specification. We considered the option of accommodating only single-object data models as this is far easier to implement and there are many products that already do this well. The strategy used for dealing with relational databases in this case is to flatten the data before ingestion into the XML data repository so that there is only one top-level XML element corresponding to the main table in the database and related tables have their contents replicated as sub-elements within the XML hierarchy. This can introduce a large amount of redundant data in the XML database when the data model represents related but independent items in a many-to-many relationship. Because of this replication, it also compromises the efficiency of querying and updating the XML repository for any independent items which are not the main item. So we decided that the data model specification file should support multi-object data models, and the relationships between various objects, just as in a relational database, using keys in the XML Schema to express and validate relationships. This increased the complexity of our task tenfold but was considered a worthwhile investment as the generated application would be much closer to what may be required for real-world systems and more easily enhanced to meet additional business requirements and workflows.
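The use of keys to express and validate relationships can be illustrated with a minimal sketch. It assumes a hypothetical two-item archive (the element names `archive`, `provider`, `resource` and `providerRef` are illustrative only and are not taken from the schema that Xinq generates) and uses the XML Schema validator shipped with the JDK to show that a reference to a non-existent item is rejected, while each provider is stored once however many resources refer to it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Sketch only: all element and attribute names are hypothetical. */
final class KeyRefSketch {

    // Two independent item types; a resource points at providers by key
    // instead of carrying a replicated copy of each provider.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='archive'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='provider' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:attribute name='id' type='xs:string' use='required'/></xs:complexType>"
      + "</xs:element>"
      + "<xs:element name='resource' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='providerRef' type='xs:string' minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence>"
      + "<xs:attribute name='id' type='xs:string' use='required'/></xs:complexType>"
      + "</xs:element>"
      + "</xs:sequence></xs:complexType>"
      // Every provider id must be unique ...
      + "<xs:key name='providerKey'><xs:selector xpath='provider'/><xs:field xpath='@id'/></xs:key>"
      // ... and every providerRef must match an existing provider id.
      + "<xs:keyref name='resourceProvider' refer='providerKey'>"
      + "<xs:selector xpath='resource/providerRef'/><xs:field xpath='.'/></xs:keyref>"
      + "</xs:element></xs:schema>";

    static final String VALID =
        "<archive><provider id='p1'/><provider id='p2'/>"
      + "<resource id='r1'><providerRef>p1</providerRef><providerRef>p2</providerRef></resource>"
      + "<resource id='r2'><providerRef>p1</providerRef></resource></archive>";

    static final String DANGLING =
        "<archive><provider id='p1'/>"
      + "<resource id='r1'><providerRef>p9</providerRef></resource></archive>";

    /** Returns true when the document satisfies the schema, including key/keyref. */
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema sketch did not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // includes identity-constraint (key/keyref) violations
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `KeyRefSketch.isValid(KeyRefSketch.VALID)` accepts the document in which two resources share provider `p1`, whereas `KeyRefSketch.isValid(KeyRefSketch.DANGLING)` rejects the reference to the undefined provider `p9`.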

Even though the applications created by the Xinq tool may not be overly complex, developing the tool posed challenges because components of the system must be developed in a generic way, and the components that are application-specific must be generated from the specification file. To make the application as simple as possible overall, we had to strike an appropriate balance between what should be a generic component, which took extra development time, and what should be left as a specific component, which added to the complexity of the generation process.

Example

In order to visualise the requirements, an example database archive has been chosen for demonstration purposes. This is the Health Education Rural Resources Database25 published by the Faculty of Medicine Nursing & Health Sciences, Monash University. The publishers provided us with a copy of their database. There are two item types in this archive, Resources and Providers, with a many-to-many relationship between them. The full XML description of this example, including search, browse and display rules, is listed separately29. The XML Schema that was generated from this specification using Xinq is also appended30. The data was extracted and transformed into XML according to the generated XML schema.

The resulting web application was completely automatically generated and deployed and is available online31.

Sample screen shots have also been saved on a single web page for printing32.

Authoring the application specification file

Because of the complexity of the schema for this specification file and the requirement that business analysts should be able to generate this specification, it was decided that a wizard-style tool was necessary to make this possible.

A demonstration of this wizard is the best way to convey Xinq's capabilities.

Xinq Architecture

Each application instance has a unique user interface, determined by the data structure of the archive and by the search, browse and display rules, which are also included in the data model specification file. Information about the items (entities) and properties (fields) being queried is either passed through with the search parameters whenever a search request is made or, where it remains unchanged for the life of the application, initialised when the servlet is first run.

As the back-end is generic, no reference is made to any particular archive in the code itself. All archive-specific information is either embedded into hidden form fields, initialised at start-up, or written to a properties file.
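To make the idea of a generic back-end concrete, here is a minimal sketch of how a search request could be turned into an XQuery expression purely from parameters. It is not the Xinq implementation: the class and method names, the query shape and the collection path `/db/herrd` are all assumptions for illustration. The item and property names stand in for the values that would arrive in hidden form fields, and the collection path for a value read from a properties file.

```java
import java.util.regex.Pattern;

/** Sketch only: hypothetical names, not the code that Xinq generates or runs. */
final class GenericQuerySketch {

    // Item and property names are spliced into the path, so only plain names are accepted.
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_.-]*");

    /** Writes a value as an XQuery string literal (escapes '&' and doubles '"'). */
    static String quote(String s) {
        return "\"" + s.replace("&", "&amp;").replace("\"", "\"\"") + "\"";
    }

    /**
     * Builds a query for one item type and one property. Nothing here names a
     * particular archive: every archive-specific value is a parameter.
     */
    static String searchQuery(String collection, String item, String property, String term) {
        if (!NAME.matcher(item).matches() || !NAME.matcher(property).matches()) {
            throw new IllegalArgumentException("unexpected item or property name");
        }
        return "for $i in collection(" + quote(collection) + ")//" + item
             + "[contains(" + property + ", " + quote(term) + ")] return $i";
    }
}
```

For example, `GenericQuerySketch.searchQuery("/db/herrd", "resource", "title", "rural")` yields `for $i in collection("/db/herrd")//resource[contains(title, "rural")] return $i`. The same method serves any archive, because the item and property names come from the request rather than from the code.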

An alternative to using Java as the back-end would have been to use XQuery17 scripts. As XQuery is a programming language, these scripts can hold application logic and can be processed by XQuery Generators like those available with eXist or Apache Cocoon7. This was dismissed as an option because generating XQuery scripts would not have reduced the complexity of generating Java code: the scripts would still contain application logic and would have been as difficult to generate as Java code. Also, we would be relying on the functionality of a third-party XQuery Generator component.

Database Selection

We have trialled Software AG's Tamino2 and the open-source eXist native-XML database server products, as both supported the required standards. Both performed adequately if not as fast as relational databases.

We researched other XML database products but they didn't support both the XQuery and the XML:DB19 API standards.

Online Update

Scenarios

The web site manager for a cultural institution would like to streamline the maintenance of the News and Events section of the web site, which is currently maintained as a static set of web pages. When articles are written they would like to be able to categorise them by type (eg media release), topic, author, date etc and make use of them on the News and Events site as well as in other newsletters targeted at different audiences, and to produce print and email newsletters from the same content. They would also like to be able to refer to them from other types of article, eg media releases. Many of the articles describe items in their collection or in their online shop catalogue and they would like to share these with those catalogues as annotations. They would like to record all events in a structured way so that the Events can be searched and browsed using different navigation paths, eg by month, type, venue etc, and so that articles can link to events and vice versa. This event data can then be re-used for other purposes, eg printed event calendars.

A helpdesk/customer service operation deals with a large number of phone and email requests for information. There is quite a lot of similarity in the answers given and they have already tried to provide more self-service options by manually producing and maintaining a Frequently Asked Questions web page. However, time-pressures have meant that it is not being actively maintained and the contents are also getting out-of-date. They would like to replace direct emails with an email form so that both questions and answers are stored in a database which can be made searchable. They would also like to categorise each request and flag particular question and answer combinations for automatic publication as FAQ, and to be able to review and edit.

Product Requirements

Product Research

As the requirement for a configurable web-based editing interface for XML content is arguably universal, and there was no requirement for this facility to be based on non-commercial products, we expected to have a number of possible solutions to choose from. The product sectors we researched were mostly web-based XML authoring software, database products with configuration options for building web-based update interfaces, and web application development tools that also contain code-free configuration options for quick prototyping of web-enabled databases.

Content Management Systems are designed to set up templates for creating and maintaining web pages. They let you generate content dynamically by creating re-usable XML or HTML fragments, or importing data from a database system. The standard templates and configuration options of a CMS don't extend to complex structured data. Metadata editing options are focused on resource descriptions using a format like Dublin Core and administrative metadata. Process management is focused on publishing to the web.

XML Authoring Solutions - Altova Authentic XML Content Editor

The Authentic10 XML Content editor from Altova, when used in conjunction with their Stylevision11 product, provides a web form interface for XML content creation and editing. This content editor is free although it does require the download of a browser plug-in.

The Stylevision11 application is used to create StyleVision Power Stylesheet (SPS) files that are interpreted by Authentic to render a user interface for editing XML documents, allowing a user with no XML knowledge to create and edit documents. For each editing interface a person with knowledge of the StyleVision product is needed to create the SPS file.

Only relational databases are supported for direct editing in the Authentic view, and it does not allow direct editing of XML databases. As we are storing our XML content in native-XML databases the mapping process and overhead is undesirable.

This was not a suitable solution for our requirements as it does not generate an end-to-end solution from the content editor to the XML repository, and there is no configurable user interface generator.

Tamino2 Database with configurable web update interface

Tamino's X-Application3 product generates a search and update web application based on a Tamino Schema Definition (TSD) file. The TSD schema is a subset of the XSD standard with Tamino-specific extensions. The TSD specification doesn’t support the key and keyref constructs. Consequently the TSD language can still be used to specify multiple entities within a document but the relationships between these entities can’t be enforced.

The update interface of the generated application also does not properly support the creation or editing of fields that refer to other entities. The generated form does not create a mechanism that allows for the selection of an instance of a referred entity.

Web Application Development Frameworks

There are a number of sophisticated software development tools available now that improve the speed with which web applications can be developed using stable, open and flexible frameworks. However, they are all still developer-oriented tools and require developer time to build even the simplest prototype. One of these frameworks will usually be required to build the final system with all the necessary business logic and user interfaces to meet business requirements. The value of the prototype generator, by contrast, is that the business specialist can define their business objects and a working prototype can be generated without developer assistance.

Xedit Solution

Although we have not ruled out purchasing a better solution in the future, we estimated that extending the already developed Xinq architecture to support online edit via the XML XUpdate standard would not be all that costly in development time and would provide a lightweight alternative solution in the medium term. We also felt a continued obligation to support the open source community as we have gained so much financial benefit over the years from the availability of quality and well-supported open source solutions. In particular the Xinq and Xedit tools provide an application layer for the eXist XML database open source project, increasing its potential value to the community. Xedit is currently under development and the first release is scheduled for July 2005.

Xedit Architecture

Xedit also generates a Java servlet-based application, using the Spring Java/J2EE application framework. It is a stand-alone application and is separate from the Xinq generated application. However, from the user perspective Xinq and Xedit will have generated one integrated application.

Separating the online update application from the search and browse only application simplifies the code generation process. It also modularises application functionality - if there is no need for update capability then data is more secure as it is accessed through an application with view-only capabilities.

XUpdate18 requests are sent to the XML database using the XML-RPC protocol. eXist supports SOAP and XML-RPC20. As more XML databases begin to support XUpdate we will consider supporting more protocols to ensure Xedit works with as many databases as possible.
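As an illustration of what such a request contains, the sketch below builds a single XUpdate `update` instruction with the JDK's DOM and serialisation APIs. The item and property names, and the use of an `id` attribute to locate the item, are assumptions made for the example. The XML-RPC transport is deliberately left out, since the exact remote method depends on the database in use.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Sketch only: builds the request body; sending it to the database is not shown. */
final class XUpdateSketch {

    static final String NS = "http://www.xmldb.org/xupdate";
    private static final String NAME = "[A-Za-z_][A-Za-z0-9_-]*";

    /** Returns an XUpdate document that replaces the content of one property of one item. */
    static String updateRequest(String item, String itemId, String property, String newValue) {
        // These values are spliced into an XPath expression, so keep them to plain tokens.
        if (!item.matches(NAME) || !property.matches(NAME) || !itemId.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("unexpected item, id or property name");
        }
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();

            Element mods = doc.createElementNS(NS, "xupdate:modifications");
            mods.setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:xupdate", NS);
            mods.setAttribute("version", "1.0");

            // Hypothetical addressing scheme: items carry an id attribute.
            Element update = doc.createElementNS(NS, "xupdate:update");
            update.setAttribute("select", "//" + item + "[@id='" + itemId + "']/" + property);
            update.setTextContent(newValue); // the serialiser escapes markup characters

            mods.appendChild(update);
            doc.appendChild(mods);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build XUpdate request", e);
        }
    }
}
```

A call such as `XUpdateSketch.updateRequest("resource", "r1", "title", "Rural Health Kit")` produces a small `xupdate:modifications` document whose single `xupdate:update` element selects the title of the resource with that id and carries the new text.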

The user interfaces for the add and edit functions are generated by applying XSLT16 transformations to XML element instances. These XSLT transformations are themselves generated using XSLT transforms of the application specification file. This is complicated but necessary as generating a user interface directly from the application specification file works for generating user interfaces for add item functions but not for editing existing items as the XSL processor and template have no knowledge of the content relating to the XML element instance.
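The two-stage approach described above can be sketched with the XSLT processor bundled with the JDK. The specification format (`item` and `property` elements with a `name` attribute) and the generated form are hypothetical simplifications, not the Xinq or Xedit formats. The first transform turns the specification into a stylesheet, using `xsl:namespace-alias` so that the generator can write XSLT elements as output. The second applies that generated stylesheet to an element instance, which is what allows the existing values to appear in the form.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch only: a stylesheet that writes another stylesheet. */
final class GeneratedXsltSketch {

    // Stage 1: elements in the "out" namespace are emitted as XSLT elements.
    static final String GENERATOR =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:out='urn:sketch:generated-xslt'>"
      + "<xsl:namespace-alias stylesheet-prefix='out' result-prefix='xsl'/>"
      + "<xsl:template match='/item'>"
      + "<out:stylesheet version='1.0'>"
      + "<out:template match='/{@name}'>"
      + "<form>"
      + "<xsl:for-each select='property'>"
      + "<input name='{@name}'>"
      // Evaluated in stage 2, against the element instance being edited.
      + "<out:attribute name='value'><out:value-of select='{@name}'/></out:attribute>"
      + "</input>"
      + "</xsl:for-each>"
      + "</form>"
      + "</out:template>"
      + "</out:stylesheet>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical specification and instance documents.
    static final String SPEC = "<item name='resource'><property name='title'/></item>";
    static final String INSTANCE = "<resource><title>Rural Health Kit</title></resource>";

    /** Applies a stylesheet (given as text) to an input document (given as text). */
    static String transform(String stylesheet, String input) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(input)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Stage 1 generates the form stylesheet; stage 2 fills it from the instance. */
    static String editForm(String spec, String instance) {
        String generatedStylesheet = transform(GENERATOR, spec);
        return transform(generatedStylesheet, instance);
    }
}
```

`GeneratedXsltSketch.editForm(GeneratedXsltSketch.SPEC, GeneratedXsltSketch.INSTANCE)` returns a form whose `title` input is pre-filled with the instance's current value. A form generated from the specification alone could name the input but would have no value to put in it.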

Conclusion

Due to the evolutionary nature of the business requirements which justified the research and development costs, the decisions made and strategies chosen reflect the technological choices available at that time and what we already had in place locally.

The strongest business driver for the development of the Xinq tool was the requirement of the IIPC26 for a public domain tool to provide access to archived databases. However, we perceived that other business requirements could also be satisfied by this development - in particular our requirement for the fast delivery of prototypes for new systems, based on the business object models arising from the requirements analysis process. This is the main business imperative for the further development of Xinq into the Xedit tool.

For other organisations and companies with similar needs, the choice of a solution is likely to be different, particularly as integration with existing web application infrastructure is so important. We hope that the sharing of our experiences will be helpful to you in your considerations.

For some though, the Xinq and Xedit tools may be the starting point you choose and we welcome your evaluations and recommendations for improvements. As software development is not the business of the National Library of Australia, and occurs only when financially justified, we hope to build a community of users for Xinq/Xedit who will become collaborators and continually improve its capability. Please register your interest via SourceForge27.

References

Software products and frameworks

  1. eXist - Open Source XML Database Server - http://exist.sourceforge.net/index.html
  2. Tamino - Commercial XML Database Server from Software AG - http://www1.softwareag.com/Corporate/products/tamino/default.asp
  3. Tamino X-Application - Application Framework Service - http://www1.softwareag.com/Corporate/products/tamino/prod_info/main_comp/es_tam_xapp.asp
  4. Jetty - Java http server and servlet container - http://jetty.mortbay.org/jetty/index.html
  5. Apache Tomcat - Java servlet container for the Apache HTTP Server - http://jakarta.apache.org/tomcat/index.html
  6. Apache Ant - java-based build tool used for automating deployment - http://ant.apache.org/
  7. Apache Cocoon - XML component-based web development framework - http://cocoon.apache.org/
  8. WebMacro - Java user interface template language (alternative to Java Server Pages - JSP) - http://www.webmacro.org/
  9. Spring - Java/J2EE framework - http://www.springframework.org/
  10. Authentic - Free XML Content editor, Windows desktop stand-alone program or Browser Plug-In from Altova - http://www.altova.com/products_doc.html
  11. Stylevision - Stylesheet designer for styling input forms for use by Authentic10 content editor and for styling HTML, PDF and Word/RTF output - http://www.altova.com/products_xsl.html

Standards

  12. JSP - JavaServer Pages Technology - http://java.sun.com/products/jsp/
  13. Java Servlets - http://java.sun.com/products/servlet/index.jsp
  14. WAR - Web Application Archive file format for Java servlet application deployment - http://java.sun.com/webservices/docs/1.0/tutorial/doc/WebApp3.html
  15. XSD - W3C XML Schema Language - http://www.w3.org/XML/Schema
  16. XSLT - W3C XML Transformation Language - http://www.w3.org/TR/xslt
  17. XQuery - standard for querying Web documents - http://www.w3.org/XML/Query
  18. XUpdate - XML language for updating XML documents proposed by the XML-DB project - http://xmldb-org.sourceforge.net/xupdate/xupdate-wd.html
  19. XML-DB API - Application Programming Interface (API) for XML databases - http://xmldb-org.sourceforge.net/xapi/index.html
  20. SOAP & XML-RPC - Simple Object Access Protocol and XML Remote Procedure Call - http://xml.org/xml/resources_focus_soap.shtml

Referenced Services

  21. Australian Libraries Gateway - http://www.nla.gov.au/libraries
  22. Oral History Collections Directory - http://www.nla.gov.au/ohdir
  23. Indexes and Databases service - http://www.nla.gov.au/pathways/jnls/newsite
  24. Pandora (Australian Web Archive) - http://pandora.nla.gov.au
  25. Health Education Rural Resources Database - http://www.med.monash.edu.au/mrh/resources/herrd/
  26. International Internet Preservation Consortium (IIPC) - http://netpreserve.org
  27. SourceForge - Open Source Software Repository - http://sourceforge.net/index.php
  28. About the National Library of Australia

Example

  29. Sample Application Specification File - http://www.nla.gov.au/xinq/examples/herrd/herrd_archive-spec.xml
  30. Sample Generated Schema - http://www.nla.gov.au/xinq/examples/herrd/herrd_xsd-schema.xsd
  31. Sample Live Application - http://www-test.nla.gov.au/apps/xinq
  32. Screenshots from Sample Application - http://www.nla.gov.au/xinq/presentations/screenshots.html

Copyright

Monica Berko, David Wong, © 2005. The authors assign to Southern Cross University and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive licence to Southern Cross University to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers and for the document to be published on mirrors on the World Wide Web.