AusWeb 03

Multi-level Non-uniform Grouping of Very Large Flat Structured Documents

Steve Ball, Zveno


Keywords

XML, structuring, grouping, documents


Abstract

Most major word processing applications, such as Microsoft Word and OpenOffice, can now save documents in XML format. Often, though, the schema used by the word processor is unsuitable for further processing or storage of the document, so the word processor's XML representation must be converted into an application-defined schema. A word processing document is essentially unstructured: it is a long, flat list of paragraphs. Converting such a document into a structured document is therefore a specialised type of grouping problem.

The XSLT language is a natural choice for converting a document from one XML-based schema into another. Grouping is a common problem when developing XSLT stylesheets, and there are a number of well-known design patterns for achieving the desired structures in the result document, including positional grouping and Muenchian grouping. Each technique is described, along with an explanation of how it works. All of these methods produce the correct result, but their runtime performance is unacceptable when transforming large word processing documents.
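
To make the grouping problem concrete, the following is a minimal sketch of key-based grouping in the Muenchian style, using XSLT 1.0. The input vocabulary is hypothetical and is not the schema of any particular word processor: it assumes a `doc` root containing flat `para` elements, each with a `style` attribute of either `Heading1` or `Body`. The output element names (`document`, `section`, `title`, `p`) are likewise illustrative.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input: <doc><para style="Heading1">..</para>
                           <para style="Body">..</para> ...</doc> -->

  <!-- Index every body paragraph by the identity of the nearest
       preceding heading. -->
  <xsl:key name="body-by-heading"
           match="para[@style='Body']"
           use="generate-id(preceding-sibling::para[@style='Heading1'][1])"/>

  <xsl:template match="/doc">
    <document>
      <!-- One section per heading. -->
      <xsl:for-each select="para[@style='Heading1']">
        <section>
          <title><xsl:value-of select="."/></title>
          <!-- Look up the body paragraphs that belong to this heading. -->
          <xsl:for-each select="key('body-by-heading', generate-id())">
            <p><xsl:value-of select="."/></p>
          </xsl:for-each>
        </section>
      </xsl:for-each>
    </document>
  </xsl:template>

</xsl:stylesheet>
```

In this sketch, building the key evaluates the `preceding-sibling` axis once for every body paragraph. Depending on the processor, each such evaluation may scan backwards through the sibling list, which is one plausible way for the cost to grow faster than linearly on a long, flat document.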

A new technique has been developed for handling documents that typically result from word processing applications. This technique uses modal template processing to achieve its results. It exhibits fast runtime performance using standard XSLT v1.0 features. A description of the technique is given, along with an explanation of how high performance is gained.
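
The abstract does not spell out the stylesheet, so the following is only one possible reading of "modal template processing", not the paper's own code. It is a minimal XSLT 1.0 sketch that walks sibling paragraphs one step at a time in a dedicated mode, under the same hypothetical input as before: a `doc` root with flat `para` elements whose `style` attribute is `Heading1` or `Body`. The mode name `in-section` and the output element names are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/doc">
    <document>
      <!-- Only headings are selected here; each one opens a section. -->
      <xsl:apply-templates select="para[@style='Heading1']"/>
    </document>
  </xsl:template>

  <!-- Default mode: a heading starts a section, then hands control
       to its immediate next sibling in the "in-section" mode. -->
  <xsl:template match="para[@style='Heading1']">
    <section>
      <title><xsl:value-of select="."/></title>
      <xsl:apply-templates select="following-sibling::para[1]"
                           mode="in-section"/>
    </section>
  </xsl:template>

  <!-- In-section mode: emit the paragraph, then step to the next sibling. -->
  <xsl:template match="para" mode="in-section">
    <p><xsl:value-of select="."/></p>
    <xsl:apply-templates select="following-sibling::para[1]"
                         mode="in-section"/>
  </xsl:template>

  <!-- In-section mode: reaching the next heading ends the walk.
       This pattern has a higher default priority than plain "para". -->
  <xsl:template match="para[@style='Heading1']" mode="in-section"/>

</xsl:stylesheet>
```

Each paragraph is visited once and only its immediate next sibling is ever selected, so the work per paragraph stays roughly constant if the processor can step to the next sibling cheaply. The sketch handles a single heading level and drops any paragraphs before the first heading; deeper levels would need further modes or templates. The sibling walk is also recursive, so very long sections may be limited by the processor's recursion depth.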




