DocBook Roundtripping

Steve Ball

Packaged Press

Table of Contents


Authoring For DocBook

Conversion from DocBook

Conversion to DocBook





DocBook is a popular XML-based format for single-source, electronic publishing. Authoring DocBook documents requires the use of XML editing software, which the average author will find difficult to use. Most authors are familiar with word processing tools, such as Microsoft Word or Open Office. This poster describes a method for using a word processor as the authoring environment for DocBook XML documents, by roundtripping using XSL stylesheets.


DocBook [Walsh] is a popular format for electronic publishing and features an XML representation. Of the many advantages of using DocBook, single-source publishing is arguably the most useful. A DocBook document may be published into many different formats, such as HTML or PDF, without having to change the source document.

However, as with all XML formats, editing DocBook documents can be difficult for the average author. Most authors do not understand the principles of structured authoring, so they find XML editing software non-intuitive. Sophisticated authoring tools for XML are available, and most support DocBook, but they are expensive and so are out of reach for many authors.

Instead, most authors are familiar and comfortable with using a word processor. By far the most popular word processor available today is Microsoft Word [MSWord]. Other word processors include Open Office [OO] and Apple’s Pages [Pages]. We propose a system that makes use of these commonly available word processor applications to edit DocBook documents. This goal is achieved by a process of “roundtripping”: transforming the DocBook document into a word processing document and back again without loss of information.

The latest versions of all of the word processing applications mentioned above have the ability to save and read documents in some kind of XML format.

An obstacle to using this approach is that word processing documents are flat structures, whereas DocBook documents generally have a deep, hierarchical structure. The proposed system provides a mechanism for representing the structure of the DocBook document in the word processing document, both explicitly and implicitly, so that existing structures can be reproduced faithfully and new structures can easily be created by the author.

This system makes extensive use of styles in the word processing document to represent DocBook structures for the purpose of editing the document’s content. We believe that any non-casual user of a word processor will have enough training and familiarity with the application to be able to apply styles to their document in an appropriate fashion. For the roundtripping process to be successful, the author must be rigorous in their use of styles. However, mistakes can happen, and so the system provides feedback in the form of warning and error messages.

DocBook is an extensive but reasonably straightforward XML-based language. DocBook documents can be transformed into a number of publishing output formats, such as HTML and PDF, using freely available XSL stylesheets and other related tools. However, setting up the toolchain to produce these outputs and managing the creation of publishing outputs can be difficult; this usually requires the skills of a knowledgeable software engineer. Packaged Press is a new Web-based service that makes this functionality available to all authors, no matter what their level of technical skill.

Authoring For DocBook

When authoring a document, most authors will first reach for their familiar, commonly available word processor, such as Microsoft Word, Apple’s Pages or Open Office. However, a word processor is not very suitable for authoring structured documents, such as DocBook. This is despite the fact that popular word processors are now XML-enabled. The word processing model conflicts with the structured editing model: the former creates documents with a flat structure, the latter documents with a deep, hierarchical structure.

Instead of forcing authors to retrain themselves to use a structured editor, we believe that making use of a standard word processor and working with the normal word processing model, instead of fighting against it, will be a more successful approach. Authors are already trained to use styles in their documents (or should be), so a set of styles has been designed for use by authors in their documents that map directly to DocBook constructs. These styles have been designed in such a way that they are easy for authors to apply, but can be mapped to DocBook in a reliable fashion. Paragraph styles correspond to block-level elements and character styles correspond to phrase-level elements. Some styles map directly to a DocBook element of the same name, for example “para”, “blockquote”, “note”, and so on. However, some styles map to an element in a parent-child relationship. These are denoted by a “-” in the style name, for example “article-title”, “chapter-title”, “variablelist-term”, “blockquote-attribution”, “informalfigure-imagedata”, and so on.
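The style naming convention described above can be illustrated with a minimal sketch. The function and type names here (parse_style_name, StyleTarget) are hypothetical and are not taken from the roundtripping stylesheets; the sketch also assumes that splitting on the first “-” is sufficient, which holds for the example style names listed above.

```python
from typing import NamedTuple, Optional


class StyleTarget(NamedTuple):
    """DocBook element(s) that a word processor style name refers to."""
    element: str           # element named by the style, or the parent element
    child: Optional[str]   # child element for "parent-child" style names


def parse_style_name(style: str) -> StyleTarget:
    """Split a style name on its first "-" into (element, child).

    Hypothetical helper: a name without "-" maps to a single element,
    a name such as "article-title" maps to a parent/child pair.
    """
    parent, sep, child = style.partition("-")
    if not parent or (sep and not child):
        raise ValueError(f"malformed style name: {style!r}")
    return StyleTarget(parent, child if sep else None)


if __name__ == "__main__":
    for name in ("para", "blockquote-attribution", "informalfigure-imagedata"):
        print(name, "->", parse_style_name(name))
```

Rejecting malformed names early is one way to produce the kind of error feedback the system is said to give authors; the actual messages produced by the stylesheets are not described here.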

The major challenge of the conversion system is to derive the DocBook document structure from the flat paragraph- and character-style structure of the word processing document. This is achieved by applying a number of heuristics, as follows:

·      A paragraph style corresponding to a “container” element, such as “sect1”, starts that element. All paragraphs that follow are then added to the content of that element until a paragraph is found that starts another element at the same level. This closes the previous element and starts another.

·      Sequential structures are coalesced into a single parent element. For example, “note” style paragraphs will create the structure:

<note>
  <para>...</para>
  <para>...</para>
</note>
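The two heuristics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the XSLT implementation used by the roundtripping system: the function name build_tree, the level table, the set of coalesced styles, the use of a container paragraph’s text as its title, and the handling of unknown styles are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed nesting depth of container styles; only "sect1" is named in the text.
CONTAINER_LEVEL = {"sect1": 1, "sect2": 2}
# Assumed set of styles whose consecutive paragraphs share one parent element.
COALESCED = {"note", "blockquote"}


def build_tree(paragraphs, root_name="article"):
    """Turn a flat list of (style, text) pairs into a nested element tree.

    Returns (root, warnings). Unknown styles are kept as <para> and reported.
    """
    root = ET.Element(root_name)
    stack = [(0, root)]   # open containers as (level, element)
    run = None            # currently open coalesced parent, e.g. <note>
    warnings = []

    for style, text in paragraphs:
        if style in CONTAINER_LEVEL:
            level = CONTAINER_LEVEL[style]
            # A container at the same or a shallower level closes open ones.
            while stack[-1][0] >= level:
                stack.pop()
            elem = ET.SubElement(stack[-1][1], style)
            # Assumption: the container paragraph's text becomes its title.
            ET.SubElement(elem, "title").text = text
            stack.append((level, elem))
            run = None
        elif style in COALESCED:
            # Start a new parent only when the run of same-style paragraphs breaks.
            if run is None or run.tag != style:
                run = ET.SubElement(stack[-1][1], style)
            ET.SubElement(run, "para").text = text
        else:
            if style != "para":
                warnings.append(f"unknown style {style!r}; treated as para")
            ET.SubElement(stack[-1][1], "para").text = text
            run = None
    return root, warnings


if __name__ == "__main__":
    flat = [
        ("sect1", "Introduction"),
        ("para", "Some text."),
        ("note", "First note paragraph."),
        ("note", "Second note paragraph."),
        ("sect1", "Next section"),
        ("para", "More text."),
    ]
    tree, messages = build_tree(flat)
    print(ET.tostring(tree, encoding="unicode"))
    print(messages)
```

A stack of open containers keeps the sketch short: starting a container at level n simply discards every open container at level n or deeper, which is the “closes the previous element and starts another” behaviour described in the first heuristic.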

Conversion from DocBook

There are two parts to the roundtripping system: conversion of a DocBook XML document to the word processor XML format, and then the reverse. The conversion of a DocBook XML document to WordML simply involves the application of a single XSL stylesheet, docbook.xsl.

To use this XSL stylesheet a parameter must be supplied: wordml.template. The template document is a WordML document that defines all of the paragraph and character styles used by the conversion system. Using a separate template document allows the formatting of the styles to be maintained in the word processing application.

The output of the XSL transformation may be opened directly by MS Office 2003.
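As a rough sketch of how this transformation might be driven, the following Python function builds a command line for the xsltproc processor. The choice of xsltproc is an assumption (no particular XSLT processor is named above), the function name and all file names are hypothetical, and whether wordml.template expects a file path or a URI is not stated here. Only the stylesheet name docbook.xsl and the parameter name wordml.template come from the description above.

```python
import shlex


def xsltproc_command(docbook_file, template_file, output_file,
                     stylesheet="docbook.xsl"):
    """Build (but do not run) an xsltproc command for DocBook -> WordML.

    Assumes xsltproc as the XSLT processor; file names are placeholders.
    """
    return [
        "xsltproc",
        # Pass the template document as the wordml.template string parameter.
        "--stringparam", "wordml.template", template_file,
        "--output", output_file,
        stylesheet,
        docbook_file,
    ]


if __name__ == "__main__":
    cmd = xsltproc_command("article.xml", "template.xml", "article-word.xml")
    print(shlex.join(cmd))
    # To run it: subprocess.run(cmd, check=True), provided xsltproc is installed.
```

Building the argument list separately from running it keeps the sketch testable without an XSLT processor installed.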

Conversion to DocBook

Converting a WordML document into a DocBook document is a complicated process. This is because the structure of the DocBook document must be built up from the unstructured word processing data. A further complication is the aim of achieving the conversion with an XSL transformation while avoiding the use of processor extensions. A technique has been developed that allows this to occur [Ball 2004]. The roundtripping system includes an implementation of this technique that processes the set of style names described above and produces a DocBook document as its result. Using XSLT v1.0, there are three XSL stylesheets involved, configured as a pipeline as shown below:

Packaged Press

The roundtripping system is part of the DocBook XSL stylesheet distribution, available at no charge from the DocBook SourceForge project. This system provides the core of the authoring environment, but by itself is not particularly useful to an author. What is needed is a framework that integrates the roundtripping system into an application that manages the authoring environment and drives the XSL transformations. Packaged Press is a Web-based application providing an integrated authoring and publishing environment.

Packaged Press provides not only the roundtripping system for authoring, but also a single-source publishing system for post-processing of DocBook documents. Packaged Press allows DocBook documents to be transformed into HTML web pages, for viewing content online, or PDF documents, for high-quality printing. Because XSL stylesheets are used to create output documents, it is possible to set up the system to produce other document types.


The DocBook schema and related XSL stylesheets provide a very good basis upon which to build a sophisticated single-source publishing system. DocBook documents are rich with structure and allow the publisher to manage their information in many useful ways. However, applications for authoring DocBook documents (indeed, XML documents in general) are not common and few authors are familiar with the concepts and practice of editing structured documents. Instead, a system has been developed that involves “roundtripping” documents through commonly available, familiar word processing applications to allow non-expert authors to create and edit document content. This system requires that authors use paragraph and character styles that map to DocBook elements. Transformation of documents to and from DocBook is performed using a number of XSL stylesheets. These XSL stylesheets are available as part of the DocBook XSL stylesheet distribution [DocBook-XSLT].

Publishing using DocBook requires a “toolchain” that processes XML documents to produce different output formats. Packaged Press is a new service that integrates the roundtripping authoring system with a back-end toolchain to provide an integrated single-source publishing solution.


Ball 2004

Steve Ball. Multi-level Non-uniform Grouping in Very Large Documents. AusWeb Conference, July 2004, Gold Coast, Australia.

Walsh

Norman Walsh. DocBook: The Definitive Guide.

DocBook-XSLT

Bob Stayton, et al. DocBook XSL Stylesheet Distribution.

MSWord

Microsoft Word.

OO

Open Office.

Pages

Apple iWork Pages.



Steve Ball, ©2005. The author assigns to Southern Cross University and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The author also grants a non-exclusive licence to Southern Cross University to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers and for the document to be published on mirrors on the World Wide Web.