
Wrapping Web Pages into XML Documents: A Practical Experience and Comparison of Two Tools

Sabine Jabbour, Student at Monash University. Email: sabine@jabbour.net

Anne-Marie Vercoustre, Senior Researcher, CSIRO Mathematical and Information Sciences. Email: Anne-Marie.Vercoustre@csiro.au


Keywords

Wrapper, Semi-structured, XML, Web Information


Abstract

The notion of wrapping a web server to produce XML documents from unstructured web pages is driven by the need for structured data that a variety of applications can use. The web contains vast amounts of information, but most applications cannot use it because it is written for a human audience. One solution is to automate the browsing process and convert the extracted, unstructured information into a more structured format such as XML; this process is called wrapping. We used two different tools to wrap several tourist sites into XML: Norfolk, a system developed by the CSIRO TED group, and W4F, initially developed at the University of Pennsylvania and now a commercial product. This report describes our practical experience with the tools and compares them. The comparison highlights the features a wrapper system needs in order to support real applications.
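To make the idea of wrapping concrete, here is a minimal sketch in Python of the extraction step for a single page. It assumes a hypothetical tourist-accommodation page whose fields can be located by tag and class name. It does not reproduce how Norfolk or W4F express their extraction rules; the page, the class names, the `RULES` table, the `AccommodationExtractor` class and the `wrap` function are all illustrative assumptions.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical page fragment: the content and class names are illustrative
# assumptions, not taken from any site wrapped in the paper.
PAGE = """
<html><body>
  <h1>Harbour View Motel</h1>
  <p class="address">12 Beach Road, Noosa</p>
  <p class="price">$120 per night</p>
</body></html>
"""


class AccommodationExtractor(HTMLParser):
    """Collects the text of elements matched by simple (tag, class) extraction rules."""

    # Hypothetical extraction rules: map a (tag, class) pair to an XML field name.
    RULES = {("h1", None): "name", ("p", "address"): "address", ("p", "price"): "price"}

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # field currently being captured, or None

    def handle_starttag(self, tag, attrs):
        # Start capturing only if this element matches one of the rules.
        cls = dict(attrs).get("class")
        self._current = self.RULES.get((tag, cls))

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        # Ignore whitespace between tags; keep text inside matched elements.
        if self._current and data.strip():
            self.fields[self._current] = self.fields.get(self._current, "") + data.strip()


def wrap(html: str) -> str:
    """Apply the extraction rules to one page and emit the result as an XML string."""
    parser = AccommodationExtractor()
    parser.feed(html)
    parser.close()
    root = ET.Element("accommodation")
    for name in ("name", "address", "price"):
        child = ET.SubElement(root, name)
        child.text = parser.fields.get(name, "")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(wrap(PAGE))
```

Running the script prints a single `<accommodation>` element with `name`, `address` and `price` children. A real wrapper must also automate navigation across many pages and cope with changes in page layout; this sketch covers only the extraction step for one page.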




AusWeb 2002, The Eighth Australian World Wide Web Conference, held at Twin Waters Resort, Sunshine Coast, Queensland, July 6-10, 2002. Contact: Norsearch Conference Services, +61 2 66 20 3932 (from outside Australia), (02) 6620 3932 (from inside Australia), Fax (02) 6622 1954