Coping with the News: the machine learning way

Dr. Rafael A. Calvo, Web Engineering Group, The University of Sydney, NSW, 2006. Email: rafa@ee.usyd.edu.au

Prof. JaeMoon Lee, Hansung University, Seoul, Korea; Web Engineering Group, The University of Sydney, NSW, 2006. Email: jmlee@ee.usyd.edu.au


Keywords

Text Categorization, Machine Learning, XML, NewsML


Abstract

News articles are among the most popular and frequently accessed content on the web. This paper describes how machine learning and automatic document classification techniques can be used to manage large numbers of news articles. We work with more than 800,000 Reuters news stories and classify them using Naive Bayes and k-Nearest Neighbours classifiers. The articles are stored in NewsML, an XML format commonly used for content syndication. The methodology would enable a web-based routing system to automatically filter and deliver news to users based on an interest profile.
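
As a rough illustration of the kind of pipeline the abstract describes, the following minimal sketch trains a multinomial Naive Bayes classifier on a few labelled headlines and routes a new article to users whose interest profile includes the predicted topic. It is not the authors' implementation: the `NaiveBayes` class, the `route` helper, the whitespace tokenizer, the topic labels, and the toy headlines and user profiles are all assumptions made for illustration, and the k-Nearest Neighbours classifier is not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def route(article, classifier, profiles):
    """Return the users whose interest profile contains the predicted topic."""
    topic = classifier.predict(article)
    return sorted(user for user, topics in profiles.items() if topic in topics)


# Hypothetical toy data; the paper itself uses Reuters stories in NewsML.
train_docs = [
    "central bank raises interest rates",
    "stock markets fall as rates rise",
    "team wins cup final in extra time",
    "striker scores twice in league match",
]
train_labels = ["economy", "economy", "sport", "sport"]
profiles = {"alice": {"economy"}, "bob": {"sport", "economy"}, "carol": {"sport"}}

clf = NaiveBayes().fit(train_docs, train_labels)
recipients = route("bank cuts interest rates", clf, profiles)
```

In a real routing system the training text would presumably come from parsed NewsML documents rather than in-line strings, and each article could carry several topic labels instead of the single label assumed here.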





AusWeb 2003. The Ninth Australian World Wide Web Conference, Hyatt Sanctuary Cove, Gold Coast, from 5th to 9th July 2003 Contact: Norsearch Conference Services +61 2 66 20 3932 (from outside Australia) (02) 6620 3932 (from inside Australia) Fax (02) 6622 1954