Dr. Rafael A. Calvo, Web Engineering Group, The University of Sydney, NSW, 2006. Email: rafa@ee.usyd.edu.au
Prof. JaeMoon Lee, Hansung University, Seoul, Korea; Web Engineering Group, The University of Sydney, NSW, 2006. Email: jmlee@ee.usyd.edu.au
Text Categorization, Machine Learning, XML, NewsML
News articles are among the most popular and frequently accessed content on the web. This paper describes how machine learning and automatic document classification techniques can be used to manage large numbers of news articles. We work with more than 800,000 Reuters news stories and classify them using two approaches, Naive Bayes and k-Nearest Neighbours. The articles are stored in NewsML, an XML format commonly used for content syndication. The methodology developed would enable a web-based routing system to automatically filter and deliver news to users based on an interest profile.
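As a rough illustration of the two classifiers named above, the following is a minimal sketch, not the implementation used in the paper. It assumes whitespace tokenization, a multinomial Naive Bayes with add-one smoothing, and k-Nearest Neighbours with cosine similarity over raw term counts. The training sentences, category names, user names, and the `route` helper are all hypothetical and stand in for the Reuters stories and the interest profiles.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokens; a real system would parse NewsML first.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)          # documents per category
        self.term_counts = defaultdict(Counter)    # term counts per category
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(tokenize(doc))
        self.vocab = {t for c in self.term_counts.values() for t in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for term in tokenize(doc):
                if term in self.vocab:             # ignore unseen terms
                    score += math.log((counts[term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cosine(a, b):
    # Cosine similarity between two term-count vectors stored as Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(docs, labels, doc, k=3):
    # Majority vote among the k training documents most similar to the query.
    query = Counter(tokenize(doc))
    scored = sorted(
        ((cosine(query, Counter(tokenize(d))), label) for d, label in zip(docs, labels)),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def route(article, profiles, classify):
    # Hypothetical routing step: deliver to users whose profile lists the category.
    category = classify(article)
    return sorted(user for user, interests in profiles.items() if category in interests)


# Hypothetical toy data standing in for labelled news stories.
train_docs = [
    "shares fall as bank profits drop",
    "central bank raises interest rates",
    "stock market rallies on earnings",
    "team wins final after late goal",
    "striker scores twice in cup match",
    "coach praises team after match win",
]
train_labels = ["finance"] * 3 + ["sport"] * 3
profiles = {"alice": {"finance"}, "bob": {"sport"}, "carol": {"finance", "sport"}}

nb = NaiveBayes().fit(train_docs, train_labels)
article = "bank shares rise on market earnings"
recipients = route(article, profiles, nb.predict)
```

With these toy inputs, `recipients` holds the users whose profile includes the predicted category. Swapping `nb.predict` for `lambda a: knn_predict(train_docs, train_labels, a)` routes with the nearest-neighbour classifier instead.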