
UbiCrawler: A Scalable Fully Distributed Web Crawler

Paolo Boldi, Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, via Comelico 39/41, I-20135 Milano, Italy. Email: boldi@dsi.unimi.it

Bruno Codenotti, Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, Via Moruzzi 1, I-56010 Pisa, Italy. Email: codenotti@imc.pi.cnr.it

Massimo Santini, Dipartimento di Scienze Sociali, Cognitive e Quantitative, Università di Modena e Reggio Emilia, via Fratelli Manfredi, I-42100 Reggio Emilia, Italy. Email: msantini@unimo.it

Sebastiano Vigna, Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, via Comelico 39/41, I-20135 Milano, Italy. Email: vigna@acm.org


Keywords

Web algorithmics, web searching, distributed algorithms, fault tolerance.


Abstract

We present the design and implementation of UbiCrawler, a scalable distributed web crawler, and we analyze its performance. The main features of UbiCrawler are platform independence, fault tolerance, a very effective assignment function for partitioning the domain to crawl, and, more generally, the complete decentralization of every task.




AusWeb 2002, The Eighth Australian World Wide Web Conference, held at Twin Waters Resort, Sunshine Coast, Queensland, July 6-10, 2002. Contact: Norsearch Conference Services, +61 2 66 20 3932 (from outside Australia), (02) 6620 3932 (from inside Australia), Fax (02) 6622 1954