Journal article
Crawling a Country: Better Strategies than Breadth-First for Web Page Ordering
Date
2005
Recorded in:
WWW 2005 May 10–14, 2005, Chiba, Japan
Authors
Baeza Yates, Ricardo
Castillo Ocaranza, Carlos
Marín, Mauricio
Rodríguez, Andrea
Institution
Abstract
This article compares several page ordering strategies for
Web crawling under several metrics. The objective of these
strategies is to download the most "important" pages "early"
during the crawl. As the coverage of modern search engines
is small compared to the size of the Web, and it is impossible
to index all of the Web for both theoretical and practical
reasons, it is relevant to index at least the most important
pages.
We use data from actual Web pages to build Web graphs
and execute a crawler simulator on those graphs. As the
Web is very dynamic, crawling simulation is the only way to
ensure that all the strategies considered are compared under
the same conditions. We propose several page ordering
strategies that are more efficient than breadth-first search
and strategies based on partial Pagerank calculations.
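
The simulation setup described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy graph, the function names, the use of PageRank over the full graph as the importance measure, and the backlink-count ordering, which is a simple stand-in and not one of the strategies proposed in the article. The point is only that two orderings run over the same fixed graph and can therefore be compared under identical conditions.

```python
from collections import deque

# Hypothetical toy Web graph: page -> pages it links to.
GRAPH = {
    "home": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": [],
    "hub": ["home"],
}


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; dangling pages spread their rank uniformly."""
    n = len(graph)
    rank = {p: 1.0 / n for p in graph}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in graph if not graph[p])
        new = {p: (1 - damping) / n + damping * dangling / n for p in graph}
        for page, outs in graph.items():
            for target in outs:
                new[target] += damping * rank[page] / len(outs)
        rank = new
    return rank


def crawl_breadth_first(graph, seed):
    """Download pages in the order they were discovered (FIFO queue)."""
    order, seen, queue = [], {seed}, deque([seed])
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in graph[page]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


def crawl_by_backlinks(graph, seed):
    """Download next the known page with the most in-links seen so far.

    Illustrative stand-in ordering; ties go to the page discovered first.
    """
    order, downloaded = [], set()
    backlinks = {seed: 0}  # insertion order doubles as discovery order
    while True:
        frontier = [p for p in backlinks if p not in downloaded]
        if not frontier:
            break
        page = max(frontier, key=lambda p: backlinks[p])
        downloaded.add(page)
        order.append(page)
        for target in graph[page]:
            backlinks[target] = backlinks.get(target, 0) + 1
    return order


def importance_after(order, rank, k):
    """Total importance collected by the first k downloads."""
    return sum(rank[p] for p in order[:k])


if __name__ == "__main__":
    rank = pagerank(GRAPH)
    for name, crawl in [("breadth-first", crawl_breadth_first),
                        ("backlink count", crawl_by_backlinks)]:
        order = crawl(GRAPH, "home")
        print(name, order, round(importance_after(order, rank, 4), 3))
```

On this toy graph the backlink-count ordering reaches the heavily linked page "hub" one download earlier than breadth-first does, so it has collected more total importance after four downloads. That outcome is a property of this hand-built example only and says nothing about the results reported in the article.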