Category Value
Available via http://dbpubs.stanford.edu/pub/1998-51
Submitted on 26th of February 2000
Author Cho, J.; Garcia-Molina, H.; Page, L.
Title Efficient Crawling Through URL Ordering
Date of publication 1998
Citation J. Cho, H. Garcia-Molina, L. Page: Efficient Crawling Through URL Ordering. In Proceedings of 7th World Wide Web Conference
Language English
Project Digital Libraries
Type Conference or Journal Paper
Subject group Databases and the Web
Abstract In this paper we study in what order a crawler should visit the URLs it has seen, in order to obtain more "important" pages first. Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire Web in a reasonable amount of time. We define several importance metrics, ordering schemes, and performance evaluation measures for this problem. We also experimentally evaluate the ordering schemes on the Stanford University Web. Our results show that a crawler with a good ordering scheme can obtain important pages significantly faster than one without.
Keywords crawling, crawler, URL ordering, archive
Fulltext source
  • Postscript (ps, ps.gz, ps.zip)
  • PDF (pdf, pdf.gz, pdf.zip)
  • Plain text (text, text.gz, text.zip)
  • Management of the document by pubs@db.stanford.edu

    Stanford InfoLab Publication Server