SharpSpider: A Continuous, Parallel and Distributed Spider

Author:

This product is temporarily unavailable

53.42 €

regular price: 60.70 €

About the book

Search engines have become so indispensable that they rank second only to e-mail as the most popular online activity. To respond to queries in a timely fashion, search engines make use of large indices of word occurrences on Web pages to cross-reference websites to keywords. Such indices are maintained by spiders, a special kind of computer program that browses the Web autonomously. However, due to a variety of technological limitations, a single spider has proven insufficient to maintain a search engine's index. Hence, in this book, we review several ways of splitting a spider's work across multiple processes, and define a methodology to preserve an up-to-date index of the Web. SharpSpider, our prototype spider, has been evaluated using the resources of PlanetLab, a globally distributed platform for developing and deploying planetary-scale services. Despite using very modest equipment, we have performed large crawls of the Web, distributing the workload amongst various computers spread across different continents. The statistics derived from our research offer valuable insight into the nature of educational Web resources.

  • Publisher: VDM Verlag
  • Format: Paperback
  • Language:
  • ISBN: 9783639148862
