
crawl rate considerations

Alex Osborne edited this page Jul 4, 2018 · 2 revisions

Why isn't Heritrix crawling as fast as I expected?

We are often asked why Heritrix crawls more slowly than expected. The answer usually comes down to the following considerations:

Politeness or resource optimization?

The most important factor is whether you are crawling a small number of sites or a large number of them (that is, many independent queues). In the former case, your politeness policy and/or coordination with the site maintainers is the primary concern; in the latter, resource optimizations (such as more RAM or a different disk layout) may help.
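To make the trade-off concrete, here is a rough back-of-envelope model. It assumes that after each fetch a queue waits for the fetch duration multiplied by a delay factor, clamped between a minimum and a maximum delay. The defaults used (delayFactor 5.0, minDelayMs 3000, maxDelayMs 30000) are those we understand Heritrix's DispositionProcessor to ship with. The model is a simplification: it ignores robots.txt Crawl-delay, per-host bandwidth caps and processing overhead, and the fetch time and thread count in the example are hypothetical.

```python
"""Rough model of per-queue politeness delay and overall crawl throughput.

Assumption: the delay before the next request to the same queue is the last
fetch's duration multiplied by delay_factor, clamped to [min_delay_ms,
max_delay_ms]. Defaults mirror Heritrix's DispositionProcessor defaults
(delayFactor=5.0, minDelayMs=3000, maxDelayMs=30000). robots.txt
Crawl-delay and per-host bandwidth limits are ignored.
"""


def politeness_delay_ms(fetch_ms: float,
                        delay_factor: float = 5.0,
                        min_delay_ms: float = 3000,
                        max_delay_ms: float = 30000) -> float:
    """Wait time after a fetch before the same queue is contacted again."""
    delay = fetch_ms * delay_factor
    return min(max(delay, min_delay_ms), max_delay_ms)


def max_uris_per_sec(active_queues: int, fetch_ms: float, threads: int,
                     **politeness) -> float:
    """Upper bound on crawl rate for the given number of busy queues.

    Each queue completes at most one fetch per (fetch time + delay); the
    whole crawl can also do at most one fetch per thread per fetch time.
    """
    cycle_ms = fetch_ms + politeness_delay_ms(fetch_ms, **politeness)
    queue_bound = active_queues * 1000.0 / cycle_ms   # politeness limit
    thread_bound = threads * 1000.0 / fetch_ms        # resource limit
    return min(queue_bound, thread_bound)


if __name__ == "__main__":
    # Hypothetical numbers: 200 ms per fetch, 50 worker (toe) threads.
    for queues in (1, 10, 1000):
        rate = max_uris_per_sec(queues, fetch_ms=200, threads=50)
        print(f"{queues:5d} queues -> at most {rate:7.2f} URIs/sec")
```

With these hypothetical numbers a single site yields roughly 0.3 URIs per second no matter how much hardware is available, because the politeness delay dominates. A thousand busy queues instead run into the thread limit, which is where more RAM, more threads or faster disks start to matter.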
