I have a simple web crawler that requests all the pages from a website's sitemap so I can cache and index them. After several requests, the website begins serving blank pages.
There is nothing in their robots.txt except the link to their sitemap, so I assume I am not breaking their "rules". I send a descriptive header that links to a page explaining exactly what my intentions are, and the only pages I crawl are the ones listed in their sitemap.
The HTTP status codes are all still 200 OK, so I can only imagine they're throttling clients that send a large number of HTTP requests in a short period of time. What is considered a reasonable delay between requests?
Are there any other considerations I've overlooked that could cause this problem?
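
For reference, my fetch loop is roughly this shape (a simplified sketch; the sitemap URL, the header value, and the 2-second delay below are placeholders, not my real values):

```python
import time
import xml.etree.ElementTree as ET

import requests

# Placeholders -- the real crawler uses its own sitemap URL, contact page, and delay.
SITEMAP_URL = "https://example.com/sitemap.xml"
HEADERS = {
    "User-Agent": "MyCrawler/1.0 (+https://example.com/about-my-crawler)"
}
DELAY_SECONDS = 2  # current guess at a "polite" pause between requests

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def page_urls(sitemap_url):
    """Yield every <loc> URL listed in the sitemap."""
    response = requests.get(sitemap_url, headers=HEADERS, timeout=30)
    response.raise_for_status()
    root = ET.fromstring(response.content)
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        yield loc.text.strip()


def crawl():
    for url in page_urls(SITEMAP_URL):
        response = requests.get(url, headers=HEADERS, timeout=30)
        # The status stays 200, but after a while the body comes back empty.
        print(url, response.status_code, len(response.text))
        time.sleep(DELAY_SECONDS)  # fixed delay between requests


if __name__ == "__main__":
    crawl()
```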