I just started thinking about creating/customizing a web crawler today, and I know very little about web crawler/robot etiquette. Most of the writing on etiquette I've found seems old and awkward, so I'd like to get some current (and practical) insights from the web developer community.
I want to use a crawler to walk over "the web" for a super simple purpose: "does the markup of site XYZ meet condition ABC?" (A rough sketch of what I mean is below.)
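To make that concrete, here's the kind of check I have in mind, as a minimal Python sketch. The `meets_condition` function and the doctype test inside it are just placeholders standing in for whatever "condition ABC" turns out to be, and `check_site` is a made-up name, not anything I've actually built yet:

```python
import urllib.request

def meets_condition(markup: str) -> bool:
    # Placeholder: pretend "condition ABC" is "the page declares an HTML5 doctype".
    return markup.lstrip().lower().startswith("<!doctype html>")

def check_site(url: str) -> bool:
    # Fetch the raw markup and run the check against it.
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        markup = resp.read().decode(charset, errors="replace")
    return meets_condition(markup)

print(check_site("https://example.com/"))
```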
This raises a lot of questions for me, but the two main ones I need to get out of the way first are:
- It feels a little "iffy" from the get-go -- is this sort of thing even acceptable?
- What specific precautions should the crawler take so it doesn't upset people? (My rough idea of the basics is sketched below, but I don't know whether it goes far enough.)
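For reference, here is the minimal courtesy handling I've pieced together so far: identify the crawler with a User-Agent string, honor robots.txt, and wait between requests. The bot name, contact URL, and fallback delay are made-up placeholders, and I have no idea whether this is anywhere near enough -- which is really what I'm asking.

```python
import time
import urllib.request
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Placeholders: the bot name, contact URL, and fallback delay are made up.
USER_AGENT = "markup-check-bot/0.1 (+https://example.com/bot-info)"
DEFAULT_DELAY = 5  # seconds to wait before each request

def polite_fetch(url: str):
    """Fetch a page only if robots.txt allows it, pausing first."""
    parts = urlparse(url)
    robots = RobotFileParser()
    robots.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    robots.read()
    if not robots.can_fetch(USER_AGENT, url):
        return None  # the site has asked crawlers to stay out of this path
    # Respect a site-specified Crawl-delay if there is one, else use a default.
    delay = robots.crawl_delay(USER_AGENT) or DEFAULT_DELAY
    time.sleep(delay)
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```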