I'm writing a crawler in Ruby, and I want to honour the caching headers the server sends so the crawl is more efficient. Is there a straightforward way in Ruby to determine whether a page needs to be re-downloaded by the client? I know I need to consider at least these headers:
- `Last-Modified`
- `ETag`
- `Cache-Control`
- `Expires`
What's the definitive way of determining this, and is it specified anywhere?
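
For concreteness, here's a minimal sketch of the conditional-GET approach I'm imagining, using `Net::HTTP`. The `cached` hash stands in for whatever metadata the crawler persists between visits, and the values in it are placeholders; this also doesn't yet handle `Cache-Control`/`Expires` freshness, which I assume would let me skip the request entirely:

```ruby
require 'net/http'
require 'uri'

# Validators saved from a previous fetch of this URL (placeholder values).
cached = { etag: '"abc123"', last_modified: 'Wed, 01 Jan 2020 00:00:00 GMT' }

uri = URI('https://example.com/page')
req = Net::HTTP::Get.new(uri)
# Send the stored validators so the server can reply 304 Not Modified.
req['If-None-Match']     = cached[:etag]          if cached[:etag]
req['If-Modified-Since'] = cached[:last_modified] if cached[:last_modified]

res = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
  http.request(req)
end

case res
when Net::HTTPNotModified  # 304: the cached copy is still good
  puts 'use cached copy'
when Net::HTTPSuccess      # 200: page changed, store the new validators
  cached[:etag]          = res['ETag']
  cached[:last_modified] = res['Last-Modified']
  puts 'page re-downloaded'
end
```

My assumption is that a complete implementation would also check `Cache-Control: max-age` and `Expires` against the local clock before making any request at all, but I'd like to know if there's a definitive spec or an existing Ruby library that covers all of this.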