I setup Nutch with a db.fetch.interval.default of 60000 so that I can crawl every day. If I don't, it won't even look at my site when I crawl the next day. But when I do crawl the next day, every page that it fetched yesterday gets fetched with a 200 response code, indicating that it's not using the previous day's date in the "If-Modified-Since". Shouldn't it skip fetching pages that haven't changed? Is there a way to make it do that? I noticed a ProtocolStatus.NOT_MODIFIED in Fetcher.java, so I think it should be able to do this, shouldn't it?
By the way, this is cut and pasted from conf/nutch-default.xml from the current trunk:
<!-- web db properties -->
<property>
<name>db.default.fetch.interval</name>
<value>30</value>
<description>(DEPRECATED) The default number of days between re-fetches of a page.
</description>
</property>
<property>
<name>db.fetch.interval.default</name>
<value>2592000</value>
<description>The default number of seconds between re-fetches of a page (30 days).
</description>
</property>