I would like to know the best way to fetch RSS feeds in real time without downloading the entire feed when it hasn't changed. I don't really mind which language is used; I'm just looking for the best way to do that.
A:
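You can use HTTP conditional requests: save the ETag and Last-Modified headers from the previous response and send them back in the If-None-Match and If-Modified-Since request headers.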
Here is some sample Python code (Python 2, urllib2):
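    import urllib2

    # NotModifiedHandler is not part of urllib2; without it, a 304 raises HTTPError
    class NotModifiedHandler(urllib2.BaseHandler):
        def http_error_304(self, req, fp, code, message, headers):
            resp = urllib2.addinfourl(fp, headers, req.get_full_url())
            resp.code = code
            return resp

    etag = ...  # ETag header value from the previous response
    last_modified = ...  # Last-Modified header value from the previous response

    req = urllib2.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    opener = urllib2.build_opener(NotModifiedHandler())
    url_handle = opener.open(req)
    headers = url_handle.info()
    if hasattr(url_handle, 'code') and url_handle.code == 304:
        pass  # no change happened
    else:
        # RSS feed has changed; keep the new validators for the next request
        etag = headers.getheader("ETag")
        last_modified = headers.getheader("Last-Modified")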
The same approach carries over to any language: add the two conditional request headers and check the status code of the response (304 means the feed has not changed).
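For anyone on Python 3, where urllib2 was split into urllib.request and urllib.error, the same conditional GET might look roughly like the sketch below. The function names build_conditional_request and fetch_if_changed are made up for illustration, and it assumes the server actually sends ETag and/or Last-Modified headers; it is not meant as a definitive implementation.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET request carrying the validators saved from the last fetch."""
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    return req


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if the feed is unchanged."""
    req = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(req) as resp:
            # 200: the feed changed, so keep the new validators for the next poll
            return (resp.read(),
                    resp.headers.get("ETag"),
                    resp.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # 304 Not Modified: nothing was downloaded, reuse the old validators
            return None, etag, last_modified
        raise
```

Call fetch_if_changed with the values returned by the previous call; on the first poll, leave etag and last_modified as None so the full feed is downloaded once.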
UPDATE: Check out this blog entry: HTTP Conditional GET for RSS Hackers
notnoop
2009-10-12 19:24:31