I have a very large RSS feed (possibly around 1 MB), so reading it takes a lot of time.

If I limit the number of items read (for example, to 4), I don't think that guarantees I get everything that was updated since the last time I read the feed, so I could lose some items.

What can I do?

I am using the Google AJAX Feed API to read the RSS/Atom feed.

updated:

I am using the Google AJAX Feed API to handle the RSS, then I store the data in my database.

A: 

Edit, possible specific solution:

If accessing a limited set of items from a feed does speed up Google Feed API access, then simply keep asking for the most recent items, requesting more each time, until you encounter an item you have seen before. Unless the feed has been re-ordered, this ensures that all new items have been seen. Remember, however, that existing feed items may be updated in place, and those changes would be lost.

If accessing a limited set of items does not have a performance benefit, then another approach, such as a server-side helper (or another feed accessor), needs to be considered.
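A minimal sketch of that "keep asking until you hit a known item" loop, written in Python for a server-side helper. Everything named here is an assumption for illustration: `fetch_latest(n)` is a hypothetical callable that returns the `n` newest items (newest first) as dicts with an `"id"` key, and `seen_ids` is the set of item IDs already stored in your database. It is not part of the Google Feed API.

```python
def fetch_new_items(fetch_latest, seen_ids, start=4, max_items=256):
    """Return the items that are newer than anything in seen_ids.

    fetch_latest(n) is a hypothetical callable returning the n newest
    feed items, newest first, each a dict with an "id" key.
    """
    count = start
    while True:
        items = fetch_latest(count)
        new_items = []
        hit_seen = False
        for item in items:
            if item["id"] in seen_ids:
                # Everything from here on was stored on an earlier run.
                hit_seen = True
                break
            new_items.append(item)
        # Stop when we reached known territory, the feed ran out of
        # items, or the safety cap was reached.
        if hit_seen or len(items) < count or count >= max_items:
            return new_items
        # Otherwise ask for a larger window and try again.
        count = min(count * 2, max_items)
```

Doubling the window keeps the number of requests small when many items are new. Note that this only detects new items; an older item edited in place would still be missed, as mentioned above.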

General information (not specific to this question):

The feed server should correctly handle the If-Modified-Since header. So, while this won't shrink the 1 MB+ download itself, you only need to perform the download when the feed has actually been modified.

Additionally, you can request just a Range of data from the server, if the server supports Range requests, and manually merge the data in. Even if the server doesn't support Range requests, you can abort the download once you have received enough data to continue (this approach lets you inspect inbound data and terminate at exactly the right time).
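As a rough illustration of the conditional request, here is a sketch using only Python's standard library. The function names and the idea of storing the last fetch time as a Unix timestamp are assumptions made for the example; the server must actually honor If-Modified-Since for this to help.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def build_conditional_request(url, last_fetch_epoch=None):
    """Build a GET request that carries If-Modified-Since when possible."""
    req = urllib.request.Request(url)
    if last_fetch_epoch is not None:
        # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
        req.add_header("If-Modified-Since",
                       formatdate(last_fetch_epoch, usegmt=True))
    return req


def fetch_if_modified(url, last_fetch_epoch=None):
    """Return the feed body, or None if the server answers 304 Not Modified."""
    req = build_conditional_request(url, last_fetch_epoch)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Nothing changed since last_fetch_epoch; skip the big download.
            return None
        raise
```

A caller would store the time of each successful fetch and pass it back in on the next run; a `None` result means the stored copy is still current.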

In either case, you are responsible for ensuring enough is read -- from there it may be easiest just to "fix up" the local XML and pass it to a normal feed processor.
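The "abort once you have enough" idea could look roughly like the sketch below. It assumes you know a byte marker that identifies the newest item you already have (for example its GUID, here just called `marker`), and that `stream` is any file-like object with a `read(n)` method, such as an open HTTP response. Both names are hypothetical. The returned bytes will usually be truncated XML, so they still need to be "fixed up" before a normal feed parser will accept them.

```python
def read_until_marker(stream, marker, chunk_size=8192):
    """Read from stream until marker appears, then stop.

    Returns (data, found): the bytes read so far and whether the
    marker was seen. Stopping early avoids downloading the rest.
    """
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # Stream ended without ever seeing the marker.
            return bytes(buf), False
        # Re-scan a small overlap so a marker split across two
        # chunks is still detected.
        start = max(0, len(buf) - len(marker) + 1)
        buf.extend(chunk)
        if buf.find(marker, start) != -1:
            return bytes(buf), True
```

With a real download, the caller would close the connection as soon as `found` is true, which is what actually saves the bandwidth.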

And neither of the above is possible in plain client-side JavaScript :-)

pst
+1  A: 

Gosh, that would definitely be a whole archive. I know how difficult large XML files can be to parse!

Adil Butt
@Adil - This should be a comment to the OP, not an answer.
TheCloudlessSky