IMDb provides an individual RSS feed for every movie it lists. My site has many pages associated with movies, and I store an IMDb ID with each one.

I want to show the top 5 results from the RSS feed of each individual movie. A feed looks like this: http://rss.imdb.com/title/tt1013743/news

As you can imagine, IMDb has over a million films indexed, and a large number of them are active. Many feeds update several times a day. Is there a way to show a live feed of the news from IMDb without my server having to fetch the RSS feed of every movie several times a day?

A: 

I think the short answer is no. Unless IMDb itself provides such a feed, something somewhere has to do the work of fetching each feed individually in order to find the movies with the most recently updated news.

There is an overall site news feed, but I really don't think it does what you want.

I suppose that theoretically you could use Yahoo Pipes to deliver a combined feed, so that your server only has to fetch that single feed. However, you'd still need to plumb in every movie feed, or find some way to cycle through them (is the 'tt1013743' part of your example RSS URI incremented for each new film?). Realistically, I have no idea whether Pipes could even manage this potentially enormous task. Your best bet may be to contact IMDb and ask for a "Recently Updated" RSS feed to be added.

fearoffours
A: 

You can store the Content-Length header value in your database for each release. It is very unlikely that two versions of a feed will have exactly the same byte length; the worst that can happen is that you miss an update, which is not a big problem. This way you only need to send HTTP HEAD requests, which are very cheap. On the server side, you can store the generated cache files compressed (gzcompress) to keep the file size as small as possible. Serving an unchanged feed from the cache also saves the time of parsing the RSS XML again.
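
As a rough illustration of the idea above, here is a minimal sketch in Python (the answer itself points at PHP tools such as gzcompress). The names `head_content_length`, `feed_changed`, `pack_cache`, and `unpack_cache` are hypothetical, and a plain dict stands in for the database. It also assumes the server actually sends a Content-Length header on HEAD responses, which is not guaranteed.

```python
import urllib.request
import zlib


def head_content_length(url, timeout=10):
    """Send a HEAD request and return the Content-Length header, or None if absent."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.headers.get("Content-Length")


def feed_changed(url, stored_lengths, get_length=head_content_length):
    """Return True when the reported length differs from the stored one.

    stored_lengths is a dict standing in for the database table (assumption).
    """
    current = get_length(url)
    if current is None:
        # No header: we cannot tell, so treat the feed as changed.
        return True
    changed = stored_lengths.get(url) != current
    stored_lengths[url] = current
    return changed


def pack_cache(rendered_html):
    """Compress a rendered cache entry, similar in spirit to PHP's gzcompress."""
    return zlib.compress(rendered_html.encode("utf-8"), 9)


def unpack_cache(blob):
    """Restore a cache entry written by pack_cache."""
    return zlib.decompress(blob).decode("utf-8")
```

Only when `feed_changed` returns True does the full feed need to be downloaded and parsed; otherwise the compressed cache entry can be served as-is.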

In addition, you can try YQL to get only the 5 most recent news items from the feed. Also, make sure to use cURL to fetch the RSS: it is very flexible and accepts compressed responses, so you can reduce your bandwidth usage and transfer time.
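
If YQL is not an option, the same two steps can be done on your own server. The sketch below uses Python's standard library rather than cURL, purely for illustration; `fetch_feed` and `latest_items` are hypothetical names, and it assumes an RSS 2.0 layout with `<item>` elements listed newest first.

```python
import gzip
import urllib.request
import xml.etree.ElementTree as ET


def fetch_feed(url, timeout=10):
    """Download a feed, asking the server for a gzip-compressed response."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
        # Decompress only if the server actually honoured the request.
        if resp.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)
    return body


def latest_items(rss_bytes, limit=5):
    """Return title and link of the first `limit` <item> elements in the feed."""
    root = ET.fromstring(rss_bytes)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
        if len(items) == limit:
            break
    return items
```

A call such as `latest_items(fetch_feed(url))` would then give the five entries to render and cache for one movie page.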

galambalazs