Hi!

In my recent project I work with multiple RSS feeds. I want to list only the latest post from each of them, sorted by timestamp.

My issue is that I have about 20 different feeds, and the page already takes 6 seconds to load when testing with only 10 of them.

What can I do to make it perform better?

I use simplexml:

simplexml_load_file($url);

Which I append to an array:

function appendToArray($key, $value){
    $this->array[$key] = $value;
}

Just before displaying it I run krsort():

krsort($this->array);
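
Roughly, the whole flow is something like this (simplified; $this->feedUrls just stands in for my list of feed URLs, and I take the first item of each feed as the newest):

foreach ($this->feedUrls as $url) {
    $feed = simplexml_load_file($url);
    if ($feed === false || !isset($feed->channel->item[0])) {
        continue; // skip feeds that fail to load or have no items
    }
    $latest = $feed->channel->item[0];  // RSS 2.0 usually lists the newest item first
    $this->appendToArray(strtotime((string) $latest->pubDate), $latest);
}
krsort($this->array); // newest entries first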

Any ideas? Should I cache it somehow?

A: 

Have you done any debugging, e.g. logging microtime() at various points in your code?
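
For example, something rough like this around each feed (just a sketch) will show you where the time goes:

$start = microtime(true);
$raw = file_get_contents($url);           // fetch the feed
error_log(sprintf('fetch %s: %.3fs', $url, microtime(true) - $start));

$start = microtime(true);
$feed = simplexml_load_string($raw);      // parse the feed
error_log(sprintf('parse %s: %.3fs', $url, microtime(true) - $start));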

You'll probably find that it's the fetching of each RSS feed, rather than the parsing, that takes the time; and part of that may simply be how long each remote server takes to generate its feed.

Save those ten feeds as static XML files, point your script at them, and see how quickly the page loads.

adam
That was exactly my thought, but then I'd need some cron script running every 30 seconds to save the XML files. The point is that I want to show users the latest entries as soon as they are added.
designer
Sometimes you have to make concessions to fit the resources available
adam
Thanks for your input. I will go with the static XML files.
designer
+1  A: 

You could cache them, but you would still have the problem of the page taking ages to load whenever the caches have expired.

You could have a PHP script which runs in the background (e.g. via a cron job) and periodically downloads the feeds you are subscribed to into a database; then you can fetch and filter the data much faster when you want to display it.
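
A very rough sketch of such a background script (the feed list, table layout and credentials here are all made up, adjust them to your setup):

// fetch_feeds.php - run from cron every few minutes
$feeds = array('http://example.com/feed1.rss', 'http://example.com/feed2.rss');
$db = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

$insert = $db->prepare(
    'REPLACE INTO feed_items (guid, feed_url, title, link, published) VALUES (?, ?, ?, ?, ?)'
);

foreach ($feeds as $url) {
    $feed = simplexml_load_file($url);
    if ($feed === false) {
        continue; // skip feeds that fail to load
    }
    foreach ($feed->channel->item as $item) {
        $insert->execute(array(
            (string) $item->guid,
            $url,
            (string) $item->title,
            (string) $item->link,
            date('Y-m-d H:i:s', strtotime((string) $item->pubDate)),
        ));
    }
}

Your page then only needs a single query along the lines of SELECT ... ORDER BY published DESC LIMIT 20, which will stay fast no matter how many feeds you subscribe to.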

Tom Haigh
That seems like the best idea; also see my comment on adam's answer :) Thanks!
designer
A: 

You can load the RSS feeds in parallel with curl_multi. That could speed up your script, especially if you're using blocking calls at the moment.

A small example (from http://www.rustyrazorblade.com/2008/02/curl_multi_exec/) :

// URLs to fetch in parallel
$nodes = array('http://www.google.com', 'http://www.microsoft.com', 'http://www.rustyrazorblade.com');
$node_count = count($nodes);

$curl_arr = array();
$master = curl_multi_init();

// create one cURL handle per URL and add it to the multi handle
for ($i = 0; $i < $node_count; $i++)
{
    $url = $nodes[$i];
    $curl_arr[$i] = curl_init($url);
    curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($master, $curl_arr[$i]);
}

// run all requests in parallel until every transfer has finished
do {
    curl_multi_exec($master, $running);
    curl_multi_select($master); // wait for activity instead of busy-looping
} while ($running > 0);

echo "results: ";
for ($i = 0; $i < $node_count; $i++)
{
    $results = curl_multi_getcontent($curl_arr[$i]);
    echo($i . "\n" . $results . "\n");
}
echo 'done';

More info can be found in "Asynchronous/parallel HTTP requests using PHP multi_curl" and "How to use curl_multi() without blocking" (amongst others).

BTW, to process the feeds after they have been loaded with curl_multi you will of course have to use simplexml_load_string() instead of simplexml_load_file().
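
Something like this after the download loop above (just a sketch):

for ($i = 0; $i < $node_count; $i++)
{
    $feed = simplexml_load_string(curl_multi_getcontent($curl_arr[$i]));
    if ($feed === false) {
        continue; // skip responses that are not valid XML
    }
    // ...add $feed->channel->item[0] (the newest entry) to your array as before
}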

wimvds
Good point, but don't you think it would be better to save the files, as the others suggest?
designer
If you cache them, then you will have cache misses once in a while; if you don't cache, then your script will always hit the remote servers (and you'll always have a delay, even if a small one). Personally, I would fetch them in parallel and cache them for a limited time (how long depends on how often the feeds update and how many users your site has; just see what works for you). The caching period could also be different for every feed if needed (i.e. if you know some update more regularly than others).
wimvds
BTW, caching by itself will not solve your problem, as @Tom Haigh already pointed out.
wimvds
A: 

Yes, of course caching is the only sensible solution.
It's better to set up a cron job to retrieve these feeds and store the data locally.

Col. Shrapnel