I need to aggregate RSS content from roughly 500 URLs. While I'm trying to fetch content from those URLs, I get timeout/memory exhausted errors (I am using the SimplePie library).

Is there any method or approach to pull content quickly from this many sources?

How do I get fresh content every time?

<?php
require_once('include/simplepie.inc');

$urlList = array(
    'http://site1.com/index.rss',
    'http://site2.com/index.rss',
    'http://site3.com/index.rss',
    // ... roughly 500 feed URLs in total
    'http://site500.com/index.rss',
);

$feed = new SimplePie();
$feed->set_feed_url($urlList);
$feed->init();
$feed->handle_content_type();
?>

HTML portion:

<?php  
foreach($feed->get_items() as $item):  
?>  
<div class="item">
<h2><a href="<?php echo $item->get_permalink(); ?>"><?php echo $item->get_title(); ?></a></h2>
<p><?php echo $item->get_description(); ?></p>
<p><small>Posted on <?php echo $item->get_date('j F Y | g:i a'); ?></small></p>
</div>
<?php endforeach; ?>
A: 

Increase memory_limit = xxM in your php.ini, or call ini_set("memory_limit", "xxM") at runtime, where xx is the new memory limit.
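A minimal sketch of the runtime approach (the 256M and 300-second values are only placeholders to size for your own server):

<?php
// Raise the limits at runtime, before the feed fetch starts.
ini_set('memory_limit', '256M'); // placeholder value; use whatever the server can spare
set_time_limit(300);             // also extend the max execution time (seconds)
?>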

Martin Wickman
What website do you know that uses more than 32MB of RAM? More than likely it's a problem with his code.
RobertPitt
He never said it was a website at all. I figured he was running it as a script (cronjob or something).
Martin Wickman
I increased memory_limit and the execution time, but it runs for more than 2 minutes and still produces no result.
JKS
+2  A: 

I think you're doing it wrong. If you want to parse that many feeds, you cannot do it from a script that will be called via a webserver.

If you really want to do the polling, you will have to run that script through, say, cron, and then 'save' the results to be served by another PHP script (which can be called by the HTTP server).
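A rough sketch of that split, assuming a cron-run fetcher that writes the aggregated items to a cache file (the file path, the one-hour cache duration, and the URL list are placeholders):

<?php
// fetch_feeds.php - run from cron, e.g. hourly: 0 * * * * php /path/to/fetch_feeds.php
require_once('include/simplepie.inc');

$urlList = array(
    'http://site1.com/index.rss',
    // ... the rest of the ~500 feed URLs
);

$feed = new SimplePie();
$feed->set_feed_url($urlList);
$feed->set_cache_duration(3600); // let SimplePie cache each feed for an hour
$feed->init();

$items = array();
foreach ($feed->get_items() as $item) {
    $items[] = array(
        'link'  => $item->get_permalink(),
        'title' => $item->get_title(),
        'desc'  => $item->get_description(),
        'date'  => $item->get_date('j F Y | g:i a'),
    );
}

// Save the pre-built result; the web-facing script only reads this file.
file_put_contents('/tmp/feed_cache.json', json_encode($items));
?>

The public page then just reads and json_decode()s that file instead of hitting 500 remote hosts on every request.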

However, you will still have to deal with a lot of the inherent limitations of polling: 99% of the time you will have no new content, thus wasting your CPU and bandwidth as well as those of the servers you're polling. You will also have to deal with dead feeds, invalid ones, rate limiting, etc...

Implement the PubSubHubbub protocol. It will help for the feeds that have implemented it, so that you just have to wait for the data to be pushed to you.
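For reference, subscribing is a single form-encoded POST to the hub a feed advertises (a sketch; the hub, topic, and callback URLs below are placeholders):

<?php
// Ask the hub to push new entries for one feed to your callback endpoint.
$postData = http_build_query(array(
    'hub.mode'     => 'subscribe',
    'hub.topic'    => 'http://site1.com/index.rss',            // the feed you want pushed
    'hub.callback' => 'http://example.com/push_endpoint.php',  // your publicly reachable endpoint
    'hub.verify'   => 'async',
));

$ch = curl_init('http://pubsubhubbub.appspot.com/'); // placeholder hub URL
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
curl_close($ch);
?>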

For the other feeds, you can either do the polling yourself, like you did, and try to find a way to avoid the individual errors (invalid XML, dead hosts... etc), or rely on a service like Superfeedr (I created it).

Julien Genestoux
This is the correct answer - the script needs to be run from the command line or it will always (potentially) reach the max execution time.
David Caunt
Thank you for your answer; I did it using cron.
JKS
Don't forget to implement PubSubHubbub; it will make your life much, much easier when you have a lot of feeds to poll =)
Julien Genestoux
+1  A: 

My experience with SimplePie is that it isn't very good or robust; try:

http://uk2.php.net/manual/en/function.simplexml-import-dom.php
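A minimal sketch of that approach for a single feed, assuming a plain RSS 2.0 <channel><item> layout (the URL is a placeholder):

<?php
// Load the feed into a DOMDocument, then walk it via SimpleXML.
$dom = new DOMDocument();
if ($dom->load('http://site1.com/index.rss')) {  // placeholder feed URL
    $rss = simplexml_import_dom($dom);
    foreach ($rss->channel->item as $item) {
        echo '<h2><a href="' . $item->link . '">' . $item->title . '</a></h2>';
        echo '<p>' . $item->description . '</p>';
    }
}
?>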

woodscreative
I've found it to be very reliable.
symcbean
Is it necessary to use 3rd party code when PHP has so many built-in useful functions for parsing the DOM?
woodscreative
+1  A: 

Is there any method/idea to pull out content fast from bulk sources?

Trying to poll all 500 URLs synchronously is going to put a lot of stress on the system. This can be mitigated by running the transfers in parallel (using the curl_multi_* functions - but the version of SimplePie I've got here doesn't use these for multiple transfers).

Assuming the volume of requests for the composite feeds merits it, the best solution would be to run a scheduler that downloads the feeds to your server when the current content is set to expire (applying a sensible minimum value), then composites the feed from the stored data. Note that if you take this approach you'll need to implement some clever semaphores or use a DBMS to store the data - PHP's file locking semantics are not very sophisticated.
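A bare-bones sketch of the parallel-download step with curl_multi (the URL list, the 20-second timeout, and the lack of batching/error handling are all simplifications):

<?php
// Fetch several feed URLs in parallel with curl_multi.
$urls = array(
    'http://site1.com/index.rss',
    'http://site2.com/index.rss',
    // ... the rest of the list
);

$mh      = curl_multi_init();
$handles = array();

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20); // don't let one dead host stall the whole run
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Drive all transfers until they complete.
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

// Collect the raw XML; feed each string into the parsing/storage step.
$bodies = array();
foreach ($handles as $url => $ch) {
    $bodies[$url] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>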

symcbean
Perhaps even streams and `stream_select()`
Fanis