Hi,

I am building a comparison shopping site that takes in multiple XML feeds and displays the best deals. I parse the feeds with PHP SimpleXML and then sort the results in PHP when the page loads. I use an approach like this one to fetch the feeds in parallel with the curl_multi functions: http://www.developertutorials.com/blog/php/parallel-web-scraping-in-php-curl-multi-functions-375/
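
For reference, here is roughly what that step looks like today, as a minimal sketch: fetch the feeds in parallel with curl_multi, parse each response with SimpleXML, and sort the merged items by price. The feed URLs and the <item>/<title>/<price> element names below are placeholders, not our real feed format.

<?php
// Minimal sketch: fetch several feeds in parallel with curl_multi,
// parse each with SimpleXML, and sort the merged deals by price.
// The URLs and element names are placeholders.
$feedUrls = array(
    'http://example.com/feed1.xml',
    'http://example.com/feed2.xml',
);

$mh = curl_multi_init();
$handles = array();
foreach ($feedUrls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all transfers until every one has finished.
$running = null;
do {
    curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh); // wait for activity instead of busy-looping
    }
} while ($running > 0);

// Collect the responses, parse them, and merge the items.
$deals = array();
foreach ($handles as $ch) {
    $xml = simplexml_load_string(curl_multi_getcontent($ch));
    if ($xml !== false) {
        foreach ($xml->item as $item) {
            $deals[] = array(
                'title' => (string) $item->title,
                'price' => (float) $item->price,
            );
        }
    }
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

// Cheapest deals first.
usort($deals, function ($a, $b) {
    if ($a['price'] == $b['price']) {
        return 0;
    }
    return ($a['price'] < $b['price']) ? -1 : 1;
});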

Our application has very little database logic; we just need these feeds to be processed as quickly as possible in PHP. It's decently fast now, but I'd obviously like to make it quicker. I'm also worried that once we start getting real traffic, PHP will slow down dramatically.

We are using eAccelerator, but I don't think this part of the application gets a real boost from it. I can't really use caching because the deals need to be fresh when the page loads.

If you guys were designing a system like this, what would you do to get the best performance? How can we get PHP to process these XML feeds as quickly as possible?

Thanks!

+4  A: 

You're downloading the feeds at every page hit?

You should be using cron to dump them into a database - it'll be much faster.
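
Something as simple as this, run from cron, would do it. This is only a rough sketch; the feeds/deals table names and columns are made-up examples, not a recommended schema.

<?php
// fetch_feeds.php -- run from cron, e.g. every 5 minutes:
//   */5 * * * * php /path/to/fetch_feeds.php
// Example-only schema: feeds(id, url) and deals(feed_id, title, price, fetched_at).
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

$feeds  = $pdo->query('SELECT id, url FROM feeds')->fetchAll(PDO::FETCH_ASSOC);
$insert = $pdo->prepare(
    'INSERT INTO deals (feed_id, title, price, fetched_at) VALUES (?, ?, ?, NOW())'
);

foreach ($feeds as $feed) {
    $xml = @simplexml_load_file($feed['url']);
    if ($xml === false) {
        continue; // skip feeds that failed to download or parse
    }
    // Replace the old rows for this feed with the fresh ones.
    $pdo->beginTransaction();
    $pdo->prepare('DELETE FROM deals WHERE feed_id = ?')->execute(array($feed['id']));
    foreach ($xml->item as $item) {
        $insert->execute(array($feed['id'], (string) $item->title, (float) $item->price));
    }
    $pdo->commit();
}

The page load then only runs a SELECT against the deals table instead of hitting the feeds at all.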

Greg
Thanks for the advice. Unfortunately, we can't do that, because we are tracking 3,000 different items (each with its own set of XML files). We want the listings to be completely fresh, so we would have to make up-to-the-minute calls to each of these pages. Having 3,000 different processes running every minute, each with significant database interaction, would be a nightmare.
A: 

I like xml_parse_into_struct(); it is much faster than "easier to use" classes like DOMDocument. In a case like this it takes only about 2/100 of the time.
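
The basic usage is along these lines (a rough sketch; the PRICE element name is just a placeholder for whatever your feeds actually use):

<?php
// Minimal sketch: parse a feed string into flat arrays with the expat-based parser.
// Note that xml_parse_into_struct() upper-cases tag names by default.
$xmlString = file_get_contents('http://example.com/feed1.xml');

$parser = xml_parser_create();
$values = array();
$index  = array();
xml_parse_into_struct($parser, $xmlString, $values, $index);
xml_parser_free($parser);

// $values is a flat list of tag events; $index maps tag names to positions in $values.
$prices = array();
if (isset($index['PRICE'])) {
    foreach ($index['PRICE'] as $i) {
        if (isset($values[$i]['value'])) {
            $prices[] = (float) $values[$i]['value'];
        }
    }
}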

And of course, as already suggested, you should also optimize by storing the processed data instead of redoing all the work on every request.

Havenard