tags:

views: 36

answers: 1

I am hitting a lot of different sites to get a list of information, and I want to display this information as I get it. Right now I am using a Smarty template, and what I would like to do is:

Pseudo code:

{foreach $page} 
  $smarty_var = use AJAX to call a PHP function
  Render out a new table row on the fly w/ the newly assigned var   
  <tr><td>{$smarty_var}</td></tr>
{/foreach}

I don't know much about AJAX; I used it a long time ago, and it was similar to this but not quite, since there was a user action involved. No, I don't have a JS framework in place. Am I way off here on how this should go? Basically I want to display a table row as data becomes available; each table row will be a request to get the data from another site.

Sure, I will tell you what I am trying to do: http://bookscouter.com/prices.php?isbn=0132184745+&x=19&y=6 If you click on the 'Click to view prices from all 43 links' at the bottom of that page you will see. I am using cURL to get all the pages I want a price from. Then for each page I want to get the price, so each page is going to fire off a function that runs some fun code like this:

function parseTBDOTpageNew($page, $isbn)
{
    // Split on opening <table> tags, then on <td> tags within the second chunk.
    $first_cut = preg_split('/<table[^>]*>/', $page);
    $second_cut = preg_split('/<td[^>]*>/', $first_cut[2]);
    // strstr() returns false when the needle is absent, so test against false explicitly.
    if (strstr($second_cut[4], "not currently buying this book") !== false)
    {
        return "\$0.00";
    }
    // The price sits inside a <b> element a few cells further in.
    $third_cut = preg_split('/<b[^>]*>/', $second_cut[9]);
    $last_cut = preg_split('/</', $third_cut[3]);
    return $last_cut[0];
}

This function is called from another function, which puts the price returned from the function above, the name of the company, and a link into an array; that array is added to a bigger array that is sent to Smarty. Instead of doing that, I just want to take each array as it is returned and add its values into a table row on the fly.

I will take your advice on jQuery. What I have started is an onload function that receives the $pages to be parsed, and I was just in the middle of writing: for each page, get the info and spit some HTML with the info onto the page.

Also, the function that calls the function to get the price is in a PHP file, so I need the request to hit a function within a PHP file and NOT just call file.php?param1=foo; I need it to actually hit the function in the file. I have jQuery in place, now just trying to figure it out and get it to do what I need, ugh. I am searching, any help would be appreciated.
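For example, something along these lines is what I'm picturing (all the names here, like prices_ajax.php and getPrice(), are made up just to show the shape):

<?php
// prices_ajax.php -- hypothetical dispatcher: the ajax request still hits a URL,
// but an 'action' parameter decides which function inside this file gets run.
function getPrice($isbn, $site)
{
    // ... cURL the vendor page and parse the price here (stub) ...
    return array('site' => $site, 'price' => '$1.23');
}

if (isset($_GET['action']) && $_GET['action'] === 'getPrice') {
    header('Content-Type: application/json');
    echo json_encode(getPrice($_GET['isbn'], $_GET['site']));
    exit;
}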

+1  A: 

No, I don't have a JS framework in place

Fix that first. You don't want to juggle XMLHttpRequests yourself. jQuery is SO's canonical JS library, and is pretty nifty.

Basically I want to display a table row as data becomes available; each table row will be a request to get the data from another site.

How many rows will you be dealing with? Do they all have to be loaded asynchronously?

Let's tackle this in a braindead, straightforward way. Create a script that does nothing more than:

  1. Take a site ID and fetch data from the corresponding URL
  2. Render that data to some data transport format, either HTML or JSON.

Then it's a simple matter of making the page that the user gets, which will contain JavaScript code that makes the ajax calls to the data fetcher (a sketch follows below), then either shoves the HTML into the page directly, or transforms the data into HTML and then shoves that into the page.
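A minimal sketch of that fetcher script, under stated assumptions: the file name fetch_price.php, the $sites list, and the parsePrice() helper are all hypothetical, not from the question.

<?php
// fetch_price.php -- one ajax request = one site scraped.
$sites = array(
    1 => 'http://example-vendor-one.com/quote?isbn=',
    2 => 'http://example-vendor-two.com/quote?isbn=',
);

$id   = (int) $_GET['site'];
$isbn = $_GET['isbn'];

// Fetch the remote page with cURL.
$ch = curl_init($sites[$id] . urlencode($isbn));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$page = curl_exec($ch);
curl_close($ch);

// parsePrice() stands in for whatever parsing you already have.
$price = parsePrice($page);

// Hand back a small JSON payload for the page's JavaScript to turn into a <tr>.
header('Content-Type: application/json');
echo json_encode(array('site' => $id, 'price' => $price));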

You'll note that at no point is Smarty really involved. ;)

This solution is highly impractical for anything more than a trivial number of sites to be polled asynchronously. If you need rows for dozens or hundreds of sites, that means each client is going to need to make dozens or hundreds of requests to your site for every single normal pageview. This is going to slaughter your server if more than one or two people load the page at once.

Can you tell us more about what you're doing, and what you're trying to accomplish? There are lots of ways to mitigate this problem, but they all depend on what you're doing.


Update for your question edit.

First, please consider using an actual HTML parser instead of regular expressions. The DOM is very powerful and you can target specific elements using XPath.
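For instance, a rough sketch using PHP's DOM extension; the XPath expression is illustrative and would have to be adjusted to the page's actual markup:

<?php
// Load the scraped HTML; real-world markup is rarely well-formed,
// so silence libxml's warnings rather than letting them spew.
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($page);
libxml_clear_errors();

// Target the element directly instead of counting preg_split() fragments.
// Roughly "the <b> inside the 2nd table's 10th cell" -- adjust as needed.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//table[2]//td[10]//b');
if ($nodes->length > 0) {
    $price = trim($nodes->item(0)->nodeValue);
}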

Instead of doing that, I just want to take each array as it is returned and add its values into a table row on the fly.

So, here's the ultimate problem. You want to do something asynchronously. PHP does not have a built-in generalized way to perform asynchronous tasks. There are a few ways to deal with this problem.

The first is as I've described above. Instead of doing any of the curl requests on page load, you farm the work out to the end user, and have the end user's browser make requests to your scraping script one by one, filling in the results.

The second is to use an asynchronous work queue, like Gearman. It has excellent PHP support via a PECL extension. You'd write one or more workers that can accept requests, and keep a pool of them running at all times. The larger the pool, the more things you can do at once. Once all of the data has returned, you can throw the complete set of data at your template engine, and call it good.
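A rough sketch of that shape with the PECL gearman classes; the function name fetch_price and the $vendors list are placeholders, and the worker and client would live in separate scripts:

<?php
// worker.php -- run several copies of this to grow the pool.
$worker = new GearmanWorker();
$worker->addServer(); // defaults to localhost:4730
$worker->addFunction('fetch_price', function (GearmanJob $job) {
    $isbn = $job->workload();
    // ... cURL the vendor site and parse the price here ...
    return json_encode(array('isbn' => $isbn, 'price' => '$1.23'));
});
while ($worker->work());

<?php
// client side -- queue one task per vendor, collect results as they complete,
// then hand the finished array to Smarty.
$results = array();
$client = new GearmanClient();
$client->addServer();
$client->setCompleteCallback(function (GearmanTask $task) use (&$results) {
    $results[] = json_decode($task->data(), true);
});
foreach ($vendors as $vendor) { // $vendors is a placeholder
    $client->addTask('fetch_price', $isbn);
}
$client->runTasks(); // blocks until every task has finished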

You can even combine the two, having the user make only one or two or three extra requests via ajax to fetch part of the returned data. Heck, you can even kick off the jobs in the background and return the page immediately, then request the results of the background jobs later via ajax.
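Sketched loosely (placeholders again), the background variant queues jobs with doBackground() and lets ajax poll a tiny status script. Note that a background worker must stash its result somewhere itself (a DB row, a cache entry), since its return value is never delivered to the client:

<?php
// kick off: queue the scrape jobs, remember the handles, return the page at once.
session_start();
$client = new GearmanClient();
$client->addServer();
$handles = array();
foreach ($vendors as $vendor) { // placeholder list
    $handles[$vendor] = $client->doBackground('fetch_price', $isbn);
}
$_SESSION['job_handles'] = $handles;

<?php
// poll.php -- hit via ajax; reports whether one vendor's job is still running.
session_start();
$client = new GearmanClient();
$client->addServer();
$status = $client->jobStatus($_SESSION['job_handles'][$_GET['vendor']]);
// $status is array(is_known, is_running, numerator, denominator)
echo json_encode(array('done' => !$status[0] && !$status[1]));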

Regardless of which way you handle it, you have a giant, huge problem. You're scraping someone's site. You may well be scraping someone's site very often. Not everyone is OK with this. You should seriously consider caching results aggressively, or even checking with each of the vendors to see if they have an API or data export that you can query against instead.
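Aggressive caching can be as simple as a file per vendor/ISBN pair; a minimal sketch, with a made-up TTL and a hypothetical fetchAndParsePrice() helper:

<?php
function getCachedPrice($vendor, $isbn, $ttl = 3600)
{
    $file = sys_get_temp_dir() . "/price_{$vendor}_{$isbn}.cache";
    // Serve the cached copy while it's fresh enough.
    if (file_exists($file) && (time() - filemtime($file)) < $ttl) {
        return file_get_contents($file);
    }
    // Otherwise scrape once and remember the answer for everyone else.
    $price = fetchAndParsePrice($vendor, $isbn); // placeholder for the scrape
    file_put_contents($file, $price);
    return $price;
}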

Charles
Note the edit I just did.
KacieHouser
I want them to load as I get them; I am requesting over 40 sites, so if I don't go ahead and load the page and render the rows as I get them, it takes a good 30 or 40 seconds to load the page.
KacieHouser
+1 for jQuery :)
Michael Robinson
@PylonsN00b, I've updated my answer.
Charles
We are using APIs as we get them, and I am caching the results. While waiting, I have added jQuery and am trying to do each one, one page at a time; being new to jQuery, I am still messing with it. Basically onload I call a JS function, which in turn loads a PHP page that gets the data I need and returns it in HTML format, and then the JS function puts that HTML into the page. I am still jacking with it though; I'll probably make a separate post if I can't get it figured out.
KacieHouser
@PylonsN00b: What you're doing jQuery-wise sounds perfectly normal and correct, at least. Good luck!
Charles