Hello,

I am new to using cURL, and a novice PHP coder. I would like to take specific elements on a web page (that change via AJAX) and insert them into a database using cURL. As of now, I can write a text file of a web page using cURL, but I don't really know where to go next. Any help would really be appreciated!

A: 

You need to do what's called 'scraping'. Here's a little tutorial I found on Google: http://www.oooff.com/php-scripts/basic-php-scraped-data-parsing/basic-php-data-parsing

thomasfedb
A: 

This would normally be handled by scraping pages using cURL. If you're scraping a bunch of pages one after another, I suggest using the curl_multi family of functions to GET them in parallel instead. If you're looking for specific parts of the pages, you could load the HTML document into a SimpleXMLElement (or a DOMDocument, which is more forgiving of markup that isn't well-formed XML) and use XPath to query for specific data.
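
As a rough sketch of those two steps: the code below fetches a list of URLs in parallel with curl_multi and then pulls text out of each page with an XPath query. Note that it uses DOMDocument and DOMXPath rather than SimpleXMLElement, since real-world HTML often isn't well-formed XML. The function names (fetchAll, extractText) and the example query are my own placeholders, not anything from your site.

```php
<?php
// Fetch several URLs in parallel with the curl_multi_* functions.
// Returns an array mapping each URL to its response body
// (null or an empty string if the transfer failed).
function fetchAll(array $urls, int $timeoutSeconds = 10): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none of them is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $url => $ch) {
        $bodies[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    return $bodies; // the handles are released when they go out of scope
}

// Run an XPath query against an HTML string and return the trimmed
// text content of every matching node.
function extractText(string $html, string $xpathQuery): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($xpathQuery);
    if ($nodes === false) {
        return []; // the XPath expression was invalid
    }
    $results = [];
    foreach ($nodes as $node) {
        $results[] = trim($node->textContent);
    }
    return $results;
}
```

You would then call something like extractText($body, '//div[@class="price"]') on each body returned by fetchAll(), with the query adjusted to whatever markup the target page actually uses.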

The only problem with this solution is that you say you need to scrape AJAX content from the page. cURL only interacts with the server; it can't trigger client-side JavaScript. Some AJAX applications have a server-side equivalent of the AJAX content you're viewing (e.g. http://example.com#test might translate to http://example.com/test). If the site you're working with doesn't have this type of mapping, you could try to figure out the URLs from which the AJAX content is being loaded (your browser's developer tools will show the requests) and scrape those URLs directly using cURL.
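
Here is a minimal sketch of that last approach, including the database step from the question. Everything specific in it is an assumption: the endpoint URL, the idea that it returns a JSON array of objects with "name" and "value" keys, and the items table are all hypothetical, so substitute whatever the site really serves. The insert goes through PDO with a prepared statement.

```php
<?php
// Hypothetical endpoint found by watching the page's AJAX requests.
const AJAX_URL = 'http://example.com/ajax/items.json';

// Fetch one URL with cURL and return the response body.
function fetchUrl(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Some endpoints only answer requests that look like XHR calls.
    curl_setopt($ch, CURLOPT_HTTPHEADER, ['X-Requested-With: XMLHttpRequest']);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body;
}

// Assumed payload shape: [{"name": "...", "value": "..."}, ...].
// Rows missing either key are skipped.
function decodeItems(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        throw new InvalidArgumentException('Response is not a JSON array');
    }
    $items = [];
    foreach ($data as $row) {
        if (isset($row['name'], $row['value'])) {
            $items[] = [
                'name'  => (string) $row['name'],
                'value' => (string) $row['value'],
            ];
        }
    }
    return $items;
}

// Insert the rows into a hypothetical items(name, value) table.
// Returns the number of rows inserted.
function storeItems(PDO $pdo, array $items): int
{
    $stmt = $pdo->prepare('INSERT INTO items (name, value) VALUES (:name, :value)');
    $count = 0;
    foreach ($items as $item) {
        $stmt->execute([':name' => $item['name'], ':value' => $item['value']]);
        $count++;
    }
    return $count;
}

// Example wiring (connection details are placeholders):
// $pdo = new PDO('mysql:host=localhost;dbname=scraper', 'user', 'password');
// storeItems($pdo, decodeItems(fetchUrl(AJAX_URL)));
```

If the endpoint returns an HTML fragment instead of JSON, you would parse it with XPath rather than json_decode, but the fetch and insert steps stay the same.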

If you need more advanced client-side features, you should look into Selenium. If you google for "Selenium screen scrape" you should see some interesting results. I know there's a Selenium integration in PHPUnit that might be worth a look.

Here's another question that deals with screen scraping AJAX pages: http://stackoverflow.com/questions/260540/how-do-you-screen-scrape-ajax-pages

Michael