Building on Pascal MARTIN's response...
I use a combination of cURL and XPath. Below is a function I use in one of my classes:
protected function _get_xpath($url) {
    $referer = 'http://www.whatever.com/';
    $useragent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
    // create curl resource
    $ch = curl_init();
    // set the user agent, the referer and the target url
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    curl_setopt($ch, CURLOPT_REFERER, $referer);
    curl_setopt($ch, CURLOPT_URL, $url);
    // return the transfer as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // follow HTTP redirects
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // $output contains the response body, or false on failure
    $output = curl_exec($ch);
    //echo htmlentities($output);
    if (curl_errno($ch)) {
        echo 'Curl error: ' . curl_error($ch);
    }
    else {
        $dom = new DOMDocument();
        // @ silences the warnings libxml raises on malformed real-world HTML
        @$dom->loadHTML($output);
        $this->xpath = new DOMXPath($dom);
        $this->html = $output;
    }
    // close curl resource to free up system resources
    curl_close($ch);
}
You can then query the document structure with evaluate() and extract the information you want:
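$resultDom = $this->xpath->evaluate("//span[@id='headerResults']/strong");
// item(0) is null when nothing matches, so check before reading nodeValue
$node = $resultDom->item(0);
$this->results = $node !== null ? $node->nodeValue : null;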
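If you want to try the XPath part without hitting a real site, here is a minimal standalone sketch. The HTML string, the `links` id and the variable names are made up for illustration; only the `headerResults` expression comes from the snippet above. It also uses `libxml_use_internal_errors()` as one possible alternative to the `@` operator, which is my own substitution rather than something the function above does.

```php
<?php
// Hypothetical sample markup standing in for a page fetched with cURL.
$html = <<<HTML
<html><body>
  <span id="headerResults">Results: <strong>1,234</strong></span>
  <ul id="links">
    <li><a href="/first">First</a></li>
    <li><a href="/second">Second</a></li>
  </ul>
</body></html>
HTML;

// Collect libxml parser warnings internally instead of silencing them with @.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// A node-set expression gives back a DOMNodeList.
$nodes = $xpath->evaluate("//span[@id='headerResults']/strong");
$first = $nodes->item(0);
$results = $first !== null ? $first->nodeValue : null;

// item(0) is null when the expression matches nothing.
$missing = $xpath->evaluate("//span[@id='doesNotExist']")->item(0);

// A typed expression such as count() gives back a scalar (a float here).
$linkCount = $xpath->evaluate("count(//ul[@id='links']//a)");

// A DOMNodeList can be iterated directly to read attributes.
$hrefs = [];
foreach ($xpath->evaluate("//ul[@id='links']//a") as $a) {
    $hrefs[] = $a->getAttribute('href');
}

echo $results, PHP_EOL;
echo (int) $linkCount, PHP_EOL;
echo implode(', ', $hrefs), PHP_EOL;
```

The main difference from `query()` is that `evaluate()` can also return typed results such as the `count()` above, so it is worth checking what you got back before calling `item()` on it.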