views:

233

answers:

4

I want to use cURL to query Google and see how many results it returns for a given search.

I've tried this:

  $url = "http://www.google.com/search?q=".$strSearch."&hl=en&start=0&sa=N";
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_HEADER, 0);
  curl_setopt($ch, CURLOPT_VERBOSE, 0);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible;)");
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_POST, true);
  $response = curl_exec($ch);
  curl_close($ch);

But it just returns a 405 Method Not Allowed error from Google.

Any ideas?

Thanks

+2  A: 

Use a GET request instead of a POST request. That is, get rid of

curl_setopt($ch, CURLOPT_POST, true);

Or even better, use their well-defined search API instead of screen scraping.
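A minimal sketch of the question's request with the POST option removed, so cURL falls back to its default GET. The function names buildSearchUrl and fetchSearchPage are made up for illustration, and the urlencode() call is an addition (not in the question's code) so that multi-word searches survive in the query string.

```php
<?php
// Build the search URL from the question. urlencode() is an addition
// so that spaces and special characters in the search term are escaped.
function buildSearchUrl($strSearch) {
    return "http://www.google.com/search?q=" . urlencode($strSearch) . "&hl=en&start=0&sa=N";
}

// Fetch the page with a plain GET request (cURL's default when
// CURLOPT_POST is not set). Returns the body, or false on failure.
function fetchSearchPage($strSearch) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, buildSearchUrl($strSearch));
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible;)");
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}
```

Calling fetchSearchPage("pizza") would then return the HTML of the results page, or false if the request failed.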

Matti Virkkunen
Duh! Of course! I think I'll stick to screen scraping though.
TheBounder
Why would you stick to screen scraping, which will be brittle to the page's UI changes, when there's a well-defined API available that has what you want?
Jason Hall
The API has limitations, such as only returning the first 30 results. Scraping Google is a very common thing.
Justin Johnson
Common or not common, it's against the Google Terms of Service.
methode
+4  A: 

Use the Google Ajax API.

http://code.google.com/apis/ajaxsearch/

See this thread for how to get the number of results. While it refers to C# libraries, it might give you some pointers.
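As a rough PHP sketch of the same idea: this assumes the API's JSON response carries the estimate at responseData.cursor.estimatedResultCount. That field path, the sample JSON, and the function name are assumptions for illustration and are not taken from this thread, so check them against the API documentation before relying on them.

```php
<?php
// Extract the estimated result count from a JSON response body.
// ASSUMPTION: the count lives at responseData.cursor.estimatedResultCount;
// verify this field path against the API documentation.
function estimatedResultCount($json) {
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data["responseData"]["cursor"]["estimatedResultCount"])) {
        return null; // Malformed JSON or the field is missing
    }
    return (int) $data["responseData"]["cursor"]["estimatedResultCount"];
}

// Hypothetical sample response, trimmed to the fields used above
$sample = '{"responseData":{"results":[],"cursor":{"estimatedResultCount":"59600000"}},"responseStatus":200}';
echo estimatedResultCount($sample), "\n";
```

Returning null rather than 0 when the field is missing keeps "no results" distinguishable from "could not read the response".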

Simon Brown
+1  A: 

Scraping Google is very easy to do. However, if you don't need more than the first 30 results, then the search API is preferable (as others have suggested). Otherwise, here's some sample code. I've ripped this out of a couple of classes that I'm using, so it might not be totally functional as is, but you should get the idea.

function queryToUrl($query, $start = null, $perPage = 100, $country = "US") {
    $params = array(
        // Query (http_build_query() URL-encodes it, so no urlencode() here)
        "q"     => $query,
        // Country (geolocation presumably)
        "gl"    => $country,
        // Start offset
        "start" => $start,
        // Number of results per page
        "num"   => $perPage
    );
    // http_build_query() skips null values, so an unset $start is left out
    return "http://www.google.com/search?" . http_build_query($params);
}

// Fetch the first 100 results for "pizza" in Canada
$ch = curl_init(queryToUrl("pizza", 0, 100, "CA"));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
// Placeholder user agent; substitute your own
curl_setopt($ch, CURLOPT_USERAGENT,      "Mozilla/4.0 (compatible;)");
curl_setopt($ch, CURLOPT_MAXREDIRS,      4);
curl_setopt($ch, CURLOPT_TIMEOUT,        5);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);

$response = curl_exec($ch);

Note: http_build_query() already omits parameters whose value is null. The $this->_helpers->url->buildQuery() helper in my own classes is identical to http_build_query except that it drops all empty parameters.
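To get from $response to the number of results the question asks about, one option is a regular expression over the returned HTML. The sketch below assumes the page contains text such as "About 1,230,000 results"; that wording is an assumption about Google's markup, not something this thread confirms, and it can change without notice. The function name parseResultCount is made up for illustration.

```php
<?php
// Pull a result count out of the HTML returned by curl_exec().
// ASSUMPTION: the page contains text like "About 1,230,000 results";
// this wording is a guess about the markup and may change at any time.
function parseResultCount($html) {
    // A digit, then more digits or thousands separators, then "results"
    if (preg_match('/(\d[\d,]*)\s+results/i', $html, $matches)) {
        return (int) str_replace(",", "", $matches[1]);
    }
    return null; // Pattern not found
}
```

The function returns the first match on the page, so it is worth checking a real response by hand before trusting the number.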

Justin Johnson
A: 

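Set the request method explicitly, for example as an entry in the options array passed to curl_setopt_array(), where $post is a boolean flag:

CURLOPT_CUSTOMREQUEST => $post ? "POST" : "GET"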

Jet