Hello!

I'm trying to parse the number of results from Google Blog Search. Could somebody please help me?

http://blogsearch.google.com/blogsearch?hl=en&ie=UTF-8&q=a&btnG=Search+Blogs

returns a complete page. On the right side it shows "Results 1 - 10 of about 2,504,830,546 for a. (0.05 seconds)".

How can I extract the number 2,504,830,546 from that page?

Thanks. Regards.

A: 

Why not just use their search API which includes blog search?

John Conde
How could I use this API to get exactly what I need?
Jooj
Check out their documentation: http://code.google.com/apis/ajaxsearch/documentation/ It explains what you need to do and has examples as well.
John Conde
Thanks for the documentation, but it seems this API doesn't return the total number of results!
Jooj
+1  A: 

Although you normally should not parse an HTML file with regexes, in this case you could make an exception (this particular page still uses <font> tags, so its structure is broken anyway and an XML parser would not help much). The code below assumes that you have already fetched the web page and stored it in the string variable $webpage_as_string:

preg_match('|Results.+?of +about +\<b\>([0-9,]+)\<\/b\> +for|', $webpage_as_string, $matches);

$matches[1] would then contain the result as a string. You'd need to strip the commas and convert it to a number. Of course, this code will break as soon as Google changes its site template.
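The comma stripping and conversion mentioned above could look roughly like the sketch below. The function name extract_result_count and the sample markup are made up for illustration (the real page markup may differ), and the nullable return type assumes PHP 7.1 or later. It reuses the same pattern as the preg_match call above.

```php
<?php
// Hypothetical helper (not part of any library): returns the "of about N"
// figure from the fetched HTML as an integer, or null if nothing matches.
function extract_result_count(string $html): ?int
{
    // Same pattern as in the answer above, with "|" as the delimiter.
    $pattern = '|Results.+?of +about +\<b\>([0-9,]+)\<\/b\> +for|';

    // preg_match returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $html, $matches) !== 1) {
        return null;
    }

    // Remove the thousands separators, then cast the digits to an integer.
    // Note: counts above PHP_INT_MAX (e.g. on 32-bit builds) will not
    // survive this cast; keep the string in that case.
    return (int) str_replace(',', '', $matches[1]);
}

// Made-up sample resembling the results line; an assumption, not real output.
$sample = 'Results <b>1</b> - <b>10</b> of about <b>1,234</b> for <b>a</b>.';
var_dump(extract_result_count($sample));
```

Returning null rather than 0 when nothing matches makes it easier to notice when the site template has changed.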

http://php.net/manual/en/function.preg-match.php contains more information on the function, the pattern manual is here: http://www.php.net/manual/en/reference.pcre.pattern.syntax.php

maligree
got an error: preg_match() [function.preg-match]: Delimiter must not be alphanumeric or backslash
Jooj
Oops, sorry, I forgot the delimiters on the regex pattern; it has been a while since I used PHP. Fixed it, although I did not test it.
maligree
thank you maligree :)
Jooj
A: 

If you have wget and an awk that accepts a regular expression as RS (such as gawk):

$ wget -O- -q "http://blogsearch.google.com/blogsearch?hl=en&ie=UTF-8&q=a&btnG=Search+Blogs" | awk -vRS="Browse Top Stories|Blog results" -vFS='about|for' '/Results/{gsub(/<b>|<\/b>/,"",$2);print $2}'
 2,493,517,127
ghostdog74