ansaurus

Question

How to parse results from google blog search?

Answer 1

A:

Why not just use Google's search API, which includes blog search?

John Conde 2010-04-16 15:12:25

How could I use this API to get exactly what I need?

Jooj 2010-04-16 15:34:48

Check out their documentation, which explains what you need to do and includes examples: http://code.google.com/apis/ajaxsearch/documentation/

John Conde 2010-04-16 16:13:07

Thanks for the documentation, but it seems this API doesn't return the total number of results!

Jooj 2010-04-17 09:46:15

Answer 2

+1 A:

Although you normally should not parse HTML with regular expressions, this case could be an exception: the page still uses <font> tags and its markup is not well-formed, so an XML parser would not help much. The following code assumes you have already fetched the page and stored it in the string variable $webpage_as_string:

preg_match('|Results.+?of +about +\<b\>([0-9,]+)\<\/b\> +for|', $webpage_as_string, $matches);

$matches[1] would then contain the count as a string. You'd still need to strip out the commas and convert it to a number. Of course, this code will break as soon as Google changes its page template.
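Putting those steps together (match, strip the commas, convert), a minimal sketch might look like the following. It assumes the 2010-era markup the regex above targets; the function name `extract_result_count` and the `$sample` string are hypothetical, with the sample built to match the pattern rather than copied from a real page. It uses `~` as the delimiter so the `/` in `</b>` needs no escaping, `\s+` instead of literal spaces, and assumes 64-bit PHP 7.1 or later so a count in the billions fits in an `int`.

```php
<?php
// Hypothetical helper: pull the "of about N" count out of a Google Blog Search
// results page that has already been fetched into a string. Assumes the
// 2010-era markup "Results ... of about <b>N</b> for"; Google may have changed it.
function extract_result_count(string $html): ?int
{
    // '~' delimiter, so the '/' in '</b>' needs no escaping.
    $pattern = '~Results.+?of\s+about\s+<b>([0-9,]+)</b>\s+for~';
    if (preg_match($pattern, $html, $matches) !== 1) {
        return null; // no match, e.g. the page template changed
    }
    // Drop the thousands separators, then cast to an integer (64-bit PHP assumed).
    return (int) str_replace(',', '', $matches[1]);
}

// Hypothetical sample, built to match the pattern (not copied from a real page).
$sample = 'Results <b>1</b> - <b>10</b> of about <b>2,493,517,127</b> for <b>a</b>';
var_dump(extract_result_count($sample)); // int(2493517127)
```

Returning `null` on a failed match makes a template change show up as a missing value rather than a silent `0`.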

See http://php.net/manual/en/function.preg-match.php for more information on the function; the pattern syntax reference is at http://www.php.net/manual/en/reference.pcre.pattern.syntax.php

maligree 2010-04-16 15:15:29

Got an error: preg_match() [function.preg-match]: Delimiter must not be alphanumeric or backslash

Jooj 2010-04-16 15:32:19

Oops, sorry, I forgot the delimiters on the regex pattern; it's been a while since I used PHP. I've fixed it, although I haven't tested it.
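For reference, that error is PHP complaining about a pattern without delimiters: the first character of the pattern string is taken as the delimiter, and a letter is not allowed. A small sketch (the `$html` string is made up for illustration) shows the failing call next to two working variants:

```php
<?php
// Made-up input, only for demonstrating delimiter handling.
$html = 'Results 1 - 10 of about <b>42</b> for x';

// No delimiters: PHP reads 'R' as the delimiter, rejects it with a
// "Delimiter must not be alphanumeric ..." warning and returns false.
// '@' suppresses the warning so the return value can be inspected.
$without = @preg_match('Results.+?about', $html);

// Any non-alphanumeric, non-backslash character can delimit the pattern:
// '|' as in the fixed answer, or '#' as another common choice.
$withPipe = preg_match('|Results.+?about|', $html);
$withHash = preg_match('#Results.+?about#', $html);

var_dump($without, $withPipe, $withHash); // bool(false) int(1) int(1)
```

Picking a delimiter that doesn't occur in the pattern (such as `|` or `#` here) avoids having to escape it inside the expression.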

maligree 2010-04-17 15:51:37

Thank you, maligree :)

Jooj 2010-04-18 13:43:32

Answer 3

A:

If you have wget and awk, you can get the count from the shell:

$ wget -O- -q "http://blogsearch.google.com/blogsearch?hl=en&ie=UTF-8&q=a&btnG=Search+Blogs" | awk -vRS="Browse Top Stories|Blog results" -vFS='about|for' '/Results/{gsub(/<b>|<\/b>/,"",$2);print $2}'
 2,493,517,127
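The one-liner splits the page into records on either marker string (`RS`), keeps records that mention `Results`, splits those into fields on `about` or `for` (`FS`), and prints the second field with the `<b>` tags removed. POSIX leaves a multi-character `RS` unspecified, so this relies on an awk such as GNU awk that treats it as a regular expression. As a rough sketch, the same logic in PHP, assuming the HTML has already been fetched into a string; the function name `count_via_split` and the sample text are hypothetical:

```php
<?php
// Hypothetical PHP port of the awk one-liner; fetching the page is left out.
function count_via_split(string $html): array
{
    $counts = [];
    // RS="Browse Top Stories|Blog results": split the page into records.
    foreach (preg_split('/Browse Top Stories|Blog results/', $html) as $record) {
        // /Results/: only keep records that mention "Results".
        if (strpos($record, 'Results') === false) {
            continue;
        }
        // FS='about|for': split the record into fields.
        $fields = preg_split('/about|for/', $record);
        // awk would print an empty $2 here; this sketch just skips the record.
        if (count($fields) < 2) {
            continue;
        }
        // gsub(/<b>|<\/b>/,"",$2): remove the bold tags from the second field.
        // trim() also drops the surrounding spaces that awk would print.
        $counts[] = trim(str_replace(['<b>', '</b>'], '', $fields[1]));
    }
    return $counts;
}

// Hypothetical sample text, not copied from a real page.
$sample = 'Blog results Results <b>1</b> - <b>10</b> of about <b>2,493,517,127</b> for <b>a</b>';
print_r(count_via_split($sample));
```

Like the regex answer, this depends entirely on the page's wording, so any change to "about", "for" or the marker strings breaks it.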

ghostdog74 2010-04-17 00:15:41
