I want to get only the number of Google search results for a particular keyword, as fast as possible, while keeping the use of third-party libraries to a minimum. I have already considered xgoogle.
A:
You can use urllib to download the results page and HTMLParser to extract the contents of the
<div id="resultStats">....</div>
element, which holds the result count.
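A minimal sketch of that approach, assuming Python 3 (where the modules are urllib.request and html.parser) and assuming the results page still wraps the count in a div with id "resultStats", which may no longer be true. The names ResultStatsParser, parse_result_count and fetch_result_count are made up for illustration, and the search URL and User-Agent string are assumptions as well:

```python
import re
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class ResultStatsParser(HTMLParser):
    """Collects the text found inside <div id="resultStats">...</div>."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # > 0 while we are inside the target div
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Track nested divs so we know which </div> closes the target.
            if tag == "div":
                self._depth += 1
        elif tag == "div" and dict(attrs).get("id") == "resultStats":
            self._depth = 1

    def handle_endtag(self, tag):
        if self._depth and tag == "div":
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.text += data


def parse_result_count(html):
    """Return the first number in the resultStats div as an int, or None."""
    parser = ResultStatsParser()
    parser.feed(html)
    parser.close()
    # Take the first run of digits and thousands separators, e.g. "168,000,000".
    match = re.search(r"\d[\d,.\s]*", parser.text)
    if match is None:
        return None
    return int(re.sub(r"\D", "", match.group()))


def fetch_result_count(query):
    """Download a results page and parse the count (URL and header are assumptions)."""
    url = "https://www.google.com/search?" + urllib.parse.urlencode({"q": query})
    # urllib's default User-Agent tends to be rejected, so send a browser-like one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        html = response.read().decode("utf-8", errors="replace")
    return parse_result_count(html)
```

Calling parse_result_count on a page containing <div id="resultStats">About 168,000,000 results</div> gives the integer 168000000. Parsing is kept separate from fetching so it can be tried on a saved page without sending a request.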
zoli2k
2010-07-26 11:47:23
It's worth mentioning that you'll have to spoof the browser's User-Agent when using urllib, and that Google frowns upon automated queries.
Wayne Werner
2010-07-26 12:27:37
Thanks, this answer also helps me solve something else I was stuck on.
subiet
2010-07-28 04:10:24
A:
Take a look at Alex Martelli's example.
If you search for something vague like "cars", data
will look something like the following. Notice that it isn't very long: you only get the top few hits and a "moreResultsUrl" link. It should therefore be reasonably fast to make this query and look in
data['cursor']['estimatedResultCount']
for the estimated number of hits. Note that the count comes back as a string.
{'cursor': {'currentPageIndex': 0,
            'estimatedResultCount': '168000000',
            'moreResultsUrl': 'http://www.google.com/search?oe=utf8&ie=utf8&source=uds&start=0&hl=en&q=cars',
            'pages': [{'label': 1, 'start': '0'},
                      {'label': 2, 'start': '4'},
                      {'label': 3, 'start': '8'},
                      {'label': 4, 'start': '12'},
                      {'label': 5, 'start': '16'},
                      {'label': 6, 'start': '20'},
                      {'label': 7, 'start': '24'},
                      {'label': 8, 'start': '28'}]},
 'results': [ <<list of 4 dicts>> ]}
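Reading the count out of a response shaped like the one above could look roughly like this. It is only a sketch: the helper name estimated_result_count and the trimmed sample JSON are made up for illustration, and no request is sent.

```python
import json


def estimated_result_count(data):
    """Return the estimated hit count as an int, or None if it is missing."""
    cursor = data.get("cursor") or {}
    count = cursor.get("estimatedResultCount")
    # The count arrives as a string (e.g. '168000000'), so convert it.
    return int(count) if count is not None else None


# Hypothetical, trimmed-down response with the same shape as the dict above.
sample_json = """
{"cursor": {"currentPageIndex": 0,
            "estimatedResultCount": "168000000",
            "pages": [{"label": 1, "start": "0"}]},
 "results": []}
"""

data = json.loads(sample_json)
print(estimated_result_count(data))
```

Using .get rather than indexing directly means a response without a cursor entry gives None instead of raising KeyError.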
unutbu
2010-07-26 11:52:22