Greetings everyone:
I would like to extract some information from a large collection of Google search result pages. The only thing I need is the URLs inside a particular set of HTML tags.
I couldn't find any other good way to handle this problem, so I am now trying Ruby.
This is what I have written so far:
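require 'net/http'
require 'uri'

# Everything after '#' is a URL fragment and is never sent to the server,
# so the query goes in a real query string on the /search path instead.
url = URI.parse('http://www.google.com.au/search?hl=en&q=helloworld')
res = Net::HTTP.start(url.host, url.port) do |http|
  http.get(url.request_uri)
end
puts res.body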
Unfortunately I cannot use the recommended hpricot Ruby gem (installing it fails because a make command is missing, or something like that).
So I would like to stick with this approach.
Now that I can get the response body as a string, the only thing I need is to retrieve whatever is inside the cite HTML tags.
How should I do that? With a regular expression? Can anyone give me an example?
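To make the question concrete, here is a minimal sketch of the regular-expression idea using only String#scan from core Ruby. It assumes the cite elements are not nested and that stripping any inner tags (such as bold markup) is acceptable. The extract_cites helper and the sample string are made up for illustration; the sample is not real Google output.

```ruby
# Hypothetical helper: returns the text inside every <cite>...</cite> element.
# Assumes cite elements are never nested inside each other.
def extract_cites(html)
  # Non-greedy match so each <cite> pairs with the nearest </cite>;
  # /m lets '.' span newlines, /i ignores tag case.
  html.scan(%r{<cite\b[^>]*>(.*?)</cite>}mi).map do |(inner)|
    # Drop any tags nested inside the cite element and trim whitespace.
    inner.gsub(/<[^>]+>/, '').strip
  end
end

# Made-up sample standing in for res.body
sample = '<div><cite>www.example.com/<b>hello</b></cite></div>' \
         '<p><cite class="x">example.org/world</cite></p>'

puts extract_cites(sample)
```

With a real response, the call would presumably be extract_cites(res.body). A regex like this is fragile against unusual markup, so it is only a stopgap for when an HTML parser cannot be installed.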
Many thanks in advance!!!