Those links actually have `class=l`, not `class="l"`. By the way, to figure this out I added some logging to the code so that you can see the output at various stages and debug it. I searched the page source for the string you were expecting and didn't find it, which is why your regex failed. So I looked for the string that is actually there and changed the regex accordingly. Debugging skills are handy.
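require "open-uri"

url = "http://www.google.com/search?q=ruby"
# URI.open is the open-uri entry point; a bare open(url) no longer fetches URLs on Ruby 3.0+.
source = URI.open(url).read
puts "--- PAGE SOURCE ---"
puts source
# scan returns one array per match (one entry per capture group), so flatten it to plain strings.
links = source.scan(/<a.+?href="(.+?)".+?class=l/).flatten
puts "--- FOUND THIS MANY LINKS ---"
puts links.size
puts "--- PRINTING LINKS ---"
links.each do |link|
  puts "- #{link}"
end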
I also improved your regex. You are looking for some text that starts with the opening of an a tag (`<a`), then some characters that you don't care about (`.+?`), an href attribute (`href="`), the contents of the href attribute that you want to capture (`(.+?)`), some spaces or other attributes (`.+?`), and lastly the class attribute (`class=l`).
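If you want to check the pattern without hitting the network, here is a minimal sketch that runs it against a hard-coded string. The sample markup and the `extract_links` helper are made up for illustration; the real results page will look different.

```ruby
# Same pattern as in the script above.
LINK_PATTERN = /<a.+?href="(.+?)".+?class=l/

# Hypothetical helper: returns the captured href values as plain strings.
def extract_links(html)
  html.scan(LINK_PATTERN).flatten
end

# Made-up sample markup with unquoted class=l, like the links in question.
SAMPLE_HTML = '<p><a href="http://www.ruby-lang.org/" class=l>Ruby</a></p>' \
              '<p><a href="http://rubyonrails.org/" class=l>Rails</a></p>'

extract_links(SAMPLE_HTML).each { |link| puts "- #{link}" }

# A quoted class="l" does not contain the text class=l, so nothing matches.
puts extract_links('<a href="http://example.com/" class="l">x</a>').size  # prints 0
```

The last line shows the original problem in miniature: the pattern has to match the text that is really in the page, quotes and all.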
I have `.+?` in three places there. The `.` means any character (except a newline, by default), the `+` means there must be one or more of the thing right before it, and the `?` means that the `.+` should try to match as short a string as possible.
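To see what the `?` buys you, here is a small sketch comparing `.+?` with plain `.+` on a made-up line containing two links. The helper names and the sample string are hypothetical.

```ruby
# Made-up sample line with two links.
TWO_LINKS = '<a href="first.html">one</a> <a href="second.html">two</a>'

# Lazy version: the capture stops at the first closing quote.
def lazy_href(text)
  text[/href="(.+?)"/, 1]
end

# Greedy version: the capture runs on to the last closing quote on the line.
def greedy_href(text)
  text[/href="(.+)"/, 1]
end

puts lazy_href(TWO_LINKS)    # first.html
puts greedy_href(TWO_LINKS)  # first.html">one</a> <a href="second.html
```

Without the `?`, the capture swallows everything up to the last quote it can find, which is almost never what you want when pulling attributes out of HTML.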