I decided to give Nokogiri a try and copied the following program straight from http://nokogiri.rubyforge.org/nokogiri/Nokogiri.html (adding only the require 'rubygems' line and the I_KNOW_I_AM_USING_AN_OLD_AND_BUGGY_VERSION_OF_LIBXML2 constant):
require 'rubygems'
I_KNOW_I_AM_USING_AN_OLD_AND_BUGGY_VERSION_OF_LIBXML2 = 1
require 'nokogiri'
require 'open-uri'
# Get a Nokogiri::HTML::Document for the page we’re interested in...
doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))
# Do funky things with it using Nokogiri::XML::Node methods...
####
# Search for nodes by css
doc.css('h3.r a.l').each do |link|
  puts link.content
end
It returned no results. But when I changed
doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))
to
doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove').read)
the program worked as expected. The only difference is the .read added at the end of the line. I would never have figured this out by myself, because just about every bit of example code leaves off the .read. The one place that included it, ironically, was a post by one of the Nokogiri developers (at http://tenderlovemaking.com/2008/11/18/underpant-free-excitement). Did something in the API change? What am I missing?
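As far as I can tell, the two calls differ only in the type of object handed to Nokogiri::HTML: the first passes the IO-like object that open returns, the second passes the String that comes from reading it. Here is a minimal sketch of that difference, using a StringIO as a hypothetical stand-in for what open returns (sample_html is made up so the snippet runs without a network connection or Nokogiri). I am not claiming this explains the behavior, only that it shows what each form passes along:

```ruby
require 'stringio'

# Hypothetical stand-in for the object open(url) returns; open-uri gives back
# an IO-like object (a StringIO or a Tempfile), not a String.
sample_html = '<h3 class="r"><a class="l" href="#">tenderlove</a></h3>'
io = StringIO.new(sample_html)

# What the first form passes to Nokogiri::HTML: the IO-like object itself.
puts io.class

# What the second form passes: the String produced by reading that object.
html = io.read
puts html.class

# An IO-like object is consumed by reading; a second read without a rewind
# finds nothing left.
leftover = io.read
puts leftover.empty?
```

So in the first form Nokogiri has to do the reading itself, while in the second form it receives the complete markup as a String.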
I'm using Nokogiri 1.3.2.
Thank you.