I am by no means a master of Ruby and am quite new to Scrubyt. I was just trying out some examples found on their wiki page. The example I was working on gets the search results Google returns when you search for 'ruby', and I had the idea of grabbing the URL of each result so I could fetch that page as well. The problem is that I don't know how to grab the URL appropriately. This is my code:
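require 'rubygems'
require 'scrubyt'
google_data = Scrubyt::Extractor.define do
  # /ncr ("no country redirect") stops Google redirecting to a local domain
  fetch 'http://www.google.com/ncr'
  fill_textfield 'q', 'ruby'
  submit
  # every result link (class "l"); :write_text keeps the link text in the output
  link_title "//a[@class='l']", :write_text => true do
    # child pattern; in the output below it holds each result's URL
    link_url
  end
end
# print the extracted data as XML, indented by 1
google_data.to_xml.write($stdout, 1)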
The code prints out the XML data appropriately (name and link), but how do I retrieve the link without the <link_url> tags that seem to get added to it? (I tried to print out link_url and noticed the tags are printed as well.) Could I do something as simple as fetch link_url, or is there a way of extracting the text from the XML content held in link_url?
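Whether the tags appear depends on how the element is turned into a string. Assuming `to_xml` returns a REXML document (the `write($stdout, 1)` call matches `REXML::Document#write(output, indent)`, but that is an inference, not something the scRUBYt docs are quoted on here), each `<link_url>` node is a `REXML::Element`. Printing the element serialises it with its tags, while `Element#text` returns only its text content. A minimal sketch, using a hand-built element instead of a live scRUBYt run:

```ruby
require 'rexml/document'

# One element shaped like the <link_url> nodes in the output below,
# parsed from a literal string so the sketch runs without scRUBYt.
element = REXML::Document.new('<link_url>http://ruby-lang.org/</link_url>').root

puts element.to_s  # serialises the whole element, tags included
puts element.text  # only the text content, i.e. the bare URL
```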
This is some of the content printed by google_data.to_xml.write():
<root>
  <link_title>
    Ruby Programming Language
    <link_url>http://ruby-lang.org/</link_url>
  </link_title>
  <link_title>
    Download Ruby
    <link_url>http://www.ruby-lang.org/en/downloads/</link_url>
  </link_title>
  <link_title>
    Ruby - The Inspirational Weight Loss Journey on the Style Network ...
    <link_url>http://www.mystyle.com/mystyle/shows/ruby/index.jsp</link_url>
  </link_title>
  <link_title>
    Ruby (programming language) - Wikipedia, the free encyclopedia
    <link_url>http://en.wikipedia.org/wiki/Ruby_(programming_language)</link_url>
  </link_title>
</root>
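To get every result URL as a plain string (and then fetch each page), one option is to query the XML with REXML's XPath support and read each element's text. The sketch below parses an abridged copy of the sample output above. With a live run you would pass the document returned by `google_data.to_xml` instead, again assuming it is a `REXML::Document`. The helper names `result_urls` and `fetch_page` are hypothetical, introduced only for illustration:

```ruby
require 'rexml/document'
require 'net/http'
require 'uri'

# Abridged copy of the XML printed by google_data.to_xml.write.
SAMPLE_XML = <<~XML
  <root>
    <link_title>
      Ruby Programming Language
      <link_url>http://ruby-lang.org/</link_url>
    </link_title>
    <link_title>
      Download Ruby
      <link_url>http://www.ruby-lang.org/en/downloads/</link_url>
    </link_title>
  </root>
XML

# Hypothetical helper: the text of every <link_url> element, without tags.
def result_urls(doc)
  REXML::XPath.match(doc, '//link_url').map { |el| el.text.strip }
end

# Hypothetical helper: a plain GET of one result page (redirects are not followed).
def fetch_page(url)
  Net::HTTP.get(URI(url))
end

doc = REXML::Document.new(SAMPLE_XML)
urls = result_urls(doc)
puts urls

# Following each result makes real network requests, so it is left commented out:
# urls.each { |url| puts "#{url}: #{fetch_page(url).bytesize} bytes" }
```

`Net::HTTP.get` does not follow redirects, so a page that now redirects (for example from http to https) returns only the redirect response body; a fuller crawler would check the response code and follow the `Location` header.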