I'm trying to extract the href of each link on an HTML page for evaluation with Nokogiri and XPath. What I have so far seems to pull out only the link titles. I'm not interested in the link title, just the URL it points to.

Here's what I have:

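require 'nokogiri'
require 'open-uri'

# URI.open (from open-uri) fetches the page; plain Kernel#open no longer opens URLs in Ruby 3+
doc = Nokogiri::HTML(URI.open("http://www.cnn.com"))
doc.xpath('//a').each do |node|
  puts node.text # prints each link's text, not its URL
end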

Can anyone guide me on how to correct this so that I'm pulling the actual href instead of the text itself?

A:

Your XPath //a selects every a element, and node.text then returns each element's text content. You can use @attrname in XPath to select an attribute instead. For example:

//a/@href

will select the href attribute of every a element in the document. Each result is an attribute node; call .value on it to get the URL string.

ChrisCM
It's working, thanks for clearing that up!
paradoxic