I'm trying to extract each a href link on an html page for evaluation w/ nokogiri and xpath | ansaurus

Q:

I'm trying to extract each a href link on an HTML page for evaluation with Nokogiri and XPath. What I have so far only seems to pull out the link titles, i.e. the visible text of each link. I'm not interested in the link text, just the URL it points to.

Here's what I have:

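require 'nokogiri'
require 'open-uri'

# URI.open (from open-uri) fetches the page; plain Kernel#open no longer opens URLs in Ruby 3.0+
doc = Nokogiri::HTML(URI.open("http://www.cnn.com"))
doc.xpath('//a').each do |node|
  puts node.text # prints the link text, not the URL
end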

Can anyone guide me on how to correct this so that I'm pulling the actual href instead of the text itself?

A:

Your XPath expression //a selects every a element, and node.text then returns each element's text content. To select an attribute instead, use @attrname. For example,

//a/@href

will get you the href attribute of every a element in the document.

ChrisCM 2010-08-04 10:17:14

It's working, thanks for clearing that up!!

paradoxic 2010-08-04 10:28:19
