Hello,

I'm having trouble figuring out why I can't get the meta keywords of a page to parse properly with Nokogiri. In the following example, pulling the href of every a element works, but I cannot figure out how to pull the keywords.

This is the code I have thus far:

.....

doc = Nokogiri::HTML(open("http://www.cnn.com"))
doc.xpath('//a/@href').each do |node|
# doc.xpath("//meta[@name='Keywords']").each do |node|
  puts node.text

....

This successfully prints the href of every a element on the page, but when I swap in the keywords XPath it doesn't show anything. I've tried several variations of this with no luck. I assume that the ".text" call on node is wrong, but I'm not sure.

My apologies for how rough this code is; I'm doing my best to learn here.

+1  A: 

You're correct, the problem is text: it returns the text between an element's opening tag and its closing tag. Since meta tags are empty elements, this gives you the empty string. You want the value of the "content" attribute instead.

doc.xpath("//meta[@name='Keywords']/@content").each do |attr|
  puts attr.value
end

If you know that there will be only one meta tag with the name "Keywords", you don't actually need to loop through the results, but can take the first item directly like this:

puts doc.xpath("//meta[@name='Keywords']/@content").first.value

Note, however, that this will raise an error if there is no meta tag with the name "Keywords": first then returns nil, and calling value on nil raises a NoMethodError. For that reason the first option might be preferable.

sepp2k
Thank you!! I've been looking for quite a while to see where to find those definitions. Can you tell me what documentation I should have been looking at?
paradoxic
While looking into problems like this I'm almost always in irb. This lets you explore the values you get back, figuring out what calls you need to make or what attributes you need to query.
Paul Rubel
Thank you Paul!
paradoxic