I have an XML file exported from Blogspot, and it looks something like this:

<feed>
<entry>
<title> title </title>
<content type="html"> Content </content>
</entry>
<entry>
<title> title </title>
<content type="html"> Content </content>
</entry>
</feed>

How do I parse it with Nokogiri and XPath?

Here is what I have:

#!/usr/bin/env ruby

require 'rubygems'
require 'nokogiri'


doc = Nokogiri::XML(File.open("blogspot.xml"))

doc.xpath('//content[@type="html"]').each do |node|
  puts node.text
end

But it doesn't print anything.

Any suggestions?

A: 

Your code works for me. There were some problems with certain versions of Nokogiri.

I get:

 Content
 Content

I'm using nokogiri (1.4.1 x86-mswin32)

Nigel Thorne
Thanks, Nigel. It turned out that I needed to be very specific with my XPath expressions, or cull the unneeded attributes :D
meilas
A: 

Turns out that I had to delete the attributes for feed:

<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'>
meilas
A: 

I just stumbled on this question. The issue appears to be XML namespaces:

"turns out that i had to delete the attributes for feed"

<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'>

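XML namespaces complicate accessing nodes because they provide a way to separate tags that share a name. Here the default namespace (xmlns='http://www.w3.org/2005/Atom') puts every unprefixed element, including content, in the Atom namespace, so an XPath such as //content, which only matches elements in no namespace, finds nothing. Read the "Namespaces" section of Searching an HTML / XML Document.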

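Nokogiri also has the remove_namespaces! method, which is a sometimes-useful way of dealing with the problem. Its downside is that it strips every namespace from the document, so elements from different namespaces that share a local name can no longer be told apart.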

Greg