Hello, I am using LibXML to parse an RSS feed, and I am wondering whether it is possible to access the content within using dot syntax (or something just as easy).

So if I have:

<post>
  <created_at>Sat Aug 09 05:38:12 +0000 2008</created_at> 
  <id>882281424</id> 
  <text>I so just thought the guy lighting the Olympic torch was falling when he began to run on the wall. Wow that would have been catastrophic.</text> 
  <source>web</source> 
  <truncated>false</truncated> 
  <in_reply_to_status_id></in_reply_to_status_id> 
  <in_reply_to_user_id></in_reply_to_user_id>
</post>

Could I access it like this?

text = post.text
A: 

No. The simplest way is to use XPath. For example, to get a list of all 'text' nodes that are children of a 'post' node:

doc = parser.parse
text_nodes = doc.find('/post/text') # returns every matching node

Or to get the first (and in this case only) such node:

doc = parser.parse
text_node = doc.find_first('/post/text') # returns only the first matching node
Pesto
A: 

If you're prepared to do a little setup work, then you may find HappyMapper useful.

You declare a class and its mapping (or at least the parts you're interested in). In your case it would probably look something like this:

require 'happymapper'

class Post
  include HappyMapper
  element :text, String # maps the <text> child to post.text
end

and use it something like this:

posts = Post.parse(File.read('path_to_rss.xml'))
posts.each do |post|
  puts post.text
end

All completely untested, I'm afraid...

Mike Woodhouse
Tested this and it works fine (with the typo fixed). If the XML contains only a single `<post>...</post>` element, you don't need to loop over it; just do `puts posts.text`.
dbr