I'm trying to read an RSS feed and add some metadata to each item in Ruby, outputting another valid RSS feed.

For performance reasons, I'd like to do this in one pass without reading the entire RSS feed into memory. I've been experimenting with libxml-ruby, but its Reader object doesn't seem to be able to print out the element it has just read, which makes it difficult to loop through an XML file and print out each element.

Example:

<rdf:RDF>
  <item>
    <description>foo</description>
  </item>
</rdf:RDF>

should become

<rdf:RDF>
  <item>
    <metadata>(some metadata about this item)</metadata>
    <description>foo</description>
  </item>
</rdf:RDF>

I'm not tied to libxml-ruby, but Nokogiri is also built on libxml and seems to have the same limitation, and every article I've seen says REXML is too slow.

Any help would be much appreciated!

Otherwise, I guess it's time for regular expressions...

A: 

Just a start.

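# File.foreach streams one line at a time; File.readlines would load the whole file.
in_rdf = false
File.foreach("file") do |line|
  in_rdf = false if line[/<\/rdf:RDF/]
  in_rdf = true if line[/<rdf:RDF/]
  # Inside the feed, put the metadata line just before each <description>.
  if in_rdf && line["<description"]
    line = "<meta ...>\n" + line
  end
  puts line
end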
ghostdog74
I did decide to do something like that, with a loop and regular expressions. Seems like this is always going to be faster than loading it into an XML parser!
Tim S
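
For reference, the loop-and-regex approach discussed above could be sketched roughly as follows. This is only an illustration, not a general XML solution: `add_metadata` is a hypothetical helper name, the block is assumed to return text that is already XML-safe, and the code assumes each tag sits on its own line as in the example feed. It reads from any IO object one line at a time, so the feed is never held in memory as a whole.

```ruby
require "stringio"

# Hypothetical helper: copies a feed from `input` to `output` line by line,
# writing a <metadata> element before each <description> found inside an <item>.
# Assumes one tag per line; the block supplies the (already escaped) metadata text.
def add_metadata(input, output)
  in_item = false
  input.each_line do |line|
    in_item = true  if line =~ /<item[\s>]/
    in_item = false if line.include?("</item>")
    if in_item && (m = line.match(/\A(\s*)<description[\s>]/))
      # Reuse the indentation of the <description> line for the new element.
      output.puts "#{m[1]}<metadata>#{yield}</metadata>"
    end
    output.print line
  end
end

# Demo on the example feed from the question, using in-memory IO objects.
source = StringIO.new(<<~XML)
  <rdf:RDF>
    <item>
      <description>foo</description>
    </item>
  </rdf:RDF>
XML
result = StringIO.new
add_metadata(source, result) { "(some metadata about this item)" }
puts result.string
```

Because the helper only relies on `each_line`, `puts` and `print`, the same call should work with `File.open` handles (or `$stdin` and `$stdout`) in place of the `StringIO` objects. A feed that packs several tags onto one line would need a real streaming parser instead.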