views:

66

answers:

2

Hey everyone,

I am currently doing some XML parsing and I've chosen to use Hpricot because of it's ease of use and syntax, however I am running into some problems. I need to write a piece of XML data that I have found out to another file. However, when I do this the format is not preserved. For example, if the content should look like this:

<dict>
  <key>item1</key><value>12345</value>
  <key>item2</key><value>67890</value>
  <key>item3</key><value>23456</value>
</dict>

And assuming that there are many entries like this in the document. I am iterating through the 'dict' items by using

hpricot_element = Hpricot(xml_document_body)
f = File.new('some_new_file.xml')
(hpricot_element/:dict).each { |dict| f.write( dict.to_original_html ) }


After using the above code, I would expect that the output look like the following exactly like the XML shown above. However to my surprise, the output of the file looks more like this:

<dict>\n", "    <key>item1</key><value>12345</value>\n", "    <key>item2</key><value>67890</value>\n", "    <key>item3</key><value>23456</value\n", "  </dict>


I've tried splitting at the "\n" characters and writing to the file one line at a time, but that didn't seem to work either as it did not recognize the "\n" characters. Any help is greatly appreciated. It might be a very simple solution, but I am having troubling finding it. Thanks!

A: 
hpricot_element = Hpricot::XML(xml_document_body)

File.open('some_new_file.xml', 'w') {|f| f.write xml_document_body }

Don't use an an xml parser if you want the original xml to be written. It is unnecessary. You should still use one if you want to further process the data, though.

Also, for XML, you should be using Hpricot::XML instead of just Hpricot.

Adrian
The sad part is that, in my original question, it took a lot of XML parsing and searching to get to this piece of data; so the XML parser was necessary if I didn't want to spend an extraordinary amount of time 'rolling my own.' The XML files that I am working with are very nasty (almost entirely semantic-less) and very large, thus I have to extract sections of the file into piece files for later analysis. So, I need the XML parsing features but I also need the ability to write back out to a file.
John
I actually just kept my original code and changed the write to be f.write( dict.to_original_html.gsub('\n', "\n").gsub('", "', '') )And that worked perfectly, replacing the \n literals with actual line breaks and removing the unnecessary punctuation.
John
A: 

My solution was to just replace the literal '\n' characters with line breaks and remove the extra punctuation by simply adding two gsubs that looked like the following:

f.write( dict.to_original_html.gsub('\n', "\n").gsub('" ,"', '') )


I don't know why I didn't see this before. Like I said, it might be an easy answer that I wasn't seeing and that's exactly how it turned out. Thanks for all the answers!

John