I'm trying to parse a Tomboy note that has a link to another note inside it. The XML comes out looking like this:

<?xml version="1.0" encoding="utf-8"?>
<note version="0.3" xmlns:link="http://beatniksoftware.com/tomboy/link" xmlns:size="http://beatniksoftware.com/tomboy/size" xmlns="http://beatniksoftware.com/tomboy">
  <title>Our IP Blocks</title>
  <text xml:space="preserve"><note-content version="0.1">Our IP Blocks

What's <link:internal>in use</link:internal>?</note-content></text>
  <last-change-date>2009-03-10T10:24:36.3730770-04:00</last-change-date>
  <last-metadata-change-date>2009-03-10T10:24:36.3730770-04:00</last-metadata-change-date>
  <create-date>2009-03-10T10:23:14.2936280-04:00</create-date>
  <cursor-position>92</cursor-position>
  <width>450</width>
  <height>289</height>
  <x>0</x>
  <y>27</y>
  <open-on-startup>False</open-on-startup>
</note>

I'm parsing this with XML::Simple, and it's pulling the <link:internal /> node out into a separate entry in the resulting Perl data structure.

EDIT: The resulting object (for the <text /> node) looks like this. Note that 'link:internal' is a separate entity from 'content'.

'text' => {
  'xml:space' => 'preserve',
  'note-content' => {
    'version' => '0.1',
    'link:internal' => 'in use',
    'content' => [
        'Our IP Blocks
        What\'s ',
        '?'
    ]
  }
}

Is this a bug, or am I crazy? All of the validators suggest that this is valid XML, but I've never seen it with a tag nested inside text like this before.

If it is a bug, does anyone know of another XML module that will get this right?

+2  A: 

The above is entirely valid XML. You have an opening element followed by a text node, followed by a child element, followed by more text (this is called mixed content).

I'm guessing that the text you're parsing wasn't properly escaped before being inserted into the top-level node. Perhaps it should be

What's &lt;link:internal&gt;in use&lt;/link:internal&gt;

That would then result in getting the text as one text node and the contents not being parsed (if I'm reading this correctly).

Brian Agnew
Unfortunately, it's not my option to do this. I'm only taking what I get from Tomboy's file structure.
gms8994
I think you'll have to take the contents of your top level node and reparse as text, then (regenerating and escaping as appropriate).
Brian Agnew
+5  A: 
bart
This is the right answer. You're expecting your parse tree to be ordered, like it is in the markup, but XML::Simple is flattening it into an object with 'fields' like "link:internal". Look at XML::Parser::Style::Tree for the representation you probably want.
Andy Ross
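
To illustrate the comment above, here is a minimal sketch of reading the note with XML::Parser's Tree style, which keeps text and child elements in document order. The note is trimmed down from the one in the question, and find_element and flatten_text are hypothetical helper names made up for this example, not part of XML::Parser. It assumes you just want the readable text of <note-content /> with the link text left in place.

```perl
use strict;
use warnings;
use XML::Parser;

# Trimmed-down version of the note from the question.
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<note version="0.3" xmlns:link="http://beatniksoftware.com/tomboy/link" xmlns="http://beatniksoftware.com/tomboy">
  <text xml:space="preserve"><note-content version="0.1">Our IP Blocks

What's <link:internal>in use</link:internal>?</note-content></text>
</note>
XML

# Style => 'Tree' returns [ $tag, $content ], where $content is
# [ \%attributes, $tag1, $content1, $tag2, $content2, ... ] in document order.
# A tag of 0 marks a text node; its "content" is the string itself.
my $tree = XML::Parser->new(Style => 'Tree')->parse($xml);

# Hypothetical helper: depth-first search for the first element named $name.
# Returns that element's content array ref, or nothing if not found.
sub find_element {
    my ($tag, $content, $name) = @_;
    return $content if $tag eq $name;
    return unless ref $content;    # text node, nothing to descend into
    my @kids = @{$content}[1 .. $#$content];    # skip the attribute hash
    while (my ($t, $c) = splice(@kids, 0, 2)) {
        my $hit = find_element($t, $c, $name);
        return $hit if $hit;
    }
    return;
}

# Hypothetical helper: concatenate all text under an element in document
# order, descending into child elements such as <link:internal>.
sub flatten_text {
    my ($content) = @_;
    my @kids = @{$content}[1 .. $#$content];    # skip the attribute hash
    my $out  = '';
    while (my ($t, $c) = splice(@kids, 0, 2)) {
        $out .= $t eq '0' ? $c : flatten_text($c);
    }
    return $out;
}

my $note_content = find_element(@$tree, 'note-content');
my $text         = flatten_text($note_content);
print "$text\n";
```

Because the children stay in one ordered list, the link text ends up between "What's " and "?" instead of being hoisted into a separate key as XML::Simple does. If you need to know which span is the link, you could inspect the tag name at the point where flatten_text recurses.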