
I have a 'complex item' that is in XML, and then a 'workitem' (also in XML) that contains lots of other info. I would like the workitem to contain a string holding the complex item's XML.

For example:

<inouts name="ClaimType" type="complex" value="<xml string here>"/>

However, when I try SAX and other Java parsers, I cannot get them to process this line: they don't like the < or the " characters in the string. I have tried escaping them, and converting the " to '.

Is there any way around this at all, or will I have to come up with another solution?

Thanks

+5  A: 

Possibly the easiest solution would be to use a CDATA section. You could convert your example to look like this:

<inouts name="ClaimType" type="complex">
  <![CDATA[
    <xml string here>
  ]]>
</inouts>
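A minimal sketch of reading this back with the JDK's built-in DOM parser (javax.xml.parsers). The parser returns CDATA content verbatim, so the embedded markup comes out as a plain string that you can parse a second time. The class name, the `embeddedXml` helper and the `claim` document are hypothetical stand-ins for the real complex item. One caveat: a CDATA section ends at the first `]]>`, so the embedded string must not contain that sequence.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class CdataExample {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the text inside <inouts>. CDATA content is not entity-decoded,
    // so the angle brackets and quotes come back exactly as written.
    // trim() drops the indentation around the CDATA section.
    static String embeddedXml(String workitem) throws Exception {
        Element inouts = parse(workitem).getDocumentElement();
        return inouts.getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical complex item; any well-formed XML without "]]>" works.
        String workitem =
            "<inouts name=\"ClaimType\" type=\"complex\">"
          + "<![CDATA[<claim id=\"42\"><amount>10</amount></claim>]]>"
          + "</inouts>";

        String inner = embeddedXml(workitem);
        System.out.println(inner); // <claim id="42"><amount>10</amount></claim>

        // Second pass: treat the extracted string as a document of its own.
        Document claim = parse(inner);
        System.out.println(claim.getDocumentElement().getAttribute("id")); // 42
    }
}
```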

If you have more than one attribute you want to store complex strings for, you could use multiple child elements with different names:

<inouts name="ClaimType" type="complex">
  <value1>
    <![CDATA[
      <xml string here>
    ]]>
  </value1>
  <value2>
    <![CDATA[
      <xml string here>
    ]]>
  </value2>
</inouts>

Or multiple value elements with an identifying id:

<inouts name="ClaimType" type="complex">
  <value id="complexString1">
    <![CDATA[
      <xml string here>
    ]]>
  </value>
  <value id="complexString2">
    <![CDATA[
      <xml string here>
    ]]>
  </value>
</inouts>
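With the id-based layout, a single XPath expression can pick out one value. This is a sketch using the JDK's javax.xml.xpath API; `valueById` is a hypothetical helper, and it assumes `<inouts>` is the document root. The string value of the matched element includes its CDATA content, and an empty string comes back if no element matches.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class ValueById {

    // Returns the trimmed CDATA payload of the <value> with the given id,
    // or "" if there is no such element.
    static String valueById(String workitem, String id) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The id is pasted straight into the expression, so it must not
        // contain a single quote.
        String expr = "/inouts/value[@id='" + id + "']";
        return xpath.evaluate(expr, new InputSource(new StringReader(workitem))).trim();
    }

    public static void main(String[] args) throws Exception {
        String workitem =
            "<inouts name=\"ClaimType\" type=\"complex\">"
          + "<value id=\"complexString1\"><![CDATA[<first/>]]></value>"
          + "<value id=\"complexString2\"><![CDATA[<second a=\"b\"/>]]></value>"
          + "</inouts>";
        System.out.println(valueById(workitem, "complexString2")); // <second a="b"/>
    }
}
```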
Mike Houston
A: 

I'm not sure how it works for attributes, and if escaping (< as &lt; and > as &gt;) does not work, then I don't know.

If it were an inner element, you could use the Xml Any mechanism (I've never used it myself) or put it in a CDATA section.

Guðmundur Bjarni
Encoding and escaping are different things.
Simon
haha oops :) potato potato!
Guðmundur Bjarni
+5  A: 

I think you'll find that the XML you're dealing with won't parse with a lot of parsers, since it isn't well-formed. If you have control over the XML, at a bare minimum you'll need to escape the attribute value so it's something like:

<inouts name="ClaimType" type="complex" value="&lt;xml string here&gt;" />

Then, once you've extracted the attribute value, you can re-parse it as XML.
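A sketch of that round trip with the JDK's DOM parser: when the attribute is read, the parser replaces `&lt;`, `&gt;`, `&quot;` and `&amp;` with the characters they stand for, so `getAttribute` hands back the original XML string, which can then be parsed again. Note that the inner double quotes also have to be escaped (as `&quot;`), otherwise they end the attribute early. The class and helper names are hypothetical. Also be aware that attribute-value normalization turns literal newlines and tabs into spaces unless they are written as character references such as `&#10;`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EscapedAttribute {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Reads the value attribute of the root element; entity references
    // have already been replaced by the parser at this point.
    static String valueAttribute(String workitem) throws Exception {
        return parse(workitem).getDocumentElement().getAttribute("value");
    }

    public static void main(String[] args) throws Exception {
        String workitem =
            "<inouts name=\"ClaimType\" type=\"complex\" "
          + "value=\"&lt;claim id=&quot;42&quot;/&gt;\"/>";

        String inner = valueAttribute(workitem);
        System.out.println(inner); // <claim id="42"/>

        // Re-parse the extracted string as XML.
        Document claim = parse(inner);
        System.out.println(claim.getDocumentElement().getTagName()); // claim
    }
}
```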

Alternatively, you can take one of the approaches above (using CDATA sections) with some re-factoring of your XML.

If you don't have control over your XML, you could try parsing it with the TagSoup library to see how it goes. (Disclaimer: I've only used TagSoup for HTML, so I have no idea how it would cope with non-HTML content.)

(The TagSoup site actually appears to be down at the moment, but you should be able to find enough documentation on the web, and downloads via the Maven repository.)

Martin
+2  A: 

Use a CDATA section or escaping.

NB: There is a big difference between escaping and encoding, which some other posters have referred to. Be careful not to confuse the two.
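To make "escaping" concrete: it means replacing the characters that are significant in markup with entity or character references. Below is a hand-rolled sketch for a double-quoted attribute value; `escapeAttribute` is a hypothetical helper, not a library method. In practice, most XML writers (for example a DOM tree serialized through javax.xml.transform) escape attribute values for you, which is usually the safer choice.

```java
public class XmlEscape {

    // Escapes a string so it can sit inside a double-quoted XML attribute.
    // Working one character at a time means the '&' of an entity we just
    // emitted is never escaped a second time.
    static String escapeAttribute(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                // Character references keep attribute-value normalization
                // from turning these into plain spaces.
                case '\n': out.append("&#10;");  break;
                case '\r': out.append("&#13;");  break;
                case '\t': out.append("&#9;");   break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String complexItem = "<claim id=\"42\"/>"; // hypothetical complex item
        System.out.println("<inouts name=\"ClaimType\" type=\"complex\" value=\""
            + escapeAttribute(complexItem) + "\"/>");
        // <inouts name="ClaimType" type="complex" value="&lt;claim id=&quot;42&quot;/&gt;"/>
    }
}
```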

Simon
A: 

You are http://www.doingitwrong.com/

If inouts/@value really is tree-structured (i.e. XML), then it shouldn't be an attribute; it should be a child element:

<inouts name="ClaimType" type="complex">
    <value>
        <some-arbitrary>
            <xml-stuff/>
        </some-arbitrary>
    </value>
</inouts>
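With this structure nothing needs escaping or re-parsing: the embedded XML is already part of the DOM tree. A minimal sketch with the JDK's DOM parser that returns the first element child of `<value>`; the class and method names are hypothetical, and it assumes the first `<value>` in document order is the one you want.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ChildElementValue {

    // Returns the root of the embedded tree (the first element child of
    // the first <value>), or null if there is none.
    static Element embeddedRoot(String workitem) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(workitem)));
        Element value = (Element) doc.getElementsByTagName("value").item(0);
        if (value == null) {
            return null; // no <value> element at all
        }
        // Skip whitespace text nodes and comments; stop at the first element.
        for (Node n = value.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null; // <value> has no element content
    }

    public static void main(String[] args) throws Exception {
        String workitem =
            "<inouts name=\"ClaimType\" type=\"complex\">"
          + "<value><some-arbitrary><xml-stuff/></some-arbitrary></value>"
          + "</inouts>";
        System.out.println(embeddedRoot(workitem).getTagName()); // some-arbitrary
    }
}
```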

If it is not, in fact, guaranteed to be well-formed XML, but just sort of looks like it because you put some pointy brackets in it, then you should ask yourself whether there is a better way to solve this problem. Failing that, use <![CDATA[ as others have already suggested.

bendin