I'm using XmlReader.ReadInnerXml to read an XML document (as text) embedded within an element of an outer XML document. This works fine except for the handling of tab characters in attributes of the inner XML. Example:

<document>
  <interface>
    <scriptaction script="&#x9;one tab&#xD;&#xA;&#x9;&#x9;two tabs&#xD;&#xA;&#x9;&#x9;&#x9;three tabs" />
  </interface>
</document>

When ReadInnerXml is used at the "document" element level, the resulting string looks like this:

<interface><scriptaction script=" one tab&#xD;&#xA;  two tabs&#xD;&#xA;   three tabs"/></interface>

IOW, the tab character references are turned into actual tab characters. When we then parse the resulting inner document, those literal tabs are normalized into spaces in the usual whitespace-handling fashion, so the tabs end up converted to spaces. We need to preserve the attribute values as they are.

We've tried messing with various XmlReader settings to no avail. Is this possibly a defect in the reader, or something we're doing wrong?

Thanks in advance,

-- Nathan Allan - Database Consulting Group

A: 

I'm afraid this behaviour is required by the XML spec: http://www.w3.org/TR/REC-xml/#AVNormalize
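
To see the rule in isolation, here is a minimal sketch in Python rather than .NET (chosen only because its standard-library parser, which is expat-based, is easy to run). The element name `a` is made up for the demo. A tab written as a character reference should survive, while a literal tab in the attribute value should come back as a space:

```python
import xml.etree.ElementTree as ET

# Tab written as a character reference: kept as a real tab after parsing.
escaped = ET.fromstring('<a script="&#x9;one tab"/>')

# Literal tab character inside the attribute value: the parser
# normalizes it to a single space, per the attribute-value rules.
literal = ET.fromstring('<a script="\tone tab"/>')

print(repr(escaped.get("script")))
print(repr(literal.get("script")))
```

This seems to match what you describe: once the inner XML text contains literal tabs instead of `&#x9;`, the second parse has no way to tell them apart from ordinary whitespace.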

Do you control the XML generation? Can you use a CDATA section instead?

Wayne
I think you are correct, though I would count this as high among the many shortcomings of XML. I do have control over new document generation, but will have to put measures in for backwards compatibility with older documents. Anyway, thanks!
N8allan
If you control the generation, CDATA (or even preferring an element over an attribute for this information) is what you want to make sure the content gets past an XML parser unmolested. I suspect that the attribute value normalization rules are there to ensure that compliant parsers all behave the same way with equivalent content, so that there are no platform-specific surprises when dealing with whitespace.
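
A rough sketch of the element-plus-CDATA idea, again in Python only for illustration. The child `script` element is a hypothetical layout, not something from the original documents. Literal tabs in element content are not subject to attribute-value normalization, so they should come through intact:

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: the script text lives in a child element,
# wrapped in CDATA, instead of in the "script" attribute.
doc = (
    "<scriptaction><script><![CDATA["
    "\tone tab\n\t\ttwo tabs\n\t\t\tthree tabs"
    "]]></script></scriptaction>"
)

root = ET.fromstring(doc)
text = root.find("script").text

# Literal tabs in element content survive parsing unchanged.
print(repr(text))
```

One caveat: line endings in element content are still normalized (CR LF becomes LF), so this keeps the tabs but not necessarily a literal carriage return.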
Wayne