I need to embed arbitrary (syntactically valid) XML documents within a wrapper XML document. The embedded documents are to be regarded as mere text; they do not need to be parseable when parsing the wrapper document.

I know about the "CDATA trick", but I can't use that if the inner XML document itself contains a CDATA segment, and I need to be able to embed any valid XML document. Any advice on accomplishing this--or working around the CDATA limitation--would be appreciated.

+3  A: 

You need to properly escape the text. You don't say what language you're using, but generally: you build a DOM, create a Text node that contains your "inner" XML, and then serialize that DOM. The serializer will handle escaping for you.

The key point here is to use a serializer to produce your output. Don't simply write strings, because you're all but guaranteed to produce something that's not well-formed XML.
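
As a minimal sketch of that approach in Python (the question does not name a language, so the choice of Python and of the standard `xml.dom.minidom` module is an assumption, and `wrap_as_text` and the `wrapper` element name are made up for illustration), the inner document goes into a Text node and the serializer takes care of escaping `<`, `>` and `&`:

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import Document


def wrap_as_text(inner_xml: str, wrapper_tag: str = "wrapper") -> str:
    """Embed inner_xml as plain text inside a new wrapper document."""
    doc = Document()
    root = doc.createElement(wrapper_tag)
    doc.appendChild(root)
    # The inner document becomes a single Text node; no parsing happens here.
    root.appendChild(doc.createTextNode(inner_xml))
    # The serializer escapes markup characters in the Text node.
    return doc.toxml()


# Hypothetical inner document that itself contains a CDATA section.
inner = '<?xml version="1.0"?><note><![CDATA[1 < 2 & 3]]></note>'
wrapped = wrap_as_text(inner)

# Any XML parser hands the original text back unchanged.
recovered = ET.fromstring(wrapped).text
print(recovered == inner)
```

Because the escaping is undone by the parser on the receiving side, the consumer never has to know how the text was encoded.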

kdgregory
Twice I started to comment that while this answer was good, it didn't fit what I was doing, which, way oversimplified, was receiving a text stream that contained an XML document that had to be wrapped within XML and sent back out. Parsing to a DOM wasn't part of the task. But I aborted my comments. This answer just kept weighing on me, and I finally had an epiphany: while input was time critical, output was not. So spin up a thread to buffer the XML, parse it, wrap it, and serialize it. Done!
Marc C
+2  A: 

When you escape the closing angle bracket of the inner CDATA, XML parsers will not complain about the well-formedness of your XML. Using this "workaround", you should be able to nest multiple CDATA sections.

Something like:

<?xml version="1.0"?>
<SomeData>
<![CDATA[
<SomeMoreData>
<![CDATA[
yeah, this trick rocks! ...
]]&gt;
</SomeMoreData>
]]>
</SomeData>

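Note that the inner CDATA has its closing ">" escaped as &gt;. Entity references are not expanded inside a CDATA section, so the consumer receives the literal text ]]&gt; and has to convert it back to ]]> itself.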

Cerebrus
A: 

Isn't that what character entities are for?

Calvin
A: 

Consider using XInclude instead of trying to embed an XML document inside another. The XInclude parse="text" attribute will force the XML to be treated as text, not markup.
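
A rough sketch of what this could look like, using Python's standard `xml.etree.ElementInclude` module as the XInclude processor (an assumption; any XInclude-aware processor should behave similarly). The `inner.xml` name, the `wrapper` element and the in-memory `loader` are hypothetical stand-ins for a real file or URL:

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical store standing in for files on disk.
INNER_DOCS = {
    "inner.xml": '<?xml version="1.0"?><note><![CDATA[1 < 2]]></note>',
}


def loader(href, parse, encoding=None):
    """Return the raw text of the referenced resource (text includes only)."""
    if parse != "text":
        raise ValueError("this sketch only handles parse='text'")
    return INNER_DOCS[href]


WRAPPER = (
    '<wrapper xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="inner.xml" parse="text"/>'
    "</wrapper>"
)

root = ET.fromstring(WRAPPER)
# Replaces the xi:include element with the resource's content as plain text.
ElementInclude.include(root, loader=loader)

# Serializing the resolved tree escapes the included text as usual.
resolved = ET.tostring(root, encoding="unicode")
print(resolved)
```

The wrapper only carries a reference, so this fits best when the inner document can live as a separate resource and the consumer runs an XInclude step.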

Dour High Arch
+1  A: 

One easy solution is to use adjacent CDATA sections. <![CDATA[A]]><![CDATA[B]]> is the same as <![CDATA[AB]]>. Hence, you can have <![CDATA[]]]]><![CDATA[>]]>, a ]]> closing delimiter split over two CDATA sections.
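
A small Python sketch of the splitting rule (the language, the `to_cdata` helper name and the `wrapper` element are assumptions made for illustration): every `]]>` in the inner text is replaced so that `]]` ends one CDATA section and `>` starts the next.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting each ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical inner document that already contains a CDATA section.
inner = "<note><![CDATA[1 < 2]]></note>"
wrapped = "<wrapper>" + to_cdata(inner) + "</wrapper>"

# The parser concatenates the adjacent sections into the original text.
recovered = ET.fromstring(wrapped).text
print(recovered == inner)
```

Unlike escaping the inner `>` as `&gt;`, this needs no post-processing on the receiving side, because the parser reassembles the text itself.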

MSalters
A: 

You can do this by simply adding the document (without its <?xml declaration) as a child to some parent. SOAP does this: it has a <Body> element that can contain whatever XML message one wants to send.

SOAP defines the XSD this way:

<xs:element name="Body" type="tns:Body" />
<xs:complexType name="Body">
  <xs:sequence>
    <xs:any namespace="##any" minOccurs="0"
        maxOccurs="unbounded" processContents="lax" />
  </xs:sequence>
  <xs:anyAttribute namespace="##any" processContents="lax">
  </xs:anyAttribute>
</xs:complexType>
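
For illustration, a minimal Python sketch of embedding a document as a child element (Python's `xml.etree.ElementTree`, the `embed_as_child` helper and the sample `order` document are assumptions, not part of SOAP). Note that here the inner document is parsed and embedded as markup rather than as text:

```python
import xml.etree.ElementTree as ET


def embed_as_child(inner_xml: str, wrapper_tag: str = "Body") -> str:
    """Parse inner_xml and attach its root element under a new wrapper."""
    # Parsing discards the <?xml ...?> declaration of the inner document.
    child = ET.fromstring(inner_xml)
    wrapper = ET.Element(wrapper_tag)
    wrapper.append(child)
    return ET.tostring(wrapper, encoding="unicode")


# Hypothetical inner document.
inner = '<?xml version="1.0"?><order id="7"><item>widget</item></order>'
wrapped = embed_as_child(inner)
print(wrapped)
```

The trade-off is that the inner document must be parsed, so details such as its declaration or the exact form of its CDATA sections are not preserved byte for byte.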
Bozho