views:

580

answers:

9

I need to embed an entire well-formed XML document within another XML document. However, I would rather avoid CDATA (personal distaste), and I would also like to keep the parser that receives the whole document from wasting time parsing the embedded XML. The embedded XML could be quite large, and I would like the code that receives the whole file to treat it as arbitrary data.

The idea that immediately came to mind is to encode the embedded XML in base64, or to zip it. Does this sound OK?

I'm coding in C# by the way.

+1  A: 

I would encode it in your favorite way (e.g. base64 or HttpServerUtility.UrlEncode, ...) and then embed it.

henchman
+5  A: 

You could convert the XML to a byte array, then convert it to base64. That will allow you to nest it in an element without having to use CDATA.
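A minimal sketch of the byte array / base64 route, assuming LINQ to XML (System.Xml.Linq) and UTF-8 as the byte encoding. The class name and the element names envelope and payload are hypothetical:

```csharp
using System;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper: stores an XML string as opaque base64 text inside
// <envelope><payload>...</payload></envelope>.
public static class Base64Embedding
{
    public static XDocument Wrap(string innerXml)
    {
        // XML string -> UTF-8 bytes -> base64 text (no markup survives).
        byte[] bytes = Encoding.UTF8.GetBytes(innerXml);
        string encoded = Convert.ToBase64String(bytes);
        return new XDocument(
            new XElement("envelope",
                new XElement("payload", encoded)));
    }

    public static string Unwrap(XDocument outer)
    {
        // The receiving parser only sees a run of base64 characters; the inner
        // XML is not parsed unless the application decodes and loads it.
        string encoded = (string)outer.Root.Element("payload");
        byte[] bytes = Convert.FromBase64String(encoded);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

The receiver decides when (or whether) to call Unwrap and parse the result, which is the "arbitrary data" behaviour the question asks for.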

A: 

It seems that serialization is the recommended method.

Bauer
+2  A: 

The W3C-approved way of doing this is XInclude. There is an implementation for .Net at http://mvp-xml.sourceforge.net/xinclude/

Dour High Arch
The .NET Framework does not have an XInclude implementation.
John Saunders
The link I posted is to an implementation of XInclude for .Net in C#.
Dour High Arch
+3  A: 

Just a quick note: I have gone the base64 route and it works fine, but it comes with a stiff performance penalty, especially under heavy usage. We do this with document fragments of up to 20MB, and after base64 encoding they can take upwards of 65MB (with tags and data), even with zipping.

However, the bigger issue is that .NET base64 encoding can consume up to 10x the memory when encoding/decoding, and can frequently cause OOM exceptions if done repeatedly and/or on multiple threads.

Someone on a similar question recommended ProtoBuf, as well as Fast Infoset, as alternatives.

GrayWizardx
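One way to limit the memory cost described above is to avoid building the whole base64 string at all. The sketch below streams the bytes through XmlWriter.WriteBase64 and reads them back with XmlReader.ReadElementContentAsBase64, one fixed-size chunk at a time. It assumes the inner document is available as a Stream; the class name, chunk size and element names are hypothetical, and it has not been measured against the 20MB case above:

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: writes/reads <envelope><payload>BASE64</payload></envelope>
// without materialising the full encoded string in memory.
public static class StreamingBase64
{
    const int ChunkSize = 4096;

    public static void Write(Stream inner, Stream output)
    {
        var buffer = new byte[ChunkSize];
        using (XmlWriter writer = XmlWriter.Create(output))
        {
            writer.WriteStartElement("envelope");
            writer.WriteStartElement("payload");
            int read;
            // WriteBase64 can be called repeatedly; the encoded output stays continuous.
            while ((read = inner.Read(buffer, 0, buffer.Length)) > 0)
            {
                writer.WriteBase64(buffer, 0, read);
            }
            writer.WriteEndElement(); // payload
            writer.WriteEndElement(); // envelope
        }
    }

    public static void Read(Stream envelope, Stream innerOut)
    {
        var buffer = new byte[ChunkSize];
        using (XmlReader reader = XmlReader.Create(envelope))
        {
            if (!reader.ReadToFollowing("payload"))
                throw new InvalidDataException("No <payload> element found.");
            int read;
            // Decodes the element content chunk by chunk; returns 0 when done.
            while ((read = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length)) > 0)
            {
                innerOut.Write(buffer, 0, read);
            }
        }
    }
}
```

Both methods leave the caller's streams open (the default CloseOutput/CloseInput settings are false), so the caller stays in charge of disposing them.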
Since the data I want to embed is XML, it's highly compressible. After some tests, it seems that if I first compress the XML before converting it to base64, the resulting size in bytes is about 10% less than that of the raw, uncompressed XML. I think I will take this route!
tempy
I think you might want to try vtd-xml's C# port. Many of the problems with DOM, such as performance (you seem to worry about parsing) and memory usage, are solved nicely, but it is an external toolkit (not part of the .NET distribution): http://vtd-xml.sf.net
vtd-xml-author
Thanks for the link @Jimmy zhang
GrayWizardx
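The compress-then-base64 approach tempy describes could look roughly like this, assuming GZipStream (System.IO.Compression) as the compressor; any other deflate-style compressor would work the same way. The class and method names are hypothetical:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: gzip the inner XML, then base64 the compressed bytes.
public static class CompressedEmbedding
{
    public static string Pack(string innerXml)
    {
        byte[] raw = Encoding.UTF8.GetBytes(innerXml);
        using (var compressed = new MemoryStream())
        {
            // leaveOpen: true so the MemoryStream survives the GZipStream;
            // disposing the GZipStream flushes the final compressed block.
            using (var gzip = new GZipStream(compressed, CompressionMode.Compress, true))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return Convert.ToBase64String(compressed.ToArray());
        }
    }

    public static string Unpack(string packed)
    {
        byte[] bytes = Convert.FromBase64String(packed);
        using (var input = new GZipStream(new MemoryStream(bytes), CompressionMode.Decompress))
        using (var reader = new StreamReader(input, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The packed string can then be placed in an element like any other text, e.g. new XElement("payload", CompressedEmbedding.Pack(xml)). How much smaller the result is depends on how repetitive the XML is.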
+1  A: 

Depending on how you construct the XML, one way is to not care about it and let the framework handle it.

using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml("<?xml version=\"1.0\" encoding=\"utf-8\" ?><helloworld></helloworld>");
string xml = "<how><are><you reply=\"i am fine\">really</you></are></how>";
// Setting InnerText stores the string as a text node, so its markup is escaped on output.
doc.GetElementsByTagName("helloworld")[0].InnerText = xml;

The output will be the same string with its markup characters escaped, something like:

<?xml version="1.0" encoding="utf-8"?>
<helloworld>&lt;how&gt;&lt;are&gt;&lt;you reply="i am fine"&gt;really&lt;/you&gt;&lt;/are&gt;&lt;/how&gt;</helloworld>
o.k.w
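On the receiving side, reading InnerText gives back the original, unescaped string, so the inner document only has to be parsed if and when the application asks for it. A sketch, reusing the helloworld wrapper from the answer; the class and method names are hypothetical:

```csharp
using System.Xml;

// Hypothetical helper around the InnerText approach.
public static class InnerTextEmbedding
{
    public static XmlDocument Wrap(string innerXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml("<?xml version=\"1.0\" encoding=\"utf-8\" ?><helloworld></helloworld>");
        // Stored as character data; <, > and & are escaped when serialized.
        doc.DocumentElement.InnerText = innerXml;
        return doc;
    }

    public static XmlDocument LoadEmbedded(XmlDocument outer)
    {
        // InnerText returns the unescaped string; parsing is deferred until here.
        var inner = new XmlDocument();
        inner.LoadXml(outer.DocumentElement.InnerText);
        return inner;
    }
}
```

Note that the outer parser still has to scan and unescape the text, which is the trade-off Pete Kirkham points out further down.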
+1  A: 

If you don't need the XML declaration (the first line of the document), just insert the root element (with all its children) into the tree of the other XML document as a child of an existing element. Use a different namespace to separate the inserted elements.

Andreas
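A sketch of this structural approach using LINQ to XML. It assumes the inner elements have no namespace of their own and moves them into one; the namespace URI urn:example:payload, the element names and the class name are all hypothetical. Unlike the base64 variants, the receiving parser will parse all of this:

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: inserts the inner root element (not its declaration)
// under <envelope><data>, in a separate namespace.
public static class NodeEmbedding
{
    public static readonly XNamespace PayloadNs = "urn:example:payload";

    public static XDocument Embed(string innerXml)
    {
        // Only the root element is taken, so the inner XML declaration is dropped.
        XElement innerRoot = XDocument.Parse(innerXml).Root;

        // Move un-namespaced elements into the payload namespace; unprefixed
        // attributes stay in no namespace, as the Namespaces spec defines.
        foreach (XElement e in innerRoot.DescendantsAndSelf()
                     .Where(x => x.Name.Namespace == XNamespace.None)
                     .ToList())
        {
            e.Name = PayloadNs + e.Name.LocalName;
        }

        return new XDocument(
            new XElement("envelope",
                new XElement("data", innerRoot)));
    }
}
```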
This would still result in the parser on the receiving end parsing the embedded xml, which I would like to avoid.
tempy
@tempy the parser has to parse CDATA or base64-encoded data too, to check that it is well-formed and to pass it to the application as character data. You will need to benchmark it to see whether throwing away the structure costs more or less than parsing the extra bytes of base64.
Pete Kirkham
@Pete Kirkham That's a good point... I'll have to investigate.
tempy
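A rough starting point for the benchmark Pete suggests, assuming a forward-only XmlReader pass is a reasonable proxy for the receiver's parsing cost. The wrapper markup and class name are hypothetical, and the timings depend entirely on your data and machine, so no result is implied here:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical micro-benchmark: structured embedding vs. base64 embedding.
public static class EmbeddingBenchmark
{
    // Builds the same payload two ways; innerXml must not carry a declaration.
    public static (string Structured, string Base64) BuildEnvelopes(string innerXml)
    {
        string structured = "<envelope><payload>" + innerXml + "</payload></envelope>";
        string encoded = Convert.ToBase64String(Encoding.UTF8.GetBytes(innerXml));
        string base64 = "<envelope><payload>" + encoded + "</payload></envelope>";
        return (structured, base64);
    }

    // Reads every node once per iteration, roughly the minimum work a receiver does.
    public static TimeSpan TimeFullRead(string xml, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            using (var reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read()) { }
            }
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

For a fair comparison, the base64 timing should also include the decode step if the receiver eventually needs the inner document.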
A: 

I use comments for this:

<!-- your xml text -->

[EDITED]
If the embedded XML contains comments, replace their delimiters with a different syntax (here <!** ... **>).

<?xml version="1.0" encoding="iso-8859-1" ?>
<xml>
    <status code="0" msg="" cause="" />
    <data>
        <order type="07" user="none" attrib="..." >
        <xmlembeded >
            <!--
                <?xml version="1.0" encoding="iso-8859-1" ?>
                <xml>
                <status ret="000 "/>
                <data>
                <allxml_here />
                <!** embedded comments **>
                </data>
                </xml>
            -->
        </xmlembeded >
        </order>
        <context sessionid="12345678" scriptname="/from/..."  attrib="..." />
    </data>
</xml>
lsalamon
What if there is a comment in the embedded XML too? Wouldn't its closing --> uncomment the rest?
Petr Peller
This is not my case, I know what I'm entering.
lsalamon
This smells a bit too hack-ish for me. Scrubbing the inner xml of comments, which would require treating it as one potentially giant string, could be rather expensive and I think could be avoided by using other methods.
tempy
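A sketch of the comment approach with the delimiter substitution from the answer, using LINQ to XML's XComment. The <!** / **> convention is the answerer's; the class name is hypothetical. The check matters because XML forbids "--" anywhere inside a comment and a comment ending in "-", and the substitution is lossy if the inner text already contains <!** or **>:

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: hides an XML string inside a comment node.
public static class CommentEmbedding
{
    public static XComment ToComment(string innerXml)
    {
        // Rewrite nested comment delimiters so they cannot close the outer comment.
        string text = innerXml.Replace("<!--", "<!**").Replace("-->", "**>");
        if (text.Contains("--") || text.EndsWith("-", StringComparison.Ordinal))
            throw new ArgumentException("Inner XML still contains '--' after substitution.");
        return new XComment(text);
    }

    public static string FromComment(XComment comment)
    {
        // Reverse the substitution to recover the original text.
        return comment.Value.Replace("<!**", "<!--").Replace("**>", "-->");
    }
}
```

As tempy notes, this still means scanning the whole inner string, which is where the cost of this approach lies.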
A: 

Can't you use XSLT for this? Perhaps using xsl:copy or xsl:copy-of? This is what XSLT is for.

Rob
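A sketch of the XSLT idea using XslCompiledTransform and xsl:copy-of to wrap an incoming document. The stylesheet, the envelope/payload names and the class name are hypothetical. Note that this copies the inner document as structured nodes, so, like inserting the root element directly, the receiving parser will still parse it:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper: wraps a whole document in <envelope><payload> via XSLT.
public static class XsltEmbedding
{
    const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:template match=""/"">
    <envelope>
      <payload><xsl:copy-of select=""/*""/></payload>
    </envelope>
  </xsl:template>
</xsl:stylesheet>";

    public static string Wrap(string innerXml)
    {
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(innerXml)))
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            // xsl:copy-of copies the root element and all its descendants unchanged.
            xslt.Transform(input, writer);
        }
        return output.ToString();
    }
}
```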