XML: Why are null characters disallowed even in CDATA sections?

It seems to terminate the file right there.

Any solution? Base64?

+2  A: 

You might find your answer in this previous question:

http://stackoverflow.com/questions/404107/why-are-control-characters-illegal-in-xml

+2  A: 

Because it's not a valid XML character, i.e. it should produce a parse error. This is likely for historical reasons (null-terminated strings) and because of XML's plain-text nature: anything on which a Unicode-capable editor might choke is discouraged...
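
To see the parse error for yourself, here is a minimal sketch assuming Python's standard `xml.etree.ElementTree` module, which is backed by the expat XML 1.0 parser. The helper name `is_well_formed` is made up for illustration:

```python
import xml.etree.ElementTree as ET

def is_well_formed(data: bytes) -> bool:
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# Ordinary text inside CDATA is accepted.
plain_ok = is_well_formed(b"<doc><![CDATA[abc]]></doc>")

# A NUL byte inside CDATA is rejected; CDATA only suspends markup
# recognition, it does not widen the set of legal characters.
nul_ok = is_well_formed(b"<doc><![CDATA[a\x00c]]></doc>")

print(plain_ok, nul_ok)
```

Other parsers may word the error differently, but a conforming one should refuse the second document.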

Christoph
+1  A: 

It shouldn't 'terminate the file', but it should generate a well-formedness error. It's disallowed because so much of the world is still using null-terminated string processing, so allowing a \0 is likely to cause trouble at some unspecified point down the processing chain.

This can possibly even be a security vulnerability; there have been many exploits in the past that have relied on the interfacing of systems which allow \0 and those which take it as a terminator. The safest thing to do, therefore, is simply to disallow it.
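
The mismatch between the two kinds of system can be sketched in a few lines. This is only an illustration, assuming Python's standard `ctypes` module as a stand-in for C-style string handling; the sample payload is invented:

```python
import ctypes

# Hypothetical payload with an embedded NUL byte.
data = b"user=alice\x00;admin=true"

buf = ctypes.create_string_buffer(data)

# .value reads the buffer the way C string functions do: up to the first NUL.
as_c_string = buf.value

# .raw returns every byte in the buffer (plus the trailing NUL ctypes appends).
full = buf.raw[:len(data)]

print(as_c_string)  # what a NUL-terminated consumer sees
print(full)         # what a length-aware consumer sees
```

Two components that disagree about where the string ends will disagree about what it contains, which is the root of the exploits mentioned above.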

Other control characters can be escaped as &#...; character references elsewhere in XML 1.1, but not in CDATA sections. In XML 1.0 there is no way to get control characters in at all, apart from tab, line feed, and carriage return. It is, after all, supposed to be a text-based, human-readable format.
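
A small sketch of the XML 1.0 behaviour, assuming Python's standard `xml.etree.ElementTree` (expat only implements XML 1.0, so the XML 1.1 case is not shown). Tab is one of the whitespace characters XML 1.0 does permit, so its character reference parses, while a reference to U+0001 does not:

```python
import xml.etree.ElementTree as ET

# &#9; is a tab, which XML 1.0 allows, so the reference is accepted.
tab_text = ET.fromstring("<doc>&#9;</doc>").text

# &#1; names a character outside the XML 1.0 character range,
# so even the escaped form is a well-formedness error.
try:
    ET.fromstring("<doc>&#1;</doc>")
    control_rejected = False
except ET.ParseError:
    control_rejected = True

print(repr(tab_text), control_rejected)
```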

Base64?

Yes. But if you are processing mostly big chunks of binary, encapsulating it in XML is probably not a reasonable choice.
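
For small payloads, a round trip might look like the following. This is a sketch using Python's standard `base64` and `xml.etree.ElementTree` modules; the element name `blob` and the `encoding` attribute are hypothetical conventions, not anything XML itself defines:

```python
import base64
import xml.etree.ElementTree as ET

# Binary payload containing NUL bytes, which could not appear in XML directly.
payload = b"abc\x00def\x00"

# Encode to Base64 so the element text contains only plain ASCII characters.
elem = ET.Element("blob", encoding="base64")  # hypothetical element/attribute
elem.text = base64.b64encode(payload).decode("ascii")
xml_bytes = ET.tostring(elem)

# The receiver parses the XML and decodes the text back to the original bytes.
decoded = base64.b64decode(ET.fromstring(xml_bytes).text)

print(xml_bytes)
print(decoded == payload)
```

Base64 grows the data by roughly a third, which is part of why XML is a poor container for large binary chunks.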

bobince
On the other hand, if you're required to store big chunks of binary data in an XML document, Base64 encoding is the way to do it.
Robert Rossney