I'm returning a UTF-8 XML response in which some elements contain user-provided content, so I must ensure that content is properly escaped. Is htmlspecialchars(..., ENT_COMPAT, 'UTF-8') enough to properly escape the text of an XML element?

+1  A: 

I'm not sure I understand exactly. Do you want XML inside HTML or HTML inside XML? If it's the latter, why not use CDATA?

e.g.

<xmlelement>
  <![CDATA[<span>John Smith</span>]]>
</xmlelement>
Jonathan Fingland
It's a text/xml response, no HTML. The '<span>' is user-provided, so even with CDATA I must make sure a malicious user does not enter ']]>' to avoid any XSS or anything similar.
Remus Rusanu
See http://stackoverflow.com/questions/223652/is-there-a-way-to-escape-a-cdata-end-token-in-xml . You will have to explicitly check for that sequence of characters and strip it.
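A minimal sketch of such a check in PHP, assuming a hypothetical helper named cdata_wrap (not part of PHP or of the question's code). Rather than stripping the sequence, it splits every ']]>' across two CDATA sections, which keeps the user's data intact; whether stripping or splitting fits better depends on the application.

```php
<?php
// Hypothetical helper: wrap user text in a CDATA section so that an
// embedded "]]>" cannot terminate the section early.
function cdata_wrap(string $text): string
{
    // "]]>" becomes "]]" + end of section + start of new section + ">".
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}

// Example: the payload contains the CDATA end token.
echo cdata_wrap('<span>a ]]> b</span>'), "\n";
// prints: <![CDATA[<span>a ]]]]><![CDATA[> b</span>]]>
```

A parser reads the two adjacent sections as one run of text, so the element's content is still '<span>a ]]> b</span>'.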
Jonathan Fingland
A: 

http://www.w3.org/TR/2008/REC-xml-20081126/

2.2 Characters ...

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

So this tells you that there is no way to store characters in the low range below 0x20 other than tab, CR, and LF. In addition, the XML parser has to normalize line endings: it has to convert CR LF into LF, and so on.

So neither a normal text node nor a CDATA section can transport a binary character string in XML. If you want to transport one, you have to convert it to base64 or transport it as a list of numbers.
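A rough sketch of how this could look in PHP, assuming two hypothetical helpers, xml_text and xml_binary (neither comes from the question). The first drops code points outside the Char production quoted above and then applies the htmlspecialchars call from the question; the second base64-encodes arbitrary bytes. Silently dropping characters is only one possible policy; rejecting the input is another.

```php
<?php
// Hypothetical helper: prepare user text for an XML 1.0 element.
function xml_text(string $s): string
{
    // Remove every code point not allowed by the XML 1.0 Char production.
    // preg_replace returns null on invalid UTF-8, so fall back to ''.
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $s
    ) ?? '';
    // Escape &, <, > and " as in the question.
    return htmlspecialchars($clean, ENT_COMPAT, 'UTF-8');
}

// Hypothetical helper: transport arbitrary bytes as base64 text.
function xml_binary(string $bytes): string
{
    return base64_encode($bytes);
}

echo xml_text("a\x00b & <c>"), "\n";   // prints: ab &amp; &lt;c&gt;
echo xml_binary("\x00\x01\x02"), "\n"; // prints: AAEC
```

The receiver has to know which elements carry base64 and decode them itself, since XML does not mark the encoding.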

Totonga