I have some strings that are valid in my database, but when I include them in an attribute of UTF-8 XML output, I get the following error:

XML Parsing Error: not well-formed

My current code (simplified):

header('Content-Type: text/xml'); 
echo '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>';
echo '<root attribute="' . htmlentities($string_from_hell) . '">'; 

How should I format these strings before including them in XML attributes?

A possible value for $string_from_hell:  (don't know if it will show up properly)

A: 

Try

htmlspecialchars($string_from_hell, ENT_QUOTES, "UTF-8")

htmlentities won't do because it creates HTML named entities (such as &eacute;) that XML does not recognize; XML predefines only &amp;, &lt;, &gt;, &quot; and &apos;. You should also specify the charset explicitly, because before PHP 5.4 the default was ISO-8859-1, not UTF-8.
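
To make the difference concrete, here is a minimal sketch comparing the two functions on the same input. The sample value is made up for illustration (it is not the string from the question), and the source file is assumed to be saved as UTF-8.

```php
<?php
// Made-up sample value (assumption); the file itself must be UTF-8 encoded.
$value = 'café & "crème"';

// htmlentities() converts every character that has an HTML named entity,
// so the accented letters become &eacute; and &egrave;, which XML does not define.
$html = htmlentities($value, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only escapes &, <, >, " and ' (with ENT_QUOTES),
// leaving the accented letters as literal UTF-8 characters.
$xml = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

echo $html, "\n"; // caf&eacute; &amp; &quot;cr&egrave;me&quot;
echo $xml, "\n";  // café &amp; &quot;crème&quot;

// The second form can be dropped into a double-quoted XML attribute.
echo '<root attribute="' . $xml . '"/>', "\n";
```

The first result would trigger an "undefined entity" style error in an XML parser, while the second is well-formed.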

You're also missing the quotes (") around the attribute value.

There are also better ways to create XML files that handle escaping for you. See e.g. XMLWriter.
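
As a rough sketch of what that could look like with XMLWriter (the sample value is an assumption, standing in for the string from the question), the writer takes the raw string and does the attribute escaping itself:

```php
<?php
// Stand-in value (assumption), not the original problematic string.
$string_from_hell = 'Tom & "Jerry" <café>';

$writer = new XMLWriter();
$writer->openMemory();                          // build the document in memory
$writer->startDocument('1.0', 'UTF-8', 'yes');  // same XML declaration as in the question
$writer->startElement('root');
$writer->writeAttribute('attribute', $string_from_hell); // raw value, no manual escaping
$writer->endElement();
$writer->endDocument();

$xml = $writer->outputMemory();
echo $xml;

// Sanity check: parsing the result should give back the original value.
$parsed = simplexml_load_string($xml);
var_dump((string) $parsed['attribute'] === $string_from_hell);
```

This assumes the xmlwriter and simplexml extensions are available, which is the case in most default PHP builds.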

Artefacto
I think the real answer should be to use the appropriate DOM APIs to construct the XML instead of string concatenation. Also the OP's code misses the quotes around the attribute value as far as I can tell.
Joey
@Johan You're right, I missed the quotes. As to the DOM API, I think it's unnecessarily complicated (and inefficient) for XML building unless you need the complete DOM tree afterwards.
Artefacto
@Johannes Fixed. Wrote the code on the edit box. Sorry.
hgpc
No idea how those APIs look in PHP. But something SAX-like might suffice too (which XMLWriter seems to be). I'm not doing that much in XML so pardon the inaccuracy :-)
Joey