I have some data in a database that I need represented in an XML file, but the data is different for each request, so the XML file is generated through PHP. For example, let's say this is the text in the db:

Hello &amp; Goodbye

I've tried using the following to get the above (set to the $example variable) to show up as Hello & Goodbye in the generated XML:

$example = mb_convert_encoding($example, "utf-8", "HTML-ENTITIES" );

$example = htmlspecialchars_decode($example);

$example = html_entity_decode($example);

$example = str_replace("&amp;", "&", $example);

These lines convert other entities, like &quot;, to their proper characters, but not &amp;. Any idea how to get this working correctly?

A:

A bare & is not allowed in XML text or attribute values: outside CDATA sections and comments, it may only appear as the start of an entity or character reference such as &amp;. Because of that, whatever XML library you're using is most likely converting it to &amp; on the fly. That's the way it should be; otherwise the XML won't be well-formed.
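To illustrate, here is a minimal sketch using PHP's DOM extension. It assumes the XML is built with DOMDocument (the question doesn't say which library is used), and the buildMessageXml() helper and <message> element are hypothetical. The decoded string goes in with a bare &, and saveXML() writes it back out as &amp;:

```php
<?php
// Hypothetical helper: wrap a plain-text value in a <message> element
// using PHP's DOM extension and return the serialized element.
function buildMessageXml(string $text): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $root = $doc->createElement('message');
    // createTextNode() expects raw, unescaped text; DOM escapes it on output.
    $root->appendChild($doc->createTextNode($text));
    $doc->appendChild($root);
    // Passing the element serializes just that node, without the XML declaration.
    return $doc->saveXML($root);
}

// Value as stored in the database in the question.
$fromDb = 'Hello &amp; Goodbye';

// Decoding works: $example now holds a bare "&".
$example = html_entity_decode($fromDb, ENT_QUOTES, 'UTF-8');
echo $example, "\n";                  // Hello & Goodbye

// On output, DOM turns the bare "&" back into "&amp;".
echo buildMessageXml($example), "\n"; // <message>Hello &amp; Goodbye</message>
```

Passing the still-encoded database value instead would give Hello &amp;amp; Goodbye in the output, because the library escapes whatever text it receives.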

Emil H
A: 

Your code works to decode the entity, so that isn't the problem.

I'm guessing your XML output library is what's re-escaping the entity. The thing to understand is that this is correct behaviour. While quote marks can appear unescaped in XML documents (except inside an attribute value delimited by the same kind of quote), ampersands can't be used on their own, because in almost all contexts they signify the start of an entity.
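If the XML is assembled by string concatenation rather than through a library, the same escaping has to be done by hand. A minimal sketch, assuming PHP 5.4 or later for the ENT_XML1 flag; xmlEscape() and the <message> element are hypothetical names:

```php
<?php
// Hypothetical helper: escape a plain-text value for XML element content
// or for a quoted attribute value.
function xmlEscape(string $value): string
{
    // ENT_XML1 applies XML rules; ENT_QUOTES also escapes " and '.
    // & -> &amp;, < -> &lt;, > -> &gt;, " -> &quot;, ' -> &apos;
    return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

$example = 'Hello & Goodbye'; // already decoded
$xml = '<message title="' . xmlEscape('Say "hi"') . '">'
     . xmlEscape($example)
     . '</message>';
echo $xml, "\n"; // <message title="Say &quot;hi&quot;">Hello &amp; Goodbye</message>
```

Since htmlspecialchars() double-encodes by default, escape the decoded text: passing the raw database value Hello &amp; Goodbye would produce Hello &amp;amp; Goodbye.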

Any XML parser reading the output from your code will understand Hello &amp; Goodbye in the XML to be a representation of the string value "Hello & Goodbye".
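A quick way to check this is to parse the output. This sketch assumes the SimpleXML extension is available and uses a hypothetical <message> element standing in for whatever element the generated XML actually contains:

```php
<?php
// Stand-in for the generated XML; <message> is a hypothetical element name.
$xml = '<message>Hello &amp; Goodbye</message>';

// simplexml_load_string() returns a SimpleXMLElement, or false on a parse error.
$message = simplexml_load_string($xml);
if ($message === false) {
    throw new RuntimeException('Generated XML is not well-formed');
}

// Casting to string gives the element's text with "&amp;" resolved to "&".
$text = (string) $message;
echo $text, "\n"; // Hello & Goodbye

// By contrast, a bare "&" is rejected by the parser.
libxml_use_internal_errors(true); // collect errors instead of emitting warnings
var_dump(simplexml_load_string('<message>Hello & Goodbye</message>')); // bool(false)
```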

grahamparks
A: 
$example = mb_convert_encoding($example, "utf-8", "HTML-ENTITIES" );

followed by

$example = htmlspecialchars_decode(utf8_encode($example));

worked for me; it outputs a UTF-8 "compliant" string.