How can UTF-8 strings (i.e. 8-bit strings) be converted to/from XML-compatible 7-bit strings (i.e. printable ASCII with numeric entities)?
i.e. an encode() function such that:
encode("“£”") -> "&#8220;&#163;&#8221;"
decode() would also be useful:
decode("&#8220;&#163;&#8221;") -> "“£”"
PHP's htmlentities()/html_entity_decode() pair does not do the right thing:
htmlentities(html_entity_decode("&#8220;&#163;&#8221;")) -> "&amp;#8220;&pound;&amp;#8221;"
Laboriously specifying types helps a little, but still returns XML-incompatible named entities, not numeric ones:
htmlentities(html_entity_decode("&#8220;&#163;&#8221;", ENT_QUOTES, "UTF-8"), ENT_QUOTES, "UTF-8") -> "&ldquo;&pound;&rdquo;"
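For reference, here is a minimal sketch of the behaviour being asked for. It assumes the mbstring extension is available and uses mb_encode_numericentity()/mb_decode_numericentity(). The names encode() and decode() are just the placeholders used above, and the conversion map is my own choice: it covers every code point from U+0080 upward and leaves plain ASCII (including &, < and >) untouched, so it is one possible approach rather than a complete XML escaper.

```php
<?php
// Sketch only: requires the mbstring extension.
// Conversion map format: [first code point, last code point, offset, mask].
// This map is an assumption: it selects everything outside 7-bit ASCII.
const NON_ASCII_MAP = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// Replace every non-ASCII character with a decimal numeric entity.
function encode(string $utf8): string
{
    return mb_encode_numericentity($utf8, NON_ASCII_MAP, 'UTF-8');
}

// Turn numeric entities for non-ASCII code points back into UTF-8.
function decode(string $ascii): string
{
    return mb_decode_numericentity($ascii, NON_ASCII_MAP, 'UTF-8');
}

echo encode("\u{201C}\u{A3}\u{201D}"), "\n"; // &#8220;&#163;&#8221;
```

Because the map starts at 0x80, decode() deliberately leaves entities for ASCII characters such as &#38; alone. If XML-special characters also need escaping, htmlspecialchars() could be applied before encode().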