
So I have a string in another language. Most of it looks great, but parts of it are encoded incorrectly. How do I convert the literal string `\u0026#39;n` into its Unicode(?) equivalent in PHP?

A:

Picking it apart, that looks to have been through at least two different encoding processes. To start with, `\u0026` is Unicode code point hex 26, or 38 in decimal. The first 128 Unicode code points are the same as ASCII, so this is ASCII 38, an ampersand (`&`).
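A minimal PHP check of that arithmetic, using only the built-ins `hexdec()` and `chr()` (the variable names are just illustrative):

```php
<?php
// The four hex digits after "\u" are the code point.
$hex = '0026';
$codePoint = hexdec($hex); // 0x26 = 38

// Code points below 128 coincide with ASCII, so chr() yields the character.
$char = chr($codePoint);

echo $codePoint, "\n"; // 38
echo $char, "\n";      // &
```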

So now we have `&#39;n`, which looks like an HTML or XML numeric character reference for character 39, the single quote character, `'`.

That gives us `'n`, which I can't see how to decode further. Does the context provide further clues?
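If the two layers are exactly as described above (a `\uXXXX` escape wrapped around an HTML numeric character reference), undoing them in reverse order should recover the text. A sketch under that assumption; the function names `decodeUnicodeEscapes` and `decodeTwoLayers` are hypothetical, and it assumes PHP 7.2+ with the mbstring extension for `mb_chr()`:

```php
<?php
// Stage 1: replace each \uXXXX escape with the UTF-8 character it names.
function decodeUnicodeEscapes(string $s): string
{
    return preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',
        function (array $m): string {
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // mb_chr() returns false for code points it cannot encode
            // (e.g. UTF-16 surrogate halves); keep the escape unchanged then.
            return $char === false ? $m[0] : $char;
        },
        $s
    );
}

// Stage 2: decode HTML/XML numeric character references such as &#39;.
function decodeTwoLayers(string $s): string
{
    return html_entity_decode(decodeUnicodeEscapes($s), ENT_QUOTES, 'UTF-8');
}

$raw = '\u0026#39;n'; // single quotes: a literal backslash followed by "u0026"
echo decodeUnicodeEscapes($raw), "\n"; // &#39;n
echo decodeTwoLayers($raw), "\n";      // 'n
```

This only removes the two encoding layers; it says nothing about what the trailing `n` belongs to.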

AakashM
Character entities are not in hex unless they use the letter `x`. `&#39;` is the single quote character, `'`, not the digit 9. The digit 9 is `&#57;` or `&#x39;`.
Pourquoi Litytestdata
@Pourqoui: thanks, edited.
AakashM
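The comment's point about decimal versus hex references can be checked directly with `html_entity_decode()`; a small sketch:

```php
<?php
// Numeric character references are decimal unless they carry an "x".
$refs = ['&#39;', '&#57;', '&#x39;'];
$decoded = [];
foreach ($refs as $ref) {
    // ENT_QUOTES so that the single-quote reference is decoded as well.
    $decoded[$ref] = html_entity_decode($ref, ENT_QUOTES, 'UTF-8');
    echo $ref, ' => ', $decoded[$ref], "\n";
}
// &#39; => '
// &#57; => 9
// &#x39; => 9
```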
A: 

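The following PHP function decodes `\uXXXX` escape sequences, so `\u0026#39;n` becomes `&#39;n`; passing that through `html_entity_decode()` once more gives `'n`. This is used when communicating with the Google Translate API.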

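function unescapeUTF8EscapeSeq($str) {
    // Match each \uXXXX escape: a backslash, "u", and four hex digits.
    return preg_replace_callback('/\\\\u([0-9a-f]{4})/i',
        function ($matches) {
            // Rewrite the code point as a hex entity (&#xXXXX;) and decode it to UTF-8.
            return html_entity_decode('&#x' . $matches[1] . ';', ENT_QUOTES, 'UTF-8');
        }, $str);
}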
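Since `\uXXXX` is also JSON's escape syntax, and assuming the text arrived inside a JSON response, another option is to let PHP's JSON parser undo that layer. A sketch, assuming the text is otherwise valid as the body of a JSON string (no unescaped `"`, stray backslashes, or raw newlines); the helper name `decodeViaJson` is hypothetical. Unlike a four-digit regex, `json_decode()` also combines surrogate pairs such as `\ud83d\ude00` into one character.

```php
<?php
// Hypothetical helper: decode \uXXXX escapes by parsing the text as a JSON
// string literal, then decode HTML character references such as &#39;.
function decodeViaJson(string $s): string
{
    $decoded = json_decode('"' . $s . '"');
    if (!is_string($decoded)) {
        // Not a valid JSON string body (e.g. an unescaped quote): keep the input.
        $decoded = $s;
    }
    return html_entity_decode($decoded, ENT_QUOTES, 'UTF-8');
}

echo decodeViaJson('\u0026#39;n'), "\n"; // 'n
```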
Rook