So I have a string in another language. Most of it looks great, but parts of it are encoded incorrectly. How do I convert the literal string `\u0026#39;n` into its Unicode(?) equivalent in PHP?
A:
Picking it apart, that looks to have been through at least two different encoding processes. To start with, `\u0026` is Unicode code point hex 26, or 38 in decimal. The first 128 Unicode code points are the same as ASCII, so this is ASCII 38, an ampersand.
So now we have `&#39;n`, which looks like an HTML or XML numeric character reference for character 39 (decimal), which is the single quote character, `'`.
That gives us `'n`, which I can't see how to decode further. Does the context provide further clues?
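The arithmetic above can be checked by hand in PHP. This is only a small sketch of the reasoning, not a general decoder: the variable names are made up for illustration, and `chr()` is only appropriate here because the code point is below 128, where Unicode and ASCII agree.

```php
<?php
// Step 1: the hex digits after \u name a code point.
$codePoint = hexdec('0026');   // 38 in decimal

// Step 2: below 128, the code point is also the ASCII byte value.
$char = chr($codePoint);       // "&"

// Step 3: putting the ampersand back gives a numeric character reference.
$entity = $char . '#39;n';     // "&#39;n"

// Step 4: decode the reference; ENT_QUOTES is needed for the single quote.
$decoded = html_entity_decode($entity, ENT_QUOTES, 'UTF-8');

echo $codePoint, "\n";
echo $entity, "\n";
echo $decoded, "\n";
```

For code points of 128 and above, `chr()` would produce a single byte rather than a UTF-8 sequence, so a different conversion would be needed.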
AakashM
2010-03-14 20:22:58
Character entities are not in hex unless they use the letter `x`. `&#39;` is the single quote character, `'`, not the digit 9. The digit 9 is `&#57;` or `&#x39;`.
Pourquoi Litytestdata
2010-03-14 20:39:41
@Pourquoi: thanks, edited.
AakashM
2010-03-14 21:14:18
A:
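The following PHP function will translate `\u0026#39;n` into `&#39;n` by decoding each `\uXXXX` escape. Passing that result through `html_entity_decode()` with `ENT_QUOTES` then gives `'n`. This is used to communicate with the Google Translate API.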
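function unescapeUTF8EscapeSeq($str) {
    // Match a literal backslash, "u", and exactly four hex digits.
    return preg_replace_callback('/\\\\u([0-9a-f]{4})/i',
        function ($matches) {
            // Rewrite the escape as a hex numeric entity and decode it to UTF-8.
            return html_entity_decode('&#x' . $matches[1] . ';', ENT_QUOTES, 'UTF-8');
        }, $str);
}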
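As an alternative sketch: `\uXXXX` is also the escape syntax used inside JSON strings, so if the text arrives as part of a JSON response (an assumption, the question does not say so), `json_decode()` resolves those escapes while parsing. The payload and the `translatedText` key below are hypothetical and only stand in for whatever the real response looks like.

```php
<?php
// Assumption: $raw is a made-up JSON document; "translatedText" is a
// hypothetical key standing in for whichever field holds the text.
$raw = '{"translatedText":"\u0026#39;n"}';

// json_decode() resolves \uXXXX escapes while parsing JSON strings.
$data = json_decode($raw, true);
$afterJson = $data['translatedText'];

// A second pass resolves the HTML entity; ENT_QUOTES covers the single quote.
$text = html_entity_decode($afterJson, ENT_QUOTES, 'UTF-8');

echo $afterJson, "\n";
echo $text, "\n";
```

This only helps when the whole JSON document is available; for a bare fragment that is not valid JSON, a regex-based function like the one above is the more direct route.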
Rook
2010-03-28 22:49:33