I have a UTF-8 encoded XML file, which was exported from a WordPress MySQL database.

While the file is saved as UTF-8, and the encoding is UTF-8, I get gibberish instead of the Hebrew text that is supposed to be in there, which looks like this:

™×•×˜×•×ª

How can I find the original encoding or charset and convert the text into proper Hebrew?

PHP's mb_detect_encoding($str); returns UTF-8.

I tried all sorts of PHP encoding functions, with different settings and input/output charsets, but they all just print different-looking blocks of gibberish, like:

ÃâÃËÃâ¢Ãâ¢ÃËÃ

and

�� ×שמ×

Any ideas how to go about this?

A: 

This is very similar to this question.

From what I could see, this is a mangled Unicode string, where each original Unicode character got encoded as two Unicode characters, one for each byte of its original encoding.

The code I came up with simply discarded the empty high-order byte and reconstructed the original byte array from that. The code is only an example and is very simplistic in approach, but should help you get there.
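The same idea can be sketched in Python. This is not the code from the linked question, only a minimal illustration, and it rests on an assumption: that the UTF-8 bytes were misread as Windows-1252 (with Latin-1 for the few bytes Windows-1252 leaves undefined) before being saved as UTF-8 again. The function name `unmangle` is made up for this example.

```python
def unmangle(text: str) -> str:
    """Reverse one round of 'UTF-8 bytes misread as a single-byte charset'."""
    raw = bytearray()
    for ch in text:
        try:
            # Most mangled characters map back to one Windows-1252 byte.
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Bytes undefined in Windows-1252 (e.g. 0x90) usually survive
            # as Latin-1 control characters; anything above U+00FF raises
            # here, which means the text is not this kind of mojibake.
            raw += ch.encode("latin-1")
    # The recovered bytes should be the original UTF-8 sequence.
    return raw.decode("utf-8")


# "×™×•×˜×•×ª" written with escapes so the example does not depend
# on how this page itself is encoded.
mangled = "\u00d7\u2122\u00d7\u2022\u00d7\u02dc\u00d7\u2022\u00d7\u00aa"
print(unmangle(mangled))
```

If the text was mangled more than once, as the longer gibberish samples in the question suggest, the function may have to be applied repeatedly. Characters that were already lost (shown as �) cannot be recovered this way.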

Oded
A: 

Take a look at your PHP file; maybe it isn't UTF-8, and that's the reason why your XML query returns this unwanted string.

Nort
+1  A: 

If you have access to the database, you can fix it easily by exporting it as latin1 and importing it as UTF8, as has been suggested here.

Tomer Cohen