A generated UTF-8 file displays properly in the terminal, but not in Firefox or gedit.

It looks as if each character is doubled with a strange extra one.

The file: http://maestric.com/shared/other/2004_10_14.txt

Any idea what is wrong with it?

+2  A: 

It seems to be UTF-16 (note the \0 byte before each letter in the dump below). Are you sure your locale and terminal are set to UTF-8?

Did you try running "od" on the file, or opening it in a hex viewer? Never trust your terminal; you must look at the bytes to be sure.

E.g.:

# od -c -x 2004_10_14.txt  | head
0000000  \0   H  \0   e  \0   u  \0   r  \0   e  \0      \0   d  \0   e
        4800 6500 7500 7200 6500 2000 6400 6500
0000020  \0      \0   d  \0 303 251  \0   b  \0   u  \0   t  \0      \0
        2000 6400 c300 00a9 0062 0075 0074 0020
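
Two related checks can complement a dump like the one above. This is only a sketch under stated assumptions: `sample.txt` is a made-up file that reproduces the first ten bytes of the dump, and `count_nul` is a hypothetical helper name, not a standard command. `od -An -tx1` prints one byte per hex value in file order, which avoids the swapped pairs that `-x` shows on a little-endian machine (where the bytes 00 48 appear as 4800).

```shell
# Hypothetical helper: print how many NUL (0x00) bytes a file contains.
# Plain UTF-8 text normally contains none, so a large count suggests
# UTF-16 or binary data. LC_ALL=C makes tr treat the input as raw bytes.
count_nul() {
  LC_ALL=C tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Made-up sample: "Heure" with a NUL byte before each letter,
# matching the start of the dump.
printf '\000H\000e\000u\000r\000e' > sample.txt

# One byte per hex value, in file order, without the offset column.
od -An -tx1 sample.txt

# Count the NUL bytes in the sample.
count_nul sample.txt
```

For the sample, `count_nul` reports 5, one NUL per letter. A count of 0 does not prove a file is valid UTF-8, but a count close to half the file size is a strong hint of UTF-16 text.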
leonbloy
Thank you! It seems that the file I was trying to clean in the first place wasn't an actual text file but a binary file with some UTF-16 in the middle. This "od" tool will help a lot, thank you.
Jerome Jaglale