I have a UTF-8 encoded file on Windows. When I use it on Windows everything displays correctly, but when I copy the file to Linux, the Unicode characters are gibberish. The file is a plain text file. How can I make this file readable on Linux, or how can I copy it properly?

Thanks in advance.

+1  A: 

Looks like an Apache/PHP issue.

Are you running your strings through PHP's built-in htmlspecialchars function (or similar)? If so, you may need to set its encoding to UTF-8.

Instead of htmlspecialchars($mytext), try using htmlspecialchars($mytext, ENT_COMPAT, 'UTF-8').


Note: the following (my previous answer) is incorrect. As Michael Burr notes, UTF-8 doesn't need or use a BOM.

If it's just the text, then there's a chance it's missing the Byte Order Mark (BOM), or is encoded with an incorrect BOM.

If it's incorrect, the Linux reader may be honouring it while your Windows reader ignores it. Try re-opening your file in something like Notepad++ and resaving it. Notepad++ has a bunch of options in the Format menu for saving UTF-8 files.
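To see whether the file actually starts with a BOM, and which one, you can look at its first few bytes. This is only a rough sketch: detect_bom is a made-up helper name, and it only checks the standard BOM byte sequences that Python's codecs module defines.

```python
import codecs

def detect_bom(data: bytes) -> str:
    """Return the name of the BOM that `data` starts with, or 'none'.

    `detect_bom` is a hypothetical helper written for this illustration.
    """
    # UTF-32 must be checked before UTF-16, because the UTF-32 LE BOM
    # (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
    boms = [
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    return "none"
```

Reading the first four bytes of the file in binary mode (for example open(path, "rb").read(4), where path is your own file) and passing them in is enough. A result of "utf-16-le" would suggest the file was never UTF-8 to begin with.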

kibibu
I did, and I tried all the Linux options, but it's still the same. Apache doesn't recognize it either; it shows me only ????????.
DartesMartes
+1  A: 

Make sure you have transferred the file in binary mode. Also try iconv.
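Before converting anything, it may help to confirm what the bytes on the Linux side really are. The sketch below is not iconv itself, just a rough Python equivalent of the idea: is_valid_utf8 and reencode_to_utf8 are hypothetical helper names, and cp1252 in the usage note is only a guess at what a Windows editor might have written.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def reencode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode `data` from `source_encoding` and re-encode it as UTF-8.

    `source_encoding` is whatever you believe the file really uses;
    the function does not try to detect it.
    """
    return data.decode(source_encoding).encode("utf-8")
```

If is_valid_utf8 returns False for the copied file, the file is probably in some other encoding, and something like reencode_to_utf8(data, "cp1252") would be the next thing to try. If it returns True, the file is fine and the problem is further down the line.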

lhf
Can I do this with WinSCP?
DartesMartes
I did this and still got the same result :(
DartesMartes
Transferring in binary mode would just keep the line breaks as CRLF instead of converting them to LF. It wouldn't affect the multibyte UTF-8 characters.
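A quick way to convince yourself of this: every byte of a multibyte UTF-8 sequence is 0x80 or above, so the CR (0x0D) and LF (0x0A) bytes never occur inside one. The snippet below is a simplified model, assuming a text-mode transfer does nothing more than turn CRLF into LF; to_unix_line_endings and the sample text are made up for the illustration.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Simplified stand-in for a text-mode transfer from Windows to Unix."""
    return data.replace(b"\r\n", b"\n")


# Sample text with 2-byte and 3-byte UTF-8 characters (chosen for illustration).
original = "na\u00efve \u65e5\u672c\u8a9e\r\nsecond line\r\n"

binary_copy = original.encode("utf-8")         # binary mode: bytes untouched
text_copy = to_unix_line_endings(binary_copy)  # text mode: CRLF -> LF

# Both copies still decode cleanly; only the line endings differ.
assert binary_copy.decode("utf-8") == original
assert text_copy.decode("utf-8") == original.replace("\r\n", "\n")
```

Either way the non-ASCII characters survive, so the transfer mode is unlikely to be what turns them into question marks.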
dan04
A: 

The file itself is fine. Something else in the pipe is screwing up the text before it gets sent to the browser. Echo the text at various points in the app to pinpoint which operation is breaking it.
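Printing the raw bytes rather than the rendered text makes the breaking step easier to spot. The thread's code is PHP, so treat the following Python as a sketch of the idea only; describe is a hypothetical helper, and the two failure modes shown are common examples, not a diagnosis of this particular app.

```python
def describe(label: str, value) -> str:
    """Return `label` plus the hex bytes of `value` (str is encoded as UTF-8)."""
    raw = value if isinstance(value, bytes) else value.encode("utf-8")
    return f"{label}: {raw.hex(' ')}"


good = "\u00e9"  # e with acute accent, two bytes in UTF-8

# Step that treats the UTF-8 bytes as Latin-1 and encodes them again.
double_encoded = good.encode("utf-8").decode("latin-1")

# Step that forces the text into ASCII and replaces what does not fit.
replaced = good.encode("ascii", errors="replace")

print(describe("intact", good))                   # intact: c3 a9
print(describe("double-encoded", double_encoded)) # double-encoded: c3 83 c2 a9
print(describe("replaced", replaced))             # replaced: 3f
```

A literal 3f (the "?" character) in the dump means some step has already thrown the original bytes away, so the fix belongs at or before that step.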

Ignacio Vazquez-Abrams