I have a file on my computer that I want to copy into a MySQL table using PHP. When I open the file, the contents look fine, like normal text, but when I try to read the file with PHP or insert it into a MySQL table, I get all sorts of funky characters. I thought it might be a UTF-8 issue, so I tried setting the header

header('Content-type: text/html; charset=utf-8');

Then I echoed the contents returned by file_get_contents(), but that made no difference; the output was still funky. Next I thought it might be a cp1252 issue, so I tried htmlentities(), but that didn't help either:

htmlentities($str, ENT_QUOTES, 'cp1252');

I then uploaded the file to a website (here). On the server, when I cat the file, it again looks normal, but when I open it in Firefox I get the funky characters. Here's a screenshot of what it looks like to me: screenshot

Oddly, I copied the exact same file to another website's folder on the same server, and when I open it at the new URL (see here), it appears different in Firefox: still some funky characters, but fewer of them. A screenshot of the different appearance: screenshot

Does anybody know what's going on here, and how I can clean up the characters? What character encoding is this file using, and why does the same file look different in Firefox when copied from one website on the server to another?

A:

Your file is in UTF-16; try selecting that encoding in Firefox. It looks much more correct, though there are still some stray CJK characters, possibly because some of your text uses the opposite byte order (endianness).
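To do the same conversion in PHP rather than in the browser, one approach (a minimal sketch, not part of the original answer) is to check the first bytes for a byte-order mark (BOM) and convert with mb_convert_encoding() before echoing or inserting into MySQL. It assumes the mbstring extension is enabled and that a file without a BOM is already UTF-8; the function name readAsUtf8() is hypothetical.

```php
<?php
// Minimal sketch: detect a byte-order mark and return the file's contents
// as UTF-8. Assumes mbstring is enabled; readAsUtf8() is a hypothetical name.
function readAsUtf8(string $path): string
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }

    $bom = substr($raw, 0, 2);
    if ($bom === "\xFF\xFE") {
        // FF FE = UTF-16 little-endian: drop the 2-byte BOM, then convert.
        // (A UTF-32LE file also starts with FF FE; this sketch ignores that case.)
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
    }
    if ($bom === "\xFE\xFF") {
        // FE FF = UTF-16 big-endian.
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16BE');
    }
    if (substr($raw, 0, 3) === "\xEF\xBB\xBF") {
        // UTF-8 BOM: strip it; the rest is already UTF-8.
        return substr($raw, 3);
    }

    // No BOM: assume (without checking) that the file is already UTF-8.
    return $raw;
}
```

Usage would look like $text = readAsUtf8('data.txt'); (the filename is a placeholder), after which $text can be echoed under the UTF-8 header from the question or passed on to MySQL.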

Please remember to accept an answer on each of your questions that received a good one; without the reputation incentive, many people on Stack Overflow may stop answering your questions.

Delan Azabani
Thank you. Oddly, Firefox seems to open the second URL as UTF-16 automatically, but the first URL as UTF-8. That accounts for the difference between the screenshots. I wonder why that would be.
Tristan
A: 

Chrome is detecting the charset as Unicode (UTF-16LE).
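The byte order matters because the same bytes decode very differently. The illustration below (not from the original answers; it assumes mbstring and PHP 7.4+ for mb_str_split(), mb_ord() and arrow functions) takes "Hi" encoded as UTF-16LE and decodes it both ways. With the correct little-endian order it reads as Hi; with big-endian order the 16-bit units become U+4800 and U+6900, both CJK ideographs, which is consistent with the stray CJK characters mentioned in the first answer.

```php
<?php
// Illustration: "Hi" encoded as UTF-16LE is the byte sequence 48 00 69 00.
$bytes = "H\0i\0";

// Decoded with the correct (little-endian) byte order: plain Latin text.
$le = mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');

// Decoded with the wrong (big-endian) byte order, the 16-bit units are
// read as 0x4800 and 0x6900, which are CJK ideographs.
$be = mb_convert_encoding($bytes, 'UTF-8', 'UTF-16BE');
$codepoints = array_map(
    fn (string $ch): string => sprintf('U+%04X', mb_ord($ch, 'UTF-8')),
    mb_str_split($be, 1, 'UTF-8')
);

// Read as single-byte or UTF-8 text, the same bytes are "H", NUL, "i", NUL;
// the interleaved NUL bytes are another source of stray characters.
echo bin2hex($bytes), "\n";           // 48006900
echo $le, "\n";                       // Hi
echo implode(' ', $codepoints), "\n"; // U+4800 U+6900
```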

Chris_O
-1: this does not answer the question.
sleske
Rightly deserved. I misunderstood the question in my original answer.
Chris_O
A:

I've had this issue before. There are hidden characters that CANNOT be displayed in certain IDEs.

I was able to resolve this by opening the file in Notepad, copying the text, and then deleting the file. I then recreated the file from scratch and pasted in just the plain text.

Don't copy the text with something like WordPad, as that will also copy the hidden characters.
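For cleaning these characters automatically (as asked in the comments), a rough PHP stand-in for the Notepad trick, sketched here rather than taken from this answer, is to strip invisible Unicode characters with a regular expression. It assumes the text is already valid UTF-8 (for example, after converting from UTF-16); stripHiddenChars() is a hypothetical name. The pattern also removes private-use and unassigned code points, which may be more aggressive than you want.

```php
<?php
// Sketch: remove "hidden" characters, i.e. everything in Unicode category C
// (control, format, private-use, unassigned, surrogate), except \n, \r and \t.
// Assumes $text is valid UTF-8; stripHiddenChars() is a hypothetical name.
function stripHiddenChars(string $text): string
{
    // \P{C} matches any character NOT in category C, so the negated class
    // [^\P{C}\n\r\t] matches category-C characters other than \n, \r and \t.
    $clean = preg_replace('/[^\P{C}\n\r\t]/u', '', $text);

    // With the /u modifier, preg_replace() returns null on invalid UTF-8.
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}
```

For example, stripHiddenChars("\u{FEFF}Hello\u{200B} world\n") removes the BOM (U+FEFF) and the zero-width space (U+200B) and returns "Hello world\n".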

EDIT:

You may also want to accept some people's answers to your previous questions; you will get more replies from people willing to help if they know you will award reputation.

jostster
Thanks. I'd like to be able to automatically clean these characters, but I'll open up a new question for that. And I'll go back and select answers for my previous questions :)
Tristan