I have a file containing Unicode characters on a server running Linux. If I SSH into the server and use tab-completion to navigate to the file/folder containing Unicode characters, I have no problem accessing the file/folder. The problem arises when I try accessing the file via PHP (the function I was using to access the file system was stat). If I output the path generated by the PHP script to the browser and paste it into the terminal, the file also does not seem to exist (even though, looking at the terminal, the file paths appear exactly the same).
I set PHP to use UTF-8 as its default encoding via php.ini and also set mb_internal_encoding. I checked the encoding of the PHP file path string and it comes out as UTF-8, as it should. Poking around a bit more, I decided to hexdump the é character that the terminal's tab-completion produces and compare it to the hexdump of the 'regular' é character created by the PHP script or by entering the character manually via the keyboard (option+e+e on OS X). Here is the result:
echo -n é | hexdump
0000000 cc65 0081
0000003

echo -n é | hexdump
0000000 a9c3
0000002
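The comparison above depends on how the terminal or clipboard happens to encode the pasted character, so it can help to write the two byte sequences out explicitly. The following is only a minimal sketch, assuming a POSIX shell with od and tr available; the helper name hex_bytes is made up for illustration. Note that hexdump prints 16-bit words in host byte order by default, so "cc65 0081" on a little-endian machine corresponds to the bytes 65 cc 81 (the letter e followed by the UTF-8 encoding of U+0301, the combining acute accent), and "a9c3" corresponds to c3 a9 (the UTF-8 encoding of U+00E9).

```shell
# Print the bytes read from stdin as one contiguous hex string.
# (hex_bytes is a hypothetical helper name, not something from the post.)
hex_bytes() {
  od -An -tx1 | tr -d ' \n'
}

# 3-byte form: 'e' (0x65) followed by 0xcc 0x81, written as octal escapes
# so the result does not depend on the terminal's input method.
decomposed=$(printf 'e\314\201' | hex_bytes)

# 2-byte form: 0xc3 0xa9, again written as octal escapes.
precomposed=$(printf '\303\251' | hex_bytes)

echo "3-byte form: $decomposed"    # prints "3-byte form: 65cc81"
echo "2-byte form: $precomposed"   # prints "2-byte form: c3a9"
```

Printing one byte at a time with od -tx1 avoids the byte swapping seen in the default hexdump output, which makes the two sequences easier to compare against the path string that PHP produces.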
The é character that allows a correct file reference in the terminal is the 3-byte one. I'm not sure where to go from here. What encoding should I use in PHP? Should I be converting the path to another encoding via iconv or mb_convert_encoding?