Hi guys,

I need to save files with non-Latin filenames on a filesystem, using PHP.

I want to make this work cross-platform. How do I know what encoding I can use to write the file? I understand many modern filesystems are UTF-8 based (is this correct?), but I doubt Windows XP is (for instance).

So, is there a robust detection mechanism?

Evert

+3  A: 

Not an answer to your question, but if you don't need to do extensive operations on filesystem level (like searching, sorting...), there is a nice cross-platform workaround for the issue outlined in this SO question: URLEncode()ing file names.

Hörensägen.txt 

gets turned into

H%c3%b6rens%c3%a4gen.txt

which should be safe to use in any filesystem and is able to map any UTF-8 character.

I find this much preferable to trying to "natively" deal with the host OS's capabilities, which is guaranteed to be complicated and error-prone (in addition to operating system differences, I'm sure the various filesystem formats, such as FAT16, FAT32, NTFS, extFS versions 1/2/3 and so on, bring their own set of rules to be aware of).
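A minimal sketch of the round trip, assuming the incoming name is already valid UTF-8 and PHP 7 or later. The helper names `encode_filename` and `decode_filename` are made up for illustration; only `urlencode()` and `urldecode()` are real PHP functions. Note that PHP emits uppercase hex digits, which decode to the same bytes as the lowercase form shown above.

```php
<?php
// Hypothetical helper names; only urlencode()/urldecode() are built-in PHP.

/** Turn a UTF-8 name into an ASCII-only filename. */
function encode_filename(string $utf8Name): string
{
    // urlencode() keeps A-Z, a-z, 0-9, '-', '_' and '.', turns a space
    // into '+', and percent-encodes every other byte.
    return urlencode($utf8Name);
}

/** Recover the original UTF-8 name from a stored filename. */
function decode_filename(string $storedName): string
{
    return urldecode($storedName);
}

// "\u{...}" escapes keep the example independent of this file's own encoding.
$original = "H\u{00F6}rens\u{00E4}gen.txt";
$stored   = encode_filename($original);

echo $stored, PHP_EOL;                   // H%C3%B6rens%C3%A4gen.txt
echo decode_filename($stored), PHP_EOL;  // the original UTF-8 name again
```

One thing to keep in mind: every non-ASCII byte becomes three characters, so long names grow quickly and may run into the filesystem's filename length limit.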

Pekka
Not a bad suggestion. I suppose I could provide the option. The question you linked to also mentions Windows uses ISO-8859-1.
Evert
@Evert not exactly. Windows's string handling has been UTF-16 based for a long time, as far as I know. The answer claims that *PHP's wrapper* around Windows' filesystem functions uses ISO-8859-1. I don't know for a fact whether that is true, but it is possible.
Pekka
A: 

Not an answer, but... WinXP is UTF-8 based (according to Jeffrey Richter's "CLR via C#" book) and all ASCII WinAPI functions are just wrappers for similar UTF-8 ones.

kpower
'WinXP is UTF-8 based': bollocks. All of the WinNT family is built on UTF-16 (well, almost true: earlier NT versions don't handle surrogates).
steelbytes
Oh, really sorry. I typed the "-8" without thinking.
kpower