ansaurus

Question

Answer 1

+1 A:

The Mac stores Unicode characters in file names in "decomposed" form, that is, "u" + ¨ (combining diaeresis) instead of the single precomposed character "ü". Normalizer can take care of that. If you don't have Normalizer, try iconv('UTF8-MAC', 'UTF8', $str).
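A minimal sketch of the suggestion above, assuming PHP 7 or later. The function name toNfc is made up for illustration. It uses the intl extension's Normalizer when it is loaded and otherwise falls back to the iconv call, which only works with iconv builds that know the UTF8-MAC encoding.

```php
<?php
// Hypothetical helper (not from the thread): return $path in NFC
// (precomposed) form, or unchanged if no converter is available.
function toNfc(string $path): string
{
    // Preferred: the Normalizer class from the intl extension.
    if (class_exists('Normalizer')) {
        $nfc = Normalizer::normalize($path, Normalizer::FORM_C);
        // normalize() returns false on failure, e.g. for invalid UTF-8.
        return $nfc === false ? $path : $nfc;
    }

    // Fallback: UTF8-MAC is only known to some iconv builds (such as the
    // one shipped with OS X), so suppress the warning and keep the input
    // when the conversion is not possible.
    if (function_exists('iconv')) {
        $converted = @iconv('UTF8-MAC', 'UTF8', $path);
        if ($converted !== false) {
            return $converted;
        }
    }

    return $path;
}
```

With Normalizer available, toNfc("u\u{0308}") should give back the single character "\u{00FC}" ("ü"). Plain ASCII paths pass through unchanged either way.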

stereofrog 2010-03-26 12:20:31

I did not know about UTF8-MAC. I was looking for documentation on which encodings are available, but I couldn't find any. Any idea where I could have found UTF8-MAC?

Evert 2010-03-26 13:23:30

On my system (OS X 10.6), "iconv --list" shows 'UTF8-MAC' among others, but the above code doesn't work. Strange.

stereofrog 2010-03-26 13:45:41

Answer 2

+1 A:

I hate answering my own questions, but here goes.

I ended up not bothering. I did extensive research on how various operating systems encode paths and handle encodings. It turns out that in most cases other OSes handle paths in other normalization forms all right. Windows worked a bit shitty, but it works.

Whenever I receive a path that is not valid UTF-8 at all, I try to detect the encoding and convert it to UTF-8.
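The detect-and-convert step could look roughly like this. This is a sketch only, assuming the mbstring extension is loaded. The function name pathToUtf8, the candidate list, and the ISO-8859-1 last resort are assumptions for illustration, not what was actually used.

```php
<?php
// Hypothetical helper (not from the thread): return $path as UTF-8,
// guessing the source encoding when the bytes are not valid UTF-8.
function pathToUtf8(string $path, array $candidates = ['ISO-8859-1']): string
{
    // Already valid UTF-8: leave it alone.
    if (mb_check_encoding($path, 'UTF-8')) {
        return $path;
    }

    // Strict detection against the candidate encodings.
    $from = mb_detect_encoding($path, $candidates, true);
    if ($from === false) {
        // Assumption: treat undetectable input as ISO-8859-1, where
        // every byte maps to some character.
        $from = 'ISO-8859-1';
    }

    return mb_convert_encoding($path, 'UTF-8', $from);
}
```

Detection like this is guesswork for short strings such as path segments, so keep the candidate list narrow and limited to encodings your clients are known to send.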

Evert 2010-08-22 11:35:11


Normalizing (webdav) unicode paths
