I am slicing a Unicode string with diacritics using the mb_substr function, but it behaves as if I were using plain substr: it splits multi-byte characters in half and displays a question-mark diamond (�).

E.g.

echo mb_substr('ááááá', 0, 5); //Displays áá�

What might be wrong?

+2  A: 

I get the same problem if I don't pass the encoding as the last parameter to mb_substr: it defaults, at least on my server, to ISO-8859-1.


But if I set the encoding explicitly to UTF-8, it works:

echo mb_substr('ááááá', 0, 5, 'UTF-8');

This gives the right display in the browser:

ááááá
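To see why the default encoding matters, here is a minimal sketch that looks at the bytes involved. It assumes the mbstring extension is loaded; the variable names are only illustrative, and the string is built from byte escapes so the result does not depend on how the script file itself is saved.

```php
<?php
// "á" (U+00E1) is the two bytes C3 A1 in UTF-8.
$s = str_repeat("\xC3\xA1", 5);

echo strlen($s), "\n";              // 10 (bytes)
echo mb_strlen($s, 'UTF-8'), "\n";  // 5 (characters)

// Read as ISO-8859-1, every byte counts as one character, so
// "5 characters" means 5 bytes: two whole "á" plus half of a third.
$cut = mb_substr($s, 0, 5, 'ISO-8859-1');
echo bin2hex($cut), "\n";           // c3a1c3a1c3

// Read as UTF-8, the same call returns all five characters intact.
$ok = mb_substr($s, 0, 5, 'UTF-8');
echo bin2hex($ok), "\n";            // c3a1c3a1c3a1c3a1c3a1
```

The dangling C3 byte at the end of the first result is what the browser renders as �.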


See the mb_substr documentation (quoting):

string mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )

The encoding parameter is the character encoding. If it is omitted, the internal character encoding value will be used.

Pascal MARTIN
The internal character encoding can be changed in php.ini (or even via ini_set() if you can't change it system-wide). I think that's a saner option (and I would shoot the idiot who chose a single-byte default encoding for dedicated multi-byte functions).
porneL
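A minimal sketch of that approach, assuming the mbstring extension is loaded. It uses mb_internal_encoding(), which the comment above does not mention but which sets the same internal encoding at runtime; once it is set, mb_substr can be called without the fourth argument.

```php
<?php
// Set the internal encoding once, e.g. in a bootstrap file.
mb_internal_encoding('UTF-8');

// Five "á" characters, written as UTF-8 bytes (C3 A1 each).
$s = str_repeat("\xC3\xA1", 5);

// No encoding argument: mb_substr falls back to the internal encoding.
$out = mb_substr($s, 0, 5);

echo $out === $s ? "intact\n" : "split\n";  // intact
```

Passing 'UTF-8' explicitly on each call is still the more defensive choice if the code may run on servers whose configuration you don't control.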