ansaurus

Question

Preg_Replace and UTF8

Answer 1

+2 A:

If I'm not mistaken, preg_match uses the current locale. Try setting the locale to the language these characters belong to. You probably need a UTF-8 based locale too. If you have mixed languages on your page, you may be able to find a generic international locale that works.

See also: http://www.phpwact.org/php/i18n/utf-8

troelskn 2010-01-14 10:00:47

See the update on my question.

Jan Hančič 2010-01-14 10:11:59

`UTF-8` on its own is probably not a valid locale on any system. Try running `locale -a` in a shell to list the supported locales. You probably want one that looks like `en_GB.utf8`.

troelskn 2010-01-14 10:16:32

Thanks. I have changed it to `sl_SI.UTF-8`, but the results are the same ...

Jan Hančič 2010-01-14 10:20:18

+1 for the link to that phpwact page!

Tobbe 2010-09-24 20:30:21
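
For reference, a minimal sketch of the locale advice discussed in the comments above. The candidate locale names are assumptions (they differ between systems, and `locale -a` shows what is actually installed). `setlocale` returns false when none of the requested locales is available, so it is worth checking the return value rather than assuming the call succeeded:

```php
<?php
// Candidate names are assumptions; adjust them to what `locale -a` reports.
$candidates = ['sl_SI.UTF-8', 'sl_SI.utf8', 'en_GB.utf8'];

// setlocale() tries each name in order and returns the one it applied,
// or false if none of them exists on this system.
$applied = setlocale(LC_CTYPE, $candidates);

if ($applied === false) {
    echo "None of the candidate locales is installed\n";
} else {
    echo "Using locale: " . $applied . "\n";
}
```

As the follow-up comments show, a valid locale alone did not fix the asker's problem, so this is only a sanity check for the locale step.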

Answer 2

+1 A:

Not sure what your problem is stemming from, but I just put together this little test case:

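<?php

// Upper-case search term containing a non-ASCII character
$uc = "SREČA";

// Make the mb_* functions treat strings as UTF-8
mb_internal_encoding('utf-8');
echo $uc."\n";
$lc = mb_strtolower($uc);
echo $lc."\n";

// The "u" modifier makes PCRE treat pattern and subject as UTF-8; "i" ignores case
echo preg_replace("/\b(".preg_quote($uc, "/").")\b/ui", "<span class='test'>$1</span>", "test:".$lc." end test");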

Its output on my machine:

SREČA
sreča
test:<span class='test'>sreča</span> end test

Seems to be working properly?

gnarf 2010-01-14 10:23:02

Adding `mb_regex_encoding` does not solve the issue (I already have the other two) :\

Jan Hančič 2010-01-14 10:26:42

mb_strtolower converts characters correctly

Jan Hančič 2010-01-14 10:32:18

Answer 3

+3 A:

I feel really stupid right about now, but the problem wasn't with the Preg_* functions at all. For some reason I first checked with StriPos whether the search term occurs in the string at all. Since that function is not multi-byte safe, it returned false whenever the non-ASCII characters in the text differed in case from the search term, so Preg_Replace was never even called.

So the lesson to be learned here: always use the multi-byte versions of string functions when you have UTF-8 strings.

Jan Hančič 2010-01-17 14:04:55
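
To illustrate the pitfall described in Answer 3, here is a minimal sketch that reuses the sample strings from Answer 2. It assumes the mbstring extension is loaded. The variable names and the guard-then-replace structure are only a guess at what the original code looked like, not the asker's actual code:

```php
<?php
// Sample data borrowed from Answer 2; variable names are hypothetical.
$needle   = "SREČA";
$haystack = "test:sreča end test";

// stripos() compares bytes and only folds case for single-byte characters,
// so the two-byte "Č" never matches the two-byte "č".
$bytePos = stripos($haystack, $needle);

// mb_stripos() folds case per character when told the encoding.
$charPos = mb_stripos($haystack, $needle, 0, 'UTF-8');

var_dump($bytePos); // bool(false)
var_dump($charPos); // int(5)

// Guarding the replacement with the multi-byte check lets it actually run.
$highlighted = $haystack;
if ($charPos !== false) {
    $highlighted = preg_replace(
        "/\b(" . preg_quote($needle, "/") . ")\b/ui",
        "<span class='test'>$1</span>",
        $haystack
    );
}
echo $highlighted . "\n";
```

With `stripos` as the guard, the `preg_replace` branch is skipped entirely, which matches the behaviour reported in the question.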
