We have this code:

$value = preg_replace("/[^\w]/", '', $value);

where $value is UTF-8 encoded. After this transformation, the first byte of each multibyte character is stripped. How can I make \w cover UTF-8 characters completely?

Sorry, I am not very good with PHP.

A: 

Use [^\w]+ instead of [^\w]

You can also use \W in place of [^\w]
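As a small sketch of the three spellings mentioned here, the snippet below runs them on a made-up ASCII string and checks that they give the same result. The sample input and variable names are assumptions for illustration; ASCII-only input is used, so this says nothing about multibyte behaviour.

```php
<?php
// Hypothetical sample input (ASCII only).
$value = "foo bar, baz!";

// Three equivalent ways to strip non-word characters.
$a = preg_replace('/[^\w]/', '', $value);   // original pattern
$b = preg_replace('/[^\w]+/', '', $value);  // same, but removes runs in one match
$c = preg_replace('/\W/', '', $value);      // \W is shorthand for [^\w]

var_dump($a === $b && $b === $c); // bool(true)
echo $a, PHP_EOL;                 // foobarbaz
```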

codaddict
+2  A: 

Try this function instead: http://php.net/manual/en/function.mb-ereg-replace.php

W_P
I would rather advise *not* to use `mb_ereg_replace`. It is built on the deprecated `ereg_replace`. See http://php.net/ereg_replace
soulmerge
+3  A: 

You could try with the /u modifier:

This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.

If that won't do, try

instead.

Gordon
+1  A: 

There is this nasty u modifier for PCRE patterns in PHP. It states that the regex is encoded in UTF-8, but I found that it treats the input as UTF-8, too.
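One way to see that the subject is validated as UTF-8 as well is to pass a string that is not valid UTF-8. In the sketch below (the sample string is an assumption), preg_replace returns NULL and preg_last_error() reports a UTF-8 error instead of a cleaned string.

```php
<?php
// "\xE9" is "é" in ISO-8859-1, but on its own it is not a valid UTF-8 sequence.
$latin1 = "caf\xE9";

// With /u the subject must be valid UTF-8, otherwise the call fails.
$result = preg_replace('/\W/u', '', $latin1);

var_dump($result);                                   // NULL
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

So if the input encoding is not guaranteed, it is worth checking for a NULL return before using the result.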

soulmerge