
Question

regex breaking Chinese string

Answer 1

A:

Since the input string is multi-byte, I guess you'll have to use mb_split in place of preg_split.

codaddict 2010-07-30 04:24:50

If I use mb_split, I only get `string(25) "我觉得你很麻烦"` as output (double space?)

Moak 2010-07-30 04:29:38

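@Moak With mb_split the pattern takes no delimiters, so drop the surrounding slashes. Modifiers can't be appended to the pattern either; options are set separately (for example with mb_regex_set_options).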
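
To make the delimiter point concrete, here is a minimal sketch, assuming the mbstring extension is loaded and the input is UTF-8. The sample string is borrowed from the other answer in this thread; it is not the asker's original input.

```php
<?php
// Sketch only: assumes mbstring is available and the subject is UTF-8.
mb_regex_encoding('UTF-8');   // tell the multibyte regex engine which encoding to expect

$sample = "你不喜欢 香蕉 吗";

// mb_split takes the bare expression: no surrounding slashes, no trailing modifiers.
$parts = mb_split('[\s,]+', $sample);

var_dump($parts);
```

Passing `/[\s,]+/` here would make the slashes part of the pattern, so the string would most likely come back unsplit.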

Artefacto 2010-07-30 04:32:20

Answer 2

+4 A:

If your string is in UTF-8, you must use the u modifier:

$sample = "你不喜欢 香蕉 吗";
// The u modifier makes PCRE treat both pattern and subject as UTF-8.
$parts = preg_split("/[\\s,]+/u", $sample);
var_dump($parts);

If it's in another encoding, see codaddict's answer.
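
For the non-UTF-8 case, one possible approach (an assumption, not something proposed in the thread) is to convert the text to UTF-8 first and then reuse the same pattern with the u modifier. The helper name `split_on_space_or_comma` is hypothetical, and GB18030 is used purely as an example source encoding.

```php
<?php
// Hypothetical helper, not from the thread: split text given in a known encoding.
function split_on_space_or_comma(string $text, string $encoding): array
{
    // Convert to UTF-8 first, because the u modifier only understands UTF-8.
    $utf8 = mb_convert_encoding($text, 'UTF-8', $encoding);

    $parts = preg_split('/[\s,]+/u', $utf8);
    if ($parts === false) {
        // preg_split returns false on failure, for example on malformed UTF-8.
        throw new RuntimeException('preg_split failed, error code ' . preg_last_error());
    }
    return $parts;
}

// Build a GB18030 sample from a UTF-8 literal, just for demonstration.
$gb = mb_convert_encoding("你不喜欢 香蕉 吗", 'GB18030', 'UTF-8');
var_dump(split_on_space_or_comma($gb, 'GB18030'));
```

This only works when the source encoding is actually known; guessing it wrong will produce garbled text rather than an error.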

Artefacto 2010-07-30 04:27:29

`非常好` (very good), cheers =)

Moak 2010-07-30 04:32:58
