views: 53
answers: 2

I'm trying to split a string of text into words using the PHP function preg_split:

$words = preg_split('/\W/u',$text);

It works fine except for Swedish characters like å, ä and ö. Calling utf8_encode or utf8_decode doesn't help either. My guess is that preg_split only works with single-byte characters and that the Swedish characters are multibyte. Is there another way to do it?
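One thing worth ruling out first is whether the text really is UTF-8. With the /u modifier, PCRE does not quietly mis-split invalid input: preg_split returns false instead of an array. The sketch below assumes the text is meant to be UTF-8 and compares "så" encoded as UTF-8 and as Latin-1; the `preg_match('//u', ...)` validity check is a common idiom, and `mb_check_encoding($text, 'UTF-8')` does the same job if mbstring is available.

```php
<?php
// Minimal sketch: the same word "så" in two encodings.
$utf8   = "s\xC3\xA5 mycket"; // UTF-8: å is the two bytes C3 A5
$latin1 = "s\xE5 mycket";     // Latin-1: å is the single byte E5

// preg_match('//u', ...) returns 1 for valid UTF-8 and false otherwise.
$utf8IsValid   = preg_match('//u', $utf8) === 1;
$latin1IsValid = preg_match('//u', $latin1) === 1;

// With /u, invalid UTF-8 input makes preg_split() return false
// and preg_last_error() report PREG_BAD_UTF8_ERROR.
$result = preg_split('/\W+/u', $latin1);
$error  = preg_last_error();

var_dump($utf8IsValid);                   // bool(true)
var_dump($latin1IsValid);                 // bool(false)
var_dump($result === false);              // bool(true)
var_dump($error === PREG_BAD_UTF8_ERROR); // bool(true)
```

If the check fails, converting the input to UTF-8 before splitting is the fix; changing the regex alone won't help.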

+3  A: 

Why are you paying any attention to specific characters?

$text = "Jag har hört så mycket om dig.";
$words = explode(" ", $text);
/*
Array
(
    [0] => Jag
    [1] => har
    [2] => hört
    [3] => så
    [4] => mycket
    [5] => om
    [6] => dig.
)
*/
Jonathan Sampson
Ah, I think the reason was that I want to split on anything that is not a letter in a-ö. But maybe I could loop through the array and do that afterwards?
Martin
This is the right answer.
Ether
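Splitting on "anything that is not a letter" can also be done in a single preg_split call by using Unicode character properties instead of \W. This is a sketch, assuming valid UTF-8 input: `\p{L}` (any Unicode letter) and `\p{N}` (any Unicode number) need the /u modifier, and the function names `splitWords` and `splitWordsThenClean` are hypothetical. The second function follows the "loop through the array afterwards" idea for comparison.

```php
<?php
// Split on every run of characters that is not a letter or digit.
// å, ä and ö are Unicode letters, so they stay inside the words.
function splitWords(string $text): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty string a trailing "." would leave.
    $parts = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false if $text is not valid UTF-8.
    return $parts === false ? [] : $parts;
}

// Alternative: split on spaces first, then strip non-letters/digits per word.
function splitWordsThenClean(string $text): array
{
    $words = [];
    foreach (explode(' ', $text) as $word) {
        // preg_replace() returns null on invalid UTF-8; skip such words.
        $clean = preg_replace('/[^\p{L}\p{N}]+/u', '', $word);
        if ($clean !== null && $clean !== '') {
            $words[] = $clean;
        }
    }
    return $words;
}

$words = splitWords("Jag har hört så mycket om dig.");
// $words: ['Jag', 'har', 'hört', 'så', 'mycket', 'om', 'dig']
```

The two differ on words with inner punctuation: splitWords turns "e-post" into two words, while splitWordsThenClean glues it into "epost".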
+1  A: 

mb_split to the rescue (I had problems with these characters myself some time ago and just now found the answer :)

mb_regex_encoding('UTF-8');
// Split on runs of non-word characters, using the UTF-8 regex encoding set above.
$words = mb_split('\W+', $text);

HTH

robertbasic
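A fuller version of the mb_split approach, as a sketch assuming the mbstring extension with its regex support is installed. mb_split has no equivalent of PREG_SPLIT_NO_EMPTY, so a trailing "." leaves an empty string at the end of the result; this sketch filters it out. The function name `mbSplitWords` is hypothetical.

```php
<?php
// mb_split() uses mbstring's own regex engine, whose encoding must be UTF-8
// for multibyte characters such as å, ä and ö to be handled correctly.
mb_regex_encoding('UTF-8');

function mbSplitWords(string $text): array
{
    $parts = mb_split('\W+', $text);
    if ($parts === false) {
        return [];
    }
    // Remove empty strings (e.g. after a trailing ".") and reindex from 0.
    return array_values(array_filter($parts, fn (string $p): bool => $p !== ''));
}

$words = mbSplitWords("Jag har hört så mycket om dig.");
```

Note that mb_regex_encoding() changes a global setting for all later mb_ereg*/mb_split calls in the request.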