Hey,

I need a function that matches whole words in Hebrew in PHP.

Please help.

+2  A: 

Try this regular expression describing Unicode character properties:

/\p{Hebrew}+/u
Gumbo
but echo preg_match("/\p{Hebrew}+/", "שלון"); returns 0...
Haim Bender
@Haim Bender: You need to set the *u* modifier.
Gumbo
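To illustrate the point about the *u* modifier, here is a minimal sketch. It assumes the script file is saved as UTF-8, and the variable name `$word` is only for illustration:

```php
<?php
$word = "שלום"; // assumes this source file is saved as UTF-8

// With /u, PCRE treats both pattern and subject as UTF-8,
// so \p{Hebrew} is tested against whole code points.
var_dump(preg_match('/\p{Hebrew}+/u', $word)); // int(1)

// Without /u the subject is scanned byte by byte, so the multi-byte
// Hebrew letters are not seen as Hebrew (reported above as returning 0).
var_dump(preg_match('/\p{Hebrew}+/', $word));
```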
Where is the `\p{Hebrew}` shortcut described? I've never seen that before.
troelskn
+3  A: 

Assuming your source data is UTF-8 encoded:

$input = "ט״סת תעסתינג O״ת סOמע העברעו תעחת";

preg_match_all( "/[\\x{0590}-\\x{05FF}]+/u", $input, $matches );

echo '<pre>';
print_r( $matches );
echo '</pre>';

Yields

Array
(
    [0] => Array
        (
            [0] => ט״סת
            [1] => תעסתינג
            [2] => ״ת
            [3] => ס
            [4] => מע
            [5] => העברעו
            [6] => תעחת
        )

)

I based the range of 0590 through 05FF on this Unicode chart (edit: found more good Hebrew/Unicode info here). I used this to generate my sample input. Since I don't know Hebrew, I can't actually verify that the matched output is valid.

You may need to tweak it but hopefully this gets you headed in the right direction.

Peter Bailey
why does it match space?
Haim Bender
cool!, and thanks btw :)
Haim Bender
I just need to check whether a string is one single word in Hebrew. Do I need to specify the start and end of the string in the regex? How can I implement this?
Haim Bender
Yeah, I suppose that would work - again I'm not going to pretend to understand the grammar of Hebrew.
Peter Bailey
A: 

Thanks for all your answers,

The one that works for me is preg_match("/^\p{Hebrew}+$/u", "שלום");
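Wrapped up as a function, it could look roughly like this. This is only a sketch: the name `isSingleHebrewWord` is made up for illustration, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: true when $text consists only of Hebrew-script
// characters from start to end (no spaces, no Latin letters).
function isSingleHebrewWord(string $text): bool
{
    // preg_match() returns 1 on a match, 0 on no match and false on error
    // (for example invalid UTF-8), so compare strictly against 1.
    return preg_match('/^\p{Hebrew}+$/u', $text) === 1;
}

var_dump(isSingleHebrewWord("שלום"));      // bool(true)
var_dump(isSingleHebrewWord("שלום עולם")); // bool(false), contains a space
var_dump(isSingleHebrewWord("shalom"));    // bool(false), Latin letters
```

Note that `$` also matches just before a trailing newline; adding the `D` modifier (`/uD`) should make it strict if that matters.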

Haim Bender