Hi, what is the meaning of, or the difference between, these two lines?

  1. if( preg_match_all('/\#([א-תÀ-ÿ一-龥а-яa-z0-9\-_]{1,50})/iu', $message, $matches, PREG_PATTERN_ORDER) ) {

  2. if( preg_match_all('/\#([а-яa-z0-9\-_\x{4e00}-\x{9fa5}]{1,50})/iu', $message, $matches, PREG_PATTERN_ORDER) ) {

And what does the number 3 mean in this line (the 3 in {3,30})?

if( preg_match_all('/\@([a-zA-Z0-9\-_\x{4e00}-\x{9fa5}]{3,30})/u', $message, $matches, PREG_PATTERN_ORDER) ) {

Thanks!

A:

I'll answer the 2nd part of your question:

The {3,30} in the regex is a quantifier meaning a minimum of 3 and a maximum of 30 repetitions.

  • a* means zero or more a
  • a+ means one or more a
  • a? means zero or one a
  • a{1} means exactly one a, same as just a
  • a{1,} means one or more a, same as a+
  • a{1,3} means a minimum of one and a maximum of three a's
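A minimal sketch that checks each quantifier in the list against a few test strings. It assumes PHP 7.4+ (for the arrow function); the patterns are anchored with ^ and $ so the whole string has to match, and the input strings are made up for illustration.

```php
<?php
// Each quantifier from the list, anchored so the entire input must match.
$patterns = [
    'a*'     => '/^a*$/',     // '', 'a', 'aaa', 'aaaa'
    'a+'     => '/^a+$/',     // 'a', 'aaa', 'aaaa'
    'a?'     => '/^a?$/',     // '', 'a'
    'a{1}'   => '/^a{1}$/',   // 'a'
    'a{1,}'  => '/^a{1,}$/',  // same as a+
    'a{1,3}' => '/^a{1,3}$/', // 'a', 'aaa'
];
$inputs = ['', 'a', 'aaa', 'aaaa'];

$results = [];
foreach ($patterns as $label => $regex) {
    foreach ($inputs as $input) {
        // preg_match() returns 1 on a match and 0 otherwise
        $results[$label][$input] = preg_match($regex, $input) === 1;
    }
}

foreach ($results as $label => $row) {
    // array_filter() drops the inputs that did not match
    $hits = array_map(fn ($s) => "'$s'", array_keys(array_filter($row)));
    printf("%-6s matches %s\n", $label, implode(' ', $hits));
}
```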

You can put any more complex regex in place of a. For example, [a-zA-Z]{3,30} means at least 3 and at most 30 ASCII letters.
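A short sketch of the @mention pattern from the question with {3,30} in place. The message text is invented for illustration. Note that the pattern has no trailing boundary, so a name longer than 30 characters is not rejected; the match simply stops after the 30th character.

```php
<?php
// The @mention pattern from the question; {3,30} requires 3 to 30 name characters.
$pattern = '/\@([a-zA-Z0-9\-_\x{4e00}-\x{9fa5}]{3,30})/u';

// Hypothetical test message: "@ab" is too short, the last name is 31 characters long.
$message = 'Hi @ab, @bob and @用户名 said hello to @' . str_repeat('x', 31);

preg_match_all($pattern, $message, $matches, PREG_PATTERN_ORDER);

// $matches[1] holds the captured names: bob, 用户名, and only the first 30 x's
print_r($matches[1]);
```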

codaddict
A: 

Your first regex includes Hebrew and accented Latin characters (and possibly others) that the 2nd regex does not include.
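To see the difference in practice, here is a sketch that runs both patterns from the question over the same made-up message (assuming the PHP file is saved as UTF-8). The first pattern picks up the Hebrew and accented hashtags; the second skips the Hebrew one entirely and cuts "café" short at the é.

```php
<?php
// The two hashtag patterns from the question.
$re1 = '/\#([א-תÀ-ÿ一-龥а-яa-z0-9\-_]{1,50})/iu';
$re2 = '/\#([а-яa-z0-9\-_\x{4e00}-\x{9fa5}]{1,50})/iu';

// Hypothetical input with accented Latin, Hebrew, Chinese and Cyrillic tags.
$message = '#café #שלום #中文 #привет';

preg_match_all($re1, $message, $m1, PREG_PATTERN_ORDER);
preg_match_all($re2, $message, $m2, PREG_PATTERN_ORDER);

print_r($m1[1]); // café, שלום, 中文, привет
print_r($m2[1]); // caf, 中文, привет
```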

Gabe
A: 

The second expression uses Unicode syntax to match Unicode characters.

\x{FFFF}, where FFFF is 1 to 4 hexadecimal digits, is Perl syntax for matching a specific Unicode code point. It can be used inside character classes.

For example:
\x{E0} matches à only when it is encoded as the single code point U+00E0 (not as a followed by the combining grave accent U+0300).
\x{A9} matches ©.
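A minimal sketch of those two examples, assuming PHP 7.0+ for the "\u{...}" string escapes and the u modifier so that \x{...} is read as a code point. The decomposed string renders like à but contains no U+00E0, so it does not match.

```php
<?php
// à as one code point vs. "a" followed by U+0300 COMBINING GRAVE ACCENT.
$precomposed = "\u{00E0}";
$decomposed  = "a\u{0300}";

$r1 = preg_match('/\x{E0}/u', $precomposed); // 1: exactly U+00E0
$r2 = preg_match('/\x{E0}/u', $decomposed);  // 0: no U+00E0 in the string
$r3 = preg_match('/\x{A9}/u', '©');          // 1: © is U+00A9

var_dump($r1, $r2, $r3);
```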

Thus the class matches every Unicode character from U+4E00 to U+9FA5 (from 一 to 龥). U+9FA5 is itself a valid character: it is 龥, the last ideograph of the CJK Unified Ideographs block as originally defined.
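A quick sketch to check the endpoints of \x{4e00}-\x{9fa5}, assuming PHP 7.0+ and a UTF-8 source file: U+4E00 and U+9FA5 are the characters 一 and 龥, both are inside the range, and the next code point, U+9FA6, is not.

```php
<?php
// Anchored single-character class using the \x{...} syntax.
$range = '/^[\x{4e00}-\x{9fa5}]$/u';

$first = "\u{4E00}"; // expected to be 一
$last  = "\u{9FA5}"; // expected to be 龥
$next  = "\u{9FA6}"; // one past the end of the range

var_dump($first === '一', $last === '龥');

$inFirst = preg_match($range, $first); // 1
$inLast  = preg_match($range, $last);  // 1
$inNext  = preg_match($range, $next);  // 0
var_dump($inFirst, $inLast, $inNext);
```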


The first expression also tries to match these characters (一-龥), but writes them as literal characters rather than in the \x{...} Unicode syntax (whether or not this poses a problem, I don't know). In addition, as already mentioned, the first expression matches more characters, namely א-ת and À-ÿ.
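One way to test whether the literal range behaves differently is to compare both spellings code point by code point. This sketch assumes the PHP file is saved as UTF-8 and that the u modifier is used; utf8_bmp() is a hypothetical helper written here only to build each test character. Under those assumptions both classes should agree on every code point from just before U+4E00 to just after U+9FA5.

```php
<?php
// Hypothetical helper: UTF-8 encode a BMP code point in U+0800..U+FFFF
// (surrogates excluded), which always takes three bytes.
function utf8_bmp(int $cp): string {
    return chr(0xE0 | ($cp >> 12))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

$literal = '/^[一-龥]$/u';               // range written with literal characters
$escaped = '/^[\x{4e00}-\x{9fa5}]$/u';   // same range written as code points

$mismatches = 0;
$matched = 0;
// Start one below U+4E00 and stop one above U+9FA5 to check both edges.
for ($cp = 0x4DFF; $cp <= 0x9FA6; $cp++) {
    $ch = utf8_bmp($cp);
    $a = preg_match($literal, $ch);
    $b = preg_match($escaped, $ch);
    if ($a !== $b) {
        $mismatches++;
    }
    $matched += $a;
}

echo "mismatches: $mismatches, matched: $matched\n";
```

Here $matched counts the characters inside the range, which should be 0x9FA5 - 0x4E00 + 1 = 20902.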


The second question was already very well answered by unicornaddict.

Felix Kling