How does this regular expression work?

views:

109

answers:

+6 Q:

How does this regular expression work?

/^1?$|^(11+?)\1+$/ checks whether a number(its value in unary) is prime or not.

Using this, perl -l -e '(1 x $_) !~ /^1?$|^(11+?)\1+$/ && print while ++$_;' returns a list of prime numbers.

I do not have enough experience with Perl, but what I understand is that the regular expression will be true for a number that is not prime. So, if we print all numbers that do not produce a true with this expression, we have a list of prime numbers. Thats what the perl query is trying to do.

About the regex part,

^1?$ part is for counting 1 as not prime

^(11+?)\1+$ is for matching not prime numbers starting from 4.

What I do not understand is why is the ? in the regex needed at all. According to me /^1$|^(11+)\1+$/ should be just fine and actually

perl -l -e '(1 x $_) !~ /^1$|^(11+)\1+$/ && print while ++$_;' gives me the same set of prime numbers.

Is there any flaw in my understanding of the regular expression? Why are the ?s needed?

Isn't ? supposed to match zero or one occurrence of the expression preceding it?

+6 A:

The first ? will make "" (the empty string, unary zero) not be a prime number. Zero is defined as not prime.

The second is different; it stops the regular expression from greedy-matching. It should improve the performance of the match greatly, since the first part of that section ((11+)) won't consume almost the entire string before having to backtrack. If you omit the question mark, you're effectively testing whether odd n is divisible by n-1 and so one down; if you include it, you're testing divisibility by two first and so on upwards. Obviously, numbers tend to be divisible by smaller factors more often, so your match will be faster.

Borealid 2010-07-25 15:34:22

@Borealid: Okay, that explains why `?` is needed in `^1?$`. But why is it needed in the second half of the expression?

Lazer 2010-07-25 15:39:40

@Lazer: No, the second, much bigger paragraph is about the second ?. What you've missed in the perlre page is that ? after + or * means non-greedy matching (stop scanning as soon as you have a match).

reinierpost 2010-07-25 16:11:16

@reinierpost, that second paragraph was added after that comment was written.

cjm 2010-07-25 16:33:08

@cjm: yes, but before you wrote your answer :-P. Thanks for the comment anyhow.

Borealid 2010-07-25 23:46:03

@Borealid: well. you updated your answer between the time I started writing mine and the time I finished it. Honestly, I think my answer is slightly better, but then I'm biased. :-)

cjm 2010-07-26 00:46:17

+3 A:

The first ? is for matching the empty string (i.e. 0) as non-prime. If you don't care whether the regexp matches 0, then it's not necessary.

The second ? is only for efficiency. + is normally "greedy", which means it matches as many characters as are available, and then backtracks if the rest of the regexp fails to match. The +? makes it non-greedy, so it matches only 1 character, and then tries matching more if the rest of the regexp fails to match. (See the Quantifiers section of perlre for more about greedy vs. non-greedy matching.)

In this particular regexp, the (11+?) means it tests divisibility by 2 ('11'), then 3 ('111'), then 4, etc. If you used (11+), it would test divisibility by N (the number itself), then N-1, then N-2, etc. Since a divisor must be no greater than N/2, without the ? it would waste time testing a lot of "potential" divisors that can't possibly work. It would still match non-prime numbers, just more slowly. (Also, $1 would be the largest divisor instead of the smallest one.)

cjm 2010-07-25 15:51:21

@cjm: Is this a standard way of making expressions non-greedy? Where all does it work other than `+?` and `*?`. I thought `?` meant match zero or one time.

Lazer 2010-07-25 15:55:50

@Lazer: the question mark following a quantifier (such as `+` or `*`) is completely different from the one following a token.

Borealid 2010-07-25 15:57:32

A `?` that follows another quantifier makes that quantifier non-greedy. See http://perldoc.perl.org/perlre.html#Quantifiers

cjm 2010-07-25 15:58:50

@cjm @Borealid: "A ? that follows another quantifier makes that quantifier non-greedy" Is it true for all POSIX regexes?

Lazer 2010-07-25 16:08:20

@Lazer: No. Only those that follow the extended syntax first introduced by Perl 5 (and since copied by numerous other programs).

cjm 2010-07-25 16:12:10

@Lazer: {n,}? or {n,m}? too (prefers lower repeat counts instead of higher)

ysth 2010-07-25 17:02:38

ansaurus

tags:

views:

answers:

How does this regular expression work?

related questions