views:

175

answers:

2

In the spirit of polygenelubricants' efforts to do silly things with regular expressions, I currently try to get the .NET regex engine to multiplicate for me.

This has, of course, no practical value and is meant as a purely theoretical exercise.

So far, I've arrived at this monster, that should check if the number of 1s multiplied by the number of 2s equals the number of 3s in the string.

Regex regex = new Regex(
@"
^
(1(?<a>))*  # increment a for each 1
(2(?<b>))*  # increment b for each 2
    (?(a)   # if a > 0
        (                   
            (?<-a>)             # decrement a
            (3(?<c-b>))*        # match 3's, decrementing b and incrementing c until
                                # there are no 3's left or b is zero
            (?(b)(?!))          # if b != 0, fail
            (?<b-c>)*           # b = c, c = 0
        )
    )*      # repeat
(?(a)(?!))  # if a != 0, fail
(?(c)(?!))  # if c != 0, fail
$
", RegexOptions.IgnorePatternWhitespace);

Unfortunately, its not working, and I am at a loss why. I commented it to show you what I think the engine should be doing, but I may be off here. Examples of output:

regex.IsMatch("123") // true, correct
regex.IsMatch("22") // true, correct
regex.IsMatch("12233") // false, incorrect
regex.IsMatch("11233"); // true, correct

Any thought are welcome!

+1  A: 

I'm pretty sure the problem is in this line:

(?<b-c>)*

From what I can tell, with no text to match in there, the Regex refuses to match it more than one time. I slimmed down the Regex to the following:

(1(?<a>))*
(?(a)(?<-a>))*
(?(a)(?!))

Which passes on 1 but fails on 111. Also tried (?<-a>)*. No difference. However, changing it to

(1(?<a>))*
(?(a)((?<-a>)(2(?<b>))(?<-b>)))*
(?(a)(?!))

passes on both 12 and 111222. So going from a match of "" to a match with something causes the Regex to work as expected.

Getting back to your original Regex, my guess is that (?<b-c>)* is only matching 0-1 times, which explains why having one 2 in your string works, but having more than one fails.

Using a string of 11 also fails, which follows the same logic, as that makes the entire match "", which most likely means it only matches once, causing (?(a)(?!)) to fail.

Joel Rondeau
Nice analysis, thanks! I'll see if I can fix that... =)
Jens
A: 

With Joel's input I was able to get it to work, modifying the algorithm slightly to avoid those (?<b-c>)* lines.

Behold:

Regex regex = new Regex(
@"
^
(1(?<a>))*  # increment a for each 1
(2(?<b>))*  # increment b for each 2
    (?(a)   # if a > 0
         (
            (?<-a>)             # decrement a
            (?(b)               # if b > 0
                (                                       
                    (3(?<c-b>))*        # match 3's, decrementing b and incrementing c until
                                        # there are no 3's left or b is zero
                    (?(b)(?!))          # if b != 0, fail
                )
                |                       # else ( b = 0 )
                (
                    (3(?<b-c>))*        # match 3's, decrementing c and incrementing b until
                                        # there are no 3's left or c is zero
                    (?(c)(?!))          # if c != 0, fail
                )
            )
        )
    )*      # repeat
(?(a)(?!))  # if a != 0, fail
$
", RegexOptions.IgnorePatternWhitespace);

I'd like to give an ideone link, but the result I get there differs from mine. Maybe because I am using .NET 4.0 and they don't?

Jens
This still fails on the `11` case, but I haven't found another fail case for it.
Joel Rondeau