While using regex to help solve a problem in the Python Challenge, I came across some behaviour that confused me.

From the documentation for Python's re module:

(...) Matches whatever regular expression is inside the parentheses.

and

'+' Causes the resulting RE to match 1 or more repetitions of the preceding RE.

So this makes sense:

>>> import re
>>> re.findall(r"(\d+)", "1111112")
['1111112']

But this doesn't:

>>> re.findall(r"(\d)+", "1111112")
['2']

I realise that findall returns only the groups when groups are present in the regex, but why is only the '2' returned? What happens to all the 1s in the match?

+7  A: 

Because you only have one capturing group, but it's "run" repeatedly, the new matches are repeatedly entered into the "storage space" for that group. In other words, the 1s were lost when they were "overwritten" by subsequent 1s and eventually the 2.
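
To make the overwriting visible, here is a small sketch using re.search on the string from the question. The variable names are only for illustration. The whole match is still available as group 0, while group 1 keeps only the last iteration of the repeated group:

```python
import re

text = "1111112"

# The pattern as a whole matches all seven digits.
m = re.search(r"(\d)+", text)
print(m.group(0))  # 1111112

# Group 1 was overwritten on each repetition, so only the last digit remains.
print(m.group(1))  # 2

# findall returns group 1 (not the whole match) when the pattern has one group.
last_only = re.findall(r"(\d)+", text)
print(last_only)   # ['2']

# Without the '+', every digit is a separate match, so nothing is overwritten.
each_digit = re.findall(r"(\d)", text)
print(each_digit)  # ['1', '1', '1', '1', '1', '1', '2']
```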

Ben Blank
There are also non-grouping parentheses: try r"(?:\d)+".
John Fouhy
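
A quick sketch of the non-grouping suggestion above, again with the string from the question and illustrative variable names. With no capturing group in the pattern, findall falls back to returning the whole match. The second pattern is an extra variation, not something from the comment: wrapping the repeated group in an outer capturing group gives both the full run and the last digit.

```python
import re

text = "1111112"

# (?:...) groups for repetition without capturing, so findall
# returns the full match instead of a group.
whole = re.findall(r"(?:\d)+", text)
print(whole)  # ['1111112']

# Two capturing groups: findall returns one tuple per match,
# (outer group = whole run, inner group = last repetition).
both = re.findall(r"((\d)+)", text)
print(both)   # [('1111112', '2')]
```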
+1  A: 

You are repeating the group itself by appending '+' after ')'. I do not know the implementation details, but the group matches 7 times and only the last match is returned.

In the first one, you are matching 7 digits and making them a single group.

hayalci