I found that re.findall(r'(ab)+', "ababababab") returns only ['ab']:

>>> re.findall(r'(ab)+', "ababababab")
['ab']

I know that using r'(?:ab)+' instead returns the whole string:

>>> re.findall(r'(?:ab)+', "ababababab")
['ababababab']

Why does this happen?

+1  A: 

If the pattern contains a group, findall returns the group rather than the entire match. Here (ab)+ matches the entire string, but only the group (ab) is returned.
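
A minimal sketch of this, using re.search to look at the same match that findall sees (the variable names are only illustrative): group(0) is the entire match, while group(1) is the text of the capturing group, which is what findall reports.

```python
import re

text = "ababababab"

# re.search finds the same first match that findall would find.
m = re.search(r'(ab)+', text)

whole = m.group(0)  # the entire match
group = m.group(1)  # the text captured by group 1 (its last repetition)

print(whole)  # ababababab
print(group)  # ab

# findall reports group 1, not the entire match.
found = re.findall(r'(ab)+', text)
print(found)  # ['ab']
```

Calling m.group() with no argument is the same as m.group(0), which is why it returns the whole string.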

Paul Hankin
+5  A: 

I think the question you are asking here is why it returns this:

>>> re.findall(r'(ab)+', "ababababab")
['ab']

The answer is that if the pattern contains one or more groups, findall returns a list of the groups' text instead of the whole matches. Your regex has a single group that is repeated several times within one match, and a repeated group keeps only the value from its last repetition, so you get a single 'ab'.

I think what you want is either this:

>>> re.findall(r'(ab)', "ababababab")
['ab', 'ab', 'ab', 'ab', 'ab']

or the version you posted:

>>> re.findall(r'(?:ab)+', "ababababab")
['ababababab']
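
The "last value" behaviour is easier to see when the repetitions differ from each other. The sketch below uses a hypothetical input, "a1b2c3", and the pattern (\w\d)+; neither comes from the question, they are only for illustration.

```python
import re

text = "a1b2c3"  # illustrative input, not from the question

# One capturing group repeated three times in a single match:
# only the last repetition is kept.
last_only = re.findall(r'(\w\d)+', text)
print(last_only)  # ['c3']

# Wrapping the repetition in an outer capturing group returns the whole run.
whole_run = re.findall(r'((?:\w\d)+)', text)
print(whole_run)  # ['a1b2c3']

# Dropping the repetition returns every occurrence separately.
each = re.findall(r'\w\d', text)
print(each)  # ['a1', 'b2', 'c3']
```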
Dave Kirby
And to clarify further, `r'(?:ab)+'` doesn't contain a group, and when the pattern has no groups, `re.findall` returns a list containing the entire matched text.
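
As a rough illustration of how the shape of the result depends on the number of groups, the sketch below runs three variants of the pattern on a hypothetical string, "abab cd ab" (chosen only so that there is more than one match).

```python
import re

text = "abab cd ab"  # illustrative input with two separate runs of "ab"

# No capturing groups: a list of the entire matches.
no_groups = re.findall(r'(?:ab)+', text)
print(no_groups)  # ['abab', 'ab']

# One capturing group: a list of that group's text for each match.
one_group = re.findall(r'(ab)+', text)
print(one_group)  # ['ab', 'ab']

# Two capturing groups: a list of tuples, one tuple per match.
two_groups = re.findall(r'((ab)+)', text)
print(two_groups)  # [('abab', 'ab'), ('ab', 'ab')]
```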
jchl
A: 

Sorry, I may not have stated my question clearly.

(?:ab) treats 'ab' as a single unit. If we let c = ab, then c+ matches ababab...

So this result is clear:

>>> re.findall(r'(?:ab)+', "ababababab")
['ababababab']

My question is: why does this happen?

>>> match = re.search(r'(ab)+', "ababababab")
>>> match.group()
'ababababab'

shindow