Hello,

I am using the following code:

import re

CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)
matches = pattern.finditer(mailbody)
findall = pattern.findall(mailbody)

But finditer and findall are finding different things. findall does find all the matches in the given string, but finditer finds only the first one and returns an iterator with a single element.

How can I make finditer and findall behave the same way?

Thanks

+1  A: 

You can't make them behave the same way, because they're different. If you really want to create a list of results from finditer, then you could use a list comprehension:

>>> [match for match in pattern.finditer(mailbody)]
[...]

In general, use a for loop to access the matches returned by re.finditer:

>>> for match in pattern.finditer(mailbody):
...     ...
Tim McNamara
Yes, I know that. The problem is that they don't find the same matches: findall finds all the matches in the string, while finditer finds only the first one. And yes, I used a for-in loop to traverse all the elements in the iterator.
simao
`[match for match in pattern.finditer(mailbody)]` is just a slower and less readable way of saying `list(pattern.finditer(mailbody))`
aaronasterling
Thanks @ArronMcSmooth, good point.
Tim McNamara
A: 

I can't reproduce this here. Have tried it with both Python 2.7 and 3.1.

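One difference between finditer and findall is that finditer yields regex match objects, whereas findall returns a list of strings or tuples: a tuple of the captured groups when the pattern has more than one capturing group, the single group's text when it has exactly one, or the entire match when there are no capturing groups.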

So

import re
CARRIS_REGEX=r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)
with open("test.txt") as f:
    mailbody = f.read()
for match in pattern.finditer(mailbody):
    print(match)
print()
for match in pattern.findall(mailbody):
    print(match)

prints

<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>

('790', 'PR. REAL', '21:06', '04m')
('758', 'PORTAS BENFICA', '21:10', '09m')
('790', 'PR. REAL', '21:14', '13m')
('758', 'PORTAS BENFICA', '21:21', '19m')
('790', 'PR. REAL', '21:29', '28m')
('758', 'PORTAS BENFICA', '21:38', '36m')
('758', 'SETE RIOS', '21:49', '47m')
('758', 'SETE RIOS', '22:09', '68m')

If you want the same output from finditer as you're getting from findall, you need

for match in pattern.finditer(mailbody):
    print(match.groups())
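
For anyone who wants to check this without a test.txt file, here is a minimal self-contained sketch. The SAMPLE string is made up for illustration (two rows taken from the output above, wrapped in assumed <tr> tags), since the real mail body isn't shown. It compares the findall result with the groups collected from finditer:

```python
import re

# Hypothetical sample input; the real mail body is not shown in the question.
SAMPLE = (
    "<tr><th>790</th><th>PR. REAL</th><th>21:06</th><th>04m</th></tr>"
    "<tr><th>758</th><th>PORTAS BENFICA</th><th>21:10</th><th>09m</th></tr>"
)

CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)

# findall returns a list of tuples because the pattern has several groups.
from_findall = pattern.findall(SAMPLE)

# finditer yields match objects; .groups() gives the same tuple per match.
from_finditer = [match.groups() for match in pattern.finditer(SAMPLE)]

print(from_findall == from_finditer)
```

If the two lists differ on your input, that would point to an installation or version problem rather than to the regex itself.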
Tim Pietzcker
I don't know why it wasn't working. I uninstalled Python 2.5, upgraded to 2.6, and it's working now :|
simao