I'd like to use regular expressions to extract information from some chat logs. The strings being parsed have the format 03:22:32 PM <b>blcArmadillo</b>. Using Python's type() function, I found that the variable messages is a callable-iterator. How do I most efficiently navigate through a callable-iterator? Are they like arrays, where you can just use an index? The only way I could find to "extract" the data was to loop through the returned values and add them to a list, as shown in the code snippet below.

import re

times = []
# finditer returns a lazy iterator of match objects, not a list
messages = re.compile(r'(?P<time>..:..:.. ..).*?<b>(?P<usrname>.*?):</b>').finditer(search)

for result in messages:
    times.append(result.group('time'))

Is there a better, more efficient way of doing this? Thanks for the help.

+3  A: 

An iterator is just an object with a next method. Each time you call that method, it returns the next item in the sequence, so you can't index into it directly. If you need to access arbitrary indexes, you will pretty much have to convert it into a list. Instead of this:

for result in messages:
    times.append(result.group('time'))

You can say this though:

times = [result.group('time') for result in messages]

This does pretty much the same thing. However, I should warn you that building a list from a large result set holds every result in memory at once, which can be slow and memory-hungry. Thus, you shouldn't do this if you don't need random access. If the number of results depends on data entered by an untrusted user, you might also want to limit how much they can enter.
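
Here is a rough sketch of that trade-off, assuming Python 3. The sample `search` text and the `MAX_RESULTS` cap are made up for illustration and are not from the question; the sample includes the colon before `</b>` that the question's pattern expects. `itertools.islice` takes only the first few matches lazily, while `list()` materializes everything for the cases where you really do need indexing.

```python
import re
from itertools import islice

# Hypothetical sample input; the real chat log is not shown in the question.
search = (
    "03:22:32 PM <b>blcArmadillo:</b> hello\n"
    "03:22:40 PM <b>someone:</b> hi there\n"
    "03:23:01 PM <b>blcArmadillo:</b> bye\n"
)

pattern = re.compile(r'(?P<time>..:..:.. ..).*?<b>(?P<usrname>.*?):</b>')

MAX_RESULTS = 2  # illustrative cap on how many matches are processed

# Lazy: consume at most MAX_RESULTS matches without building a full list.
first_times = [
    m.group('time')
    for m in islice(pattern.finditer(search), MAX_RESULTS)
]

# Eager: build a list only when indexing into the results is required.
all_matches = list(pattern.finditer(search))
last_user = all_matches[-1].group('usrname')
```

If you only ever walk the results once from start to finish, the plain for loop over the iterator is probably all you need.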

EDIT: I just noticed that my previous answer didn't quite do the same as the snippet you posted, so I've updated it.

Jason Baker
> An iterator is just an object with a next method.

Not *quite* true; an iterator also has an '__iter__' method that returns the same iterator (so that there's a single interface, the built-in 'iter(foo)' function, for getting an iterator over an object, even one that is already an iterator).
bignose
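
For what it's worth, the point in the comment above can be checked with a small sketch, assuming Python 3 (where the method is spelled `__next__` and is normally called through the built-in `next()`). The pattern and input string here are made up for illustration.

```python
import re

# Hypothetical one-line sample; not taken from the original post.
messages = re.finditer(r'\d+', 'a1 b22 c333')

# An iterator returns itself from iter(), so it works directly in a for loop.
same_object = iter(messages) is messages

# next() pulls one item at a time; there is no indexing such as messages[0].
first = next(messages).group()

# Whatever has been consumed is gone: only the remaining items are left.
rest = [m.group() for m in messages]

# A spent iterator yields nothing more; the default avoids StopIteration.
exhausted = next(messages, None) is None
```

This single-pass behaviour is why converting to a list is the usual answer when you need to revisit or index the results.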