I am searching a text file for a certain string using re.finditer(pattern, text). I would like to know when it returns nothing, meaning that it found no match in the text passed to it.

I know that callable iterators have next() and __iter__ methods.

I would like to know whether I can get its size, or otherwise find out that no string matched my pattern.

Pardon the poorly worded question from before for those of you who read it. I just hit the afternoon wall.

+1  A: 

No, sorry. Iterators are not meant to know their length; they only know what comes next, which makes them very efficient for going through collections. The trade-off is that they do not allow indexing, and that includes asking for the length of the underlying collection.
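
Detecting the "no matches" case does not require a length, though. Here is a minimal sketch, assuming Python 3 and a hypothetical helper named `peek_matches` (it is not part of the `re` module): pull the first match with `next()` and a default value, then chain it back in front of the rest so nothing is lost.

```python
import itertools
import re


def peek_matches(pattern, text):
    """Return (found, matches).

    found is a bool; matches is an iterator over every match, including
    the one consumed while peeking. Hypothetical helper for illustration,
    not part of the re module.
    """
    matches = re.finditer(pattern, text)
    first = next(matches, None)  # None means the iterator was empty
    if first is None:
        return False, iter(())
    # Put the consumed match back in front of the remaining ones.
    return True, itertools.chain([first], matches)


found, matches = peek_matches(r"\d+", "a1 b22 c333")
found_values = [m.group() for m in matches]

missing, _ = peek_matches(r"\d+", "no digits here")
```

Only one match is ever held in memory at a time, so this should stay cheap even when the text contains a very large number of matches.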

Jesus Ramos
+1. Iterators wouldn't be 1/5 as useful as they are if they were nailed to some length in advance. Use (any collection) for that.
delnan
There is no way of knowing the length unless you iterate through the whole sequence.
Jesus Ramos
Iterators are just for efficiency and should generally be used when you need to go through an entire collection regardless of order. Iterating through an array or collection with an iterator is generally faster than incrementing an index and looking up each element.
Jesus Ramos
A: 

You can get the number of elements in an iterator by doing:

len([m for m in re.finditer(pattern, text)])

Iterators are iterators because they have not generated the sequence yet. The code above pulls each item out of the iterator into a list until the iterator is exhausted, then takes the length of that list. A more memory-efficient approach would be:

count = 0
for item in re.finditer(pattern, text):
    count += 1

A trickier alternative to the for loop is to use reduce to count the items in the iterator one by one (in Python 3, reduce must be imported from functools). This is effectively the same thing as the for loop:

reduce(lambda x, y: x + 1, myiterator, 0)

This ignores the y passed into reduce and just adds one to the running count x, which is initialized to 0.
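
For comparison, here is a small sketch that runs the counting approaches side by side, assuming Python 3. The pattern and text are made up for illustration. It also includes `sum(1 for _ in ...)`, a common idiom that, like the for loop, never builds a list.

```python
import re
from functools import reduce  # a builtin in Python 2, lives in functools in Python 3

pattern = r"\bcat\b"  # example pattern, chosen for illustration
text = "cat dog cat bird cat"

# Each call to re.finditer returns a fresh iterator, so every count
# below scans the text again from the start.
count_list = len(list(re.finditer(pattern, text)))      # builds a full list
count_sum = sum(1 for _ in re.finditer(pattern, text))  # no list is built
count_reduce = reduce(lambda acc, _: acc + 1, re.finditer(pattern, text), 0)
```

All three variables should end up holding the same count; only the first approach keeps every match object in memory at once.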

orangeoctopus
A: 

A quick solution would be to turn your iterator into a list and check the length of that list, but doing so can be bad for memory if there are too many results.

matches = list(re.finditer(pattern, text))
if matches:
    do_something()
print("Found", len(matches), "matches")
Kevin Stock
+2  A: 
Hamish Grubijan
This doesn't work with most iterators or generators. `getIterLength` will consume your `iterator`; assigning `iter(temp)` to `iterator` inside the function only creates a new local variable called `iterator` there which is discarded upon return from the function. Try substituting the line `f = xrange(20)` in your example with `f = iter([1,2,3,4,5])` to see what I mean.
Tim Pietzcker
Or compare `id(f)` with `id(iterator)` at the start of the function (they are the same), `id(iterator)` at the end of the function (it's different) and `id(f)` upon return from the function (it's the same as before). You're not putting the cloned cake into the same box, you're putting it into a new one and throwing it away.
Tim Pietzcker
Interesting, though, that it does work with `xrange()`. It definitely doesn't work with `re.finditer()`.
Tim Pietzcker
I do not think my answer was good enough to be an accepted one. I clearly indicated that this is an expensive hack. Apparently it does not always work, although I am not convinced that it is broken either. I will re-work the solution to return the iterator.
Hamish Grubijan
@Tim Pietzcker - is the new version broken with `re.finditer()` as well?
Hamish Grubijan
It looks a lot better now. I'm not sure if it will work in all cases because you're changing the iterator from whatever type it is (callable_iterator, generator, file object etc.) to a list_iterator. But I haven't found a case yet where it breaks, so it's definitely worth a try - if there really is a valid use case for using an iterator *and* wanting to know its length beforehand.
Tim Pietzcker
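
The pitfall discussed in these comments can be sketched as follows. The function names are hypothetical and this is not the code from the original answer, just an illustration assuming Python 3: rebinding a parameter inside a function does not change the caller's variable, so the caller is left holding an exhausted iterator unless the function returns a replacement.

```python
import re


def count_and_lose(iterator):
    """Count the items; the caller's iterator ends up exhausted."""
    items = list(iterator)
    iterator = iter(items)  # rebinds the local name only, then is discarded
    return len(items)


def count_and_return(iterator):
    """Count the items and hand back a replacement iterator."""
    items = list(iterator)  # note: this still holds every item in memory
    return len(items), iter(items)


lost = re.finditer(r"\w+", "one two three")
lost_count = count_and_lose(lost)
leftover = list(lost)  # nothing is left to iterate over

kept_count, kept = count_and_return(re.finditer(r"\w+", "one two three"))
words = [m.group() for m in kept]
```

As the last comment points out, the object handed back is a list iterator rather than the original iterator type, which may matter if the calling code relies on type-specific behaviour.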