views:

267

answers:

8

Is there an efficient way to know how many elements are in an iterator in Python, in general, without iterating through each and counting?

thanks.

+6  A: 

No, any method will require you to resolve every result. You can do iter_length = len(list(iterable)), but running that on an infinite iterator will of course never return. It also will consume the iterator and it will need to be reset if you want to use the contents.

Telling us what real problem you're trying to solve might help us find you a better way to accomplish your actual goal.

Edit: Using list() will read the whole iterable into memory at once, which may be undesirable. Another way is to do sum(1 for _ in iterable) as another person posted. That will avoid keeping it in memory.

Daenyth
the problem is I am reading a file with "pysam" that has millions of entries. Pysam returns an iterator. To compute a certain quantity, I need to know how many reads are in the file, but I don't need to read each one... that's the issue.
I'm not pysam user, but It's probably reading file "lazy". It make sense because you don't want to have big file in memory. So if you must know no. of records before iteration, only way is create two iterators, and use first one to count elements and second one to read file. BTW. Don't use `len(list(iterable))` it will load all data to memory.You can use: `reduce(lambda x, _: x+1, iterable, 0)`. Edit: Zonda333 code with sum is also good.
Tomasz Wysocki
@user248237: why do you say you need to know how many entries are available to compute a certain quantity ? You could just read a fixed amount of them and manage the case when there is less than that fixed amount (really simple to do using iterslice). Is there another reason you have to read all entries ?
kriss
@Tomasz Note that reduce is deprecated, and will be gone in Python 3 and up.
Wilduck
@Wilduck: It's not gone, just moved to `functools.reduce`
Daenyth
@Wilduck, @Daenyth: thx. for info.
Tomasz Wysocki
A: 

This code should work:

>>> iter = (i for i in range(50))
>>> sum(1 for _ in iter)
50
Zonda333
Looks to me like this does exactly what OP doesn't want to do: iterate through the iterator and count.
Adam Crossland
+2  A: 

No. It's not possible.

Example:

import random

def gen(n):
    for i in xrange(n):
        if random.randint(0, 1) == 0:
            yield i

iterator = gen(10)

Length of iterator is unknown until you iterate through it.

Tomasz Wysocki
Alternately, `def gen(): yield random.randint(0, 1)` is infinite, so you will never be able to find a length by iterating through it.
tgray
+8  A: 

An iterator is just an object which has a pointer to the next object to be read by some kind of buffer or stream, it's like a LinkedList where you don't know how many things you have until you iterate through them. Iterators are meant to be efficient because all they do is tell you what is next by references instead of using indexing (but as you saw you lose the ability to see how many entries are next).

Jesus Ramos
+3  A: 

There are two ways to get the length of "something" on a computer.

The first way is to store a count - this requires anything that touches the file/data to modify it (or a class that only exposes interfaces -- but it boils down to the same thing).

The other way is to iterate over it and count how big it is.

Wayne Werner
+2  A: 

Kinda. You could check the __lenght_hint__ method, but be warned that it's a undocumented implementation detail (following message in thread), that could very well vanish or summon nasal demons instead.

Otherwise, no. Iterators are just an object that only expose the next() method. You can call it as many times as required and they may or may not eventually raise StopIteration. Luckily, this behaviour is most of the time transparent to the coder. :)

badp
A: 

It's common practice to put this type of information in the file header, and for pysam to give you access to this. I don't know the format, but have you checked the API?

As others have said, you can't know the length from the iterator.

tom10
A: 

Regarding your original question, the answer is still that there is no way in general to know the length of an iterator in Python.

Given that you question is motivated by an application of the pysam library, I can give a more specific answer: I'm a contributer to PySAM and the definitive answer is that SAM/BAM files do not provide an exact count of aligned reads. Nor is this information easily available from a BAM index file. The best one can do is to estimate the approximate number of alignments by using the location of the file pointer after reading a number of alignments and extrapolating based on the total size of the file. This is enough to implement a progress bar, but not a method of counting alignments in constant time.

Kevin Jacobs