Is there an efficient way to know how many elements are in an iterator in Python, in general, without iterating through each and counting?
thanks.
Is there an efficient way to know how many elements are in an iterator in Python, in general, without iterating through each and counting?
thanks.
No, any method will require you to resolve every result. You can do iter_length = len(list(iterable))
, but running that on an infinite iterator will of course never return. It also will consume the iterator and it will need to be reset if you want to use the contents.
Telling us what real problem you're trying to solve might help us find you a better way to accomplish your actual goal.
Edit: Using list()
will read the whole iterable into memory at once, which may be undesirable. Another way is to do sum(1 for _ in iterable)
as another person posted. That will avoid keeping it in memory.
This code should work:
>>> iter = (i for i in range(50))
>>> sum(1 for _ in iter)
50
No. It's not possible.
Example:
import random
def gen(n):
for i in xrange(n):
if random.randint(0, 1) == 0:
yield i
iterator = gen(10)
Length of iterator
is unknown until you iterate through it.
An iterator is just an object which has a pointer to the next object to be read by some kind of buffer or stream, it's like a LinkedList where you don't know how many things you have until you iterate through them. Iterators are meant to be efficient because all they do is tell you what is next by references instead of using indexing (but as you saw you lose the ability to see how many entries are next).
There are two ways to get the length of "something" on a computer.
The first way is to store a count - this requires anything that touches the file/data to modify it (or a class that only exposes interfaces -- but it boils down to the same thing).
The other way is to iterate over it and count how big it is.
Kinda. You could check the __lenght_hint__
method, but be warned that it's a undocumented implementation detail (following message in thread), that could very well vanish or summon nasal demons instead.
Otherwise, no. Iterators are just an object that only expose the next()
method. You can call it as many times as required and they may or may not eventually raise StopIteration
. Luckily, this behaviour is most of the time transparent to the coder. :)
It's common practice to put this type of information in the file header, and for pysam to give you access to this. I don't know the format, but have you checked the API?
As others have said, you can't know the length from the iterator.
Regarding your original question, the answer is still that there is no way in general to know the length of an iterator in Python.
Given that you question is motivated by an application of the pysam library, I can give a more specific answer: I'm a contributer to PySAM and the definitive answer is that SAM/BAM files do not provide an exact count of aligned reads. Nor is this information easily available from a BAM index file. The best one can do is to estimate the approximate number of alignments by using the location of the file pointer after reading a number of alignments and extrapolating based on the total size of the file. This is enough to implement a progress bar, but not a method of counting alignments in constant time.