Files in Python, for example, are iterable: iterating over a file yields its lines one at a time. I want to count the number of lines.

One quick way is to do this:

lines = len(list(open(fname)))

However, this loads the whole file into memory (at once). This rather defeats the purpose of an iterator (which only needs to keep the current line in memory).

This doesn't work:

lines = len(line for line in open(fname))

as generators don't have a length.
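
A minimal sketch of that failure, using a hypothetical in-memory list (`sample_lines`) in place of a real file so it runs anywhere:

```python
# Hypothetical stand-in for the lines of a file.
sample_lines = ["first\n", "second\n", "third\n"]

gen = (line for line in sample_lines)

# Generator objects do not define __len__, so len() has nothing to call.
has_len = hasattr(gen, "__len__")

try:
    len(gen)
except TypeError as exc:
    # CPython reports that the generator object "has no len()".
    error_message = str(exc)
```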

Is there any way to do this short of defining a count function?

def count(i):
    c = 0
    for el in i:
        c += 1
    return c

EDIT: To clarify, I understand that the whole file will have to be read! I just don't want it in memory all at once =).

+9  A: 

If you need a count of lines, you can do this; I don't know of any better way:

line_count = sum(1 for line in open("yourfile.txt"))
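
The one-liner leaves it to the interpreter to close the file. A sketch of the same idea with an explicit `with` block follows; the helper name `count_lines` and the temporary demo file are assumptions made for illustration, not part of the answer above.

```python
import os
import tempfile


def count_lines(path):
    """Count lines while holding only one line in memory at a time."""
    with open(path) as handle:  # the file is closed when the block exits
        return sum(1 for _ in handle)


# Hypothetical demo file, created only so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\nb\nc\n")
    demo_path = tmp.name

demo_count = count_lines(demo_path)
os.remove(demo_path)
```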
mcrute
+15  A: 

Short of iterating through the iterable and counting the number of iterations, no. That's what makes it an iterable and not a list. This isn't even a Python-specific problem. Look at the classic linked-list data structure: finding the length is an O(n) operation that involves walking the whole list to count its elements.

As mcrute mentioned above, you can probably reduce your function to:

def count_iterable(i):
    return sum(1 for e in i)

Of course, if you're defining your own iterable object you can always implement __len__ yourself and keep an element count somewhere.
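
As a rough sketch of that last point, here is a hypothetical iterable (`Repeated`, a name invented for this example) that knows its own size and so can answer `len()` without being consumed:

```python
class Repeated:
    """Hypothetical iterable that yields `value` exactly `times` times."""

    def __init__(self, value, times):
        self.value = value
        self.times = times  # the stored element count

    def __iter__(self):
        for _ in range(self.times):
            yield self.value

    def __len__(self):
        # No iteration needed: the count is already known.
        return self.times


repeated = Repeated("x", 4)
known_length = len(repeated)
items = list(repeated)
```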

Kamil Kisiel
this could be improved with an itertools.tee()
hop
+1  A: 

Well, if you think about it, how do you propose to find the number of lines in a file without reading the whole file for newlines? Sure, you can find the size of the file, and if you can guarantee that the length of every line is x, you can get the number of lines from that. But unless you have some kind of constraint, I fail to see how this can work at all. Also, since iterables can be infinitely long...
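
The fixed-length case might look like the sketch below. It assumes every line, newline included, is exactly `RECORD_SIZE` bytes; that constant, the helper name, and the demo file are all made up for illustration.

```python
import os
import tempfile

RECORD_SIZE = 8  # assumed bytes per line, including the trailing newline


def count_fixed_width_lines(path, record_size=RECORD_SIZE):
    """Derive the line count from the file size without reading the file."""
    size = os.path.getsize(path)
    if size % record_size:
        raise ValueError("file size is not a multiple of the record size")
    return size // record_size


# Hypothetical demo file: five lines of 7 characters plus a newline each.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"abcdefg\n" * 5)
    fixed_path = tmp.name

fixed_count = count_fixed_width_lines(fixed_path)
os.remove(fixed_path)
```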

Nikron
i do want to read the whole file, i just don't want it in memory all at once
Claudiu
+5  A: 

Absolutely not, for the simple reason that iterables are not guaranteed to be finite.

Consider this perfectly legal generator function:

def forever():
    while True:
        yield "I will run forever"

Attempting to calculate the length of this generator with len([x for x in forever()]) will never finish: the list comprehension keeps consuming items until memory runs out.
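
If an upper bound is acceptable, one possible workaround (not something proposed in this thread) is to cap the count with `itertools.islice`. The helper name `count_up_to` and the limit of 1000 are assumptions for this sketch.

```python
from itertools import islice


def forever():
    while True:
        yield "I will run forever"


def count_up_to(iterable, limit):
    """Count items, but stop after at most `limit` of them."""
    return sum(1 for _ in islice(iterable, limit))


capped = count_up_to(forever(), 1000)   # terminates despite the infinite source
short = count_up_to(iter("abc"), 1000)  # finite input is counted exactly
```

A result equal to the limit only says the iterable has at least that many items, so this is a guard against hanging rather than a true length.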

As you noted, much of the purpose of iterators/generators is to be able to work on a large dataset without loading it all into memory. The fact that you can't get an immediate length should be considered a tradeoff.

Triptych
who downvoted this? it's valid and to the point!
hasen j
It's also true for sum(), max() and min(), but these aggregate functions take iterables.
Tim
i downvoted this, mainly for the "absolutely," which is just not true. anything that implements __len__() has a length -- infinite, or not.
hop
@hop, the question is about iterables in the general case. iterables that implement __len__ are a special case.
Triptych
plus, infinite iterables that implement a __len__ that returns 5 don't count!
hasen j
@Triptych Yes, but as hop says, starting with "absolutely" implies universal applicability, including all special cases.
Alabaster Codify
+4  A: 

I've used this redefinition for some time now:

def len(thingy):
    try:
        return thingy.__len__()
    except AttributeError:
        return sum(1 for item in iter(thingy))
Tim
It may never return... See Triptych's example.
bortzmeyer
Yep, use with care
Tim