views:

207

answers:

3

Hello there,

not sure this was asked before, but I couldn't find an obvious answer. I'm trying to count the number of elements in a list that are equal to a certain value. The problem is that these elements are not of a built-in type. So if I have

class A:
    def __init__(self, a, b):
        self.a = a
        self.b = b

stuff = []
for i in range(1,10):
    stuff.append(A(i/2, i%2))

Now I would like a count of the list elements whose field b = 1. I came up with two solutions:

print [e.b for e in stuff].count(1)

and

print len([e for e in stuff if e.b == 1])

Which is the best method? Is there a better alternative? It seems that the count() method does not accept keys (at least in Python version 2.5.1.

Many thanks!

+3  A: 
print sum(1 for e in L if e.b == 1)
Roger Pate
Nice one, I think this is more readable version of Alex Martelli's answer, summing 1 is more obvious than knowing that True can be treated as 1.
Tendayi Mawushe
It also lends itself well as a common pattern: `sum(len(n) for n in L if n.b == 1)` for example.
Roger Pate
A: 

I would prefer the second one as it's only looping over the list once.

If you use count() you're looping over the list once to get the b values, and then looping over it again to see how many of them equal 1.

A neat way may to use reduce():

reduce(lambda x,y: x + (1 if y.b == 1 else 0),list,0)

The documentation tells us that reduce() will:

Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value.

So we define a lambda that adds one the accumulated value only if the list item's b attribute is 1.

Dave Webb
+8  A: 
sum(x.b == 1 for x in L)

A boolean (as resulting from comparisons such as x.b == 1) is also an int, with a value of 0 for False, 1 for True, so arithmetic such as summation works just fine.

This is the simplest code, but perhaps not the speediest (only timeit can tell you for sure;-). Consider (simplified case to fit well on command lines, but equivalent):

$ py26 -mtimeit -s'L=[1,2,1,3,1]*100' 'len([x for x in L if x==1])'
10000 loops, best of 3: 56.6 usec per loop
$ py26 -mtimeit -s'L=[1,2,1,3,1]*100' 'sum(x==1 for x in L)'
10000 loops, best of 3: 87.7 usec per loop

So, for this case, the "memory wasteful" approach of generating an extra temporary list and checking its length is actually solidly faster than the simpler, shorter, memory-thrifty one I tend to prefer. Other mixes of list values, Python implementations, availability of memory to "invest" in this speedup, etc, can affect the exact performance, of course.

Alex Martelli
Might be worth explaining how this works. It won't be obvious to everyone that you can add up a list of booleans.
Dave Webb
Also, why is this a better approach than: len([e for e in list if e.b == 1])which does not have to sum up elements?
nicolaum
Without taking a stance on which is better exactly, this avoids forming a whole list that is not actually needed for anything.
Mike Graham
Aha! That explanation did it for me, many thanks Mike. I initially understood this was building a list and calling the sum() method for lists. It all makes sense now.
nicolaum
@nicolaum and @Dave, I've added detailed explanation and timing -- which actually shows the "useless list" approach being faster than the simple one (at least in one sample case). Simplest is not always fastest: sometimes if you have memory that's free and unused anyway "investing" it can be a tradeoff saving you some time.
Alex Martelli
Interesting with the timing. I guess since the loops "knows" its length it makes it quicker. sum() I think wraps around reduce() so I guess all those indirect function calls make it slow.
Dave Webb
@Dave, I'm the one who originally coded `sum` in the CPython interpreter, and I assure you it never even gets _close_ to `reduce` (not sure where you got the idea).
Alex Martelli
While this is very clever and the shortest code of all the examples, I find it the least pythonic and obvious. `len([x for x in L if cond])` is verbose and includes some redundancy, but it's immediately obvious what it means.
Ben James
@Ben, I think I know something about "Pythonic" -- and anybody who doesn't know that bools can be summed (only way the `sum` can be "unobvious"!-) is enough of a newbie to not be really qualified. Kind of like the sadly-frequent (for newbies) idiom `if x: return True // else: return False` (vs the obvious `return bool(x)`), to draw a parallel, there's nothing "clever" about using booleans properly, for what they are -- it's just the simplest, obvious way to do it (as the "waste-a-list" approach is _faster_, there is of course a case for it!, but **not** based on **simplicity**!-)
Alex Martelli
@Alex, I disgree that there is a parallel there, that totally redundant use of booleans is obviously pointless and not limited to python at all
Ben James
@Ben, neither is summing bools limited to Python -- in C, too, the right way to count how many items are > 5 is `total += (*pitm++) > 5`, **not** `if ((*pitm++)>5) total += 1` (directly use the fact that comparisons return 0 and 1, don't run around the block to try and avoid this simple fact!-).
Alex Martelli
@Alex, What I think is "unobvious" in your approach, is not that booleans can be summed (you're right, that's pretty obvious in most languages). What's "unobvious" (to me at least) is why you do `sum(x.b == 1 for x in L)`, instead of `sum([x.b == 1 for x in L])`. The answer (after searching the documentation) is that sum() expects an iterable object. I just assumed that sum expects a tuple or a list, as all the examples of its usage use them. Other than that, your solution is pretty obvious.
nicolaum
@Dave and @Alex, I think Dave's comment about `reduce` can be explained by an excerpt of the documentation: *[Note that `sum(range(n), m)` is equivalent to `reduce(operator.add, range(n), m)`.]*
nicolaum
@nicolaum, I'm sure there are a lot of lists being uselessly built (esp. in Python 2) where iterables would suffice (Python 3 helps by making implicit list constructions much rarer -- e.g. `range` isn't a list any more, neither is `adict.items`, and so on) -- partly because lists are an older concept, partly because some people don't find iterables obvious... and partly due to `len`, which _doesn't_ accept iterables (can't, because it's meant to be O(1), while iteration's intrinsically O(N) -- and that distinction's somewhat subtle). And yeah, that phrase in the docs is true but silly;-).
Alex Martelli