ansaurus

Question

Can Python's list comprehensions (ideally) do the equivalent of 'count(*)...group by...' in SQL?

Answer 1

+5 A:

Not easily doable as a list comprehension.

from collections import defaultdict
def group_by( someList ):
    counts = defaultdict(int)
    for value in someList:
        counts[value.aKey] += 1
    return counts

This is a very Pythonic solution. But not a list comprehension.

S.Lott 2010-01-27 16:23:21

why the downvote?

Adam Bernier 2010-01-27 16:25:43

This way works, even if it's not a list comprehension at all.

Broam 2010-01-27 16:25:45

Very 'awk' like solution as it goes as well!Small problem when running it on my system:...counts = defaultdict(int)NameError: global name 'defaultdict' is not defined...( Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit (Intel)] onwin32 )

monojohnny 2010-01-27 16:27:45

@monojohnny: `>>> from collections import defaultdict`

SilentGhost 2010-01-27 16:29:06

@Adam: Poster asked for list comprehension method, as you provided. However, even without that, the most "Pythonic" solution would be to use the built-in Counter class rather than rolling out your own.

BlueRaja - Danny Pflughoeft 2010-01-27 16:32:10

@BlueRaja: Counter is not available in py2.6, and OP is clearly misunderstands the list comprehensions. Because accepted solution has a piss-poor performance.

SilentGhost 2010-01-27 16:35:24

@SilentGhost; thanks I should have worked that out really !! Doh!@BlueRaja - I have clarified the title - I was aiming for List Comprehension (cos they look nice), but was accepting other ways as well.

monojohnny 2010-01-27 16:35:59

@BlueRaja: I agree; the built-in solution is always preferable.

Adam Bernier 2010-01-27 16:40:05

@SilentGhosht: re:"OP is clearly misunderstands the list comprehensions" - in fact as far OP is concerned I think they'd be better being referred to as 'list-UNcomprehensions'..yuk yuk :)

monojohnny 2010-01-27 18:59:16

@monojohnny: they come directly from the set theory http://en.wikipedia.org/wiki/List_comprehension#Overview

SilentGhost 2010-01-27 19:30:22

Yeah, understood - was just trying a bit of self-deprecating humour :-)Actually (I commented above on this also) - 'piss poor performance' might be a bit strong : I believe (somebody correct me if I'm wrong), that the accepted answer is 2*n as opposed to the optimum 1*n - not such a big deal: especially since I read that list-comps are faster in general than python loops.

monojohnny 2010-01-28 09:55:56

Answer 2

+6 A:

Lennart Regebro provided a nice one-liner that does what you want:

>>> values = [1,1,1,2]
>>> print [(x,values.count(x)) for x in set(values)]
[(1, 3), (2, 1)]

As S.Lott mentions, a defaultdict can do the same thing.

Adam Bernier 2010-01-27 16:25:17

nice...cheers - works fine on my 2.6 install.

monojohnny 2010-01-27 16:30:14

The defaultdict solution will do a lot better on larger `values` lists, because it will only loop the values once, instead of once plus once for each unique value.

Thomas Wouters 2010-01-27 16:42:35

@Thomas: thanks for adding that.

Adam Bernier 2010-01-27 16:48:31

Actually (although very elegant code: so the tick-stays) , this is a little more tricky for instances of my user-defined objects, as it appears I have to define a '__hash__' method that returns an 'int' , which (in my particular case) is non-trivial (its a big lump of text) : I might try an MD5 checksum or something(suggestions?), but for now, I think I'll go with the (not-unelegant) 'def group_by(someList)' given by S.Lott.

monojohnny 2010-01-27 16:53:20

@Thomas - understood, good point; in my case I have a few hundred items at most.

monojohnny 2010-01-27 17:07:05

A few hundred is quite a few :) A few hundred loops through a few hundred items will take a few hundred times longer than one loop through a few hundred items.

Thomas Wouters 2010-01-27 22:18:11

I appreciate everybody's concern about performance - but I'm not designing enterprise software here - I'm running a text filter against a few meg of logfiles.At is happens , it runs just great for my needs. I might swap out the algorithm at some point, if the script has more than a one-off use.In fact, the script completes in about a second, including the additional overhead of text-processing to generate my objects and to format the final report.I believe off the top of my head, that the algorithm is 2*n, as opposed to 1*n: so it should scale in a linear fashion as it happens.

monojohnny 2010-01-28 09:50:59

Answer 3

+8 A:

>>> from collections import Counter
>>> Counter([1,1,1,2])
Counter({1: 3, 2: 1})

Counter only available in py3.1, inherits from the dict.

SilentGhost 2010-01-27 16:28:14

Why the downvote?

SilentGhost 2010-01-27 16:30:27

+1 for the easiest way to do this in 3.x

Adam Bernier 2010-01-27 16:30:58

Answer 4

+4 A:

You can use groupby from the itertools module:

Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element. If not specified or is None, key defaults to an identity function and returns the element unchanged. Generally, the iterable needs to already be sorted on the same key function.

>>> a = [1,1,1,2]
>>> [(len(list(v)), key) for (key, v) in itertools.groupby(sorted(a))]
[(3, 1), (1, 2)]

I would assume its runtime is worse than the dict-based solutions by SilentGhost or S.Lott since it has to sort the input sequence, but you should time that yourself. It is a list comprehension, though. It should be faster than Adam Bernier's solution, since it doesn't have to do repeated linear scans of the input sequence. If needed, the sorted call can be avoided by sorting the input sequence in-line.

Torsten Marek 2010-01-27 16:33:54

+1 for the effort, and I agree that either S.Lott or SilentGhost's solutions are superior.

Adam Bernier 2010-01-27 16:41:39

Don't agree before you've profiled them;)

Torsten Marek 2010-01-27 16:57:10

@Torsten: good point :-) To clarify, I agree based on the fact that those are built-in solutions; already tested and debugged for us.

Adam Bernier 2010-01-27 17:15:32

@Adam: Sure, for 3.1, Counter is the way to go.

Torsten Marek 2010-01-27 17:58:03

ansaurus

tags:

views:

answers:

Can Python's list comprehensions (ideally) do the equivalent of 'count(*)...group by...' in SQL?

related questions