views:

989

answers:

4

there are many ways, how to code histogram in Python.

by histogram, i mean function, counting objects in an interable, resulting in the count table (i.e. dict). e.g.:

>>> L = 'abracadabra'
>>> histogram(L)
{'a': 5, 'b': 2, 'c': 1, 'd': 1, 'r': 2}

it can be written like this:

def histogram(L):
    d = {}
    for x in L:
        if x in d:
            d[x] += 1
        else:
            d[x] = 1
    return d

..however, there are much less ways, how do this in a single expression.

if we had "dict comprehensions" in python, we would write:

>>> { x: L.count(x) for x in set(L) }

but we don't have them, so we have to write:

>>> dict([(x, L.count(x)) for x in set(L)])

however, this approach may yet be readable, but is not efficient - L is walked-through multiple times, so this won't work for single-life generators.. the function should iterate well also through gen(L), where:

def gen(L):
    for x in L:
        yield x

we can go with reduce (R.I.P.):

>>> reduce(lambda d,x: dict(d, x=d.get(x,0)+1), L, {}) # wrong!

oops, does not work, the key name is 'x', not x :(

i ended with:

>>> reduce(lambda d,x: dict(d.items() + [(x, d.get(x, 0)+1)]), L, {})

(in py3k, we would have to write list(d.items()) instead of d.items(), but it's hypothethical, since there is no reduce there)

please beat me with a better one-liner, more readable! ;)

+13  A: 

Python 3.x does have reduce, you just have to do a from functools import reduce. It also has "dict comprehensions", which have exactly the syntax in your example.

Python 2.7 and 3.x also have a Counter class which does exactly what you want:

from collections import Counter
cnt = Counter("abracadabra")

In Python 2.6 or earlier, I'd personally use a defaultdict and do it in 2 lines:

d = defaultdict(int)
for x in xs: d[x] += 1

That's clean, efficient, Pythonic, and much easier for most people to understand than anything involving reduce.

Eli Courtwright
Python 2.7 also has dict comprehensions.
chpwn
+1  A: 

For a while there, anything using itertools was by definition Pythonic. Still, this is a bit on the opaque side:

>>> from itertools import groupby
>>> grouplen = lambda grp : sum(1 for i in grp)
>>> hist = dict((a[0], grouplen(a[1])) for a in groupby(sorted("ABRACADABRA")))
>>> print hist
{'A': 5, 'R': 2, 'C': 1, 'B': 2, 'D': 1}

I'm currently running Python 2.5.4.

Paul McGuire
This solution is O(n log n). There are several simpler linear solutions provided here.
Mike Graham
@Mike - are you sure? Beware of lurking complexities. Iterating over the list is obviously O(n), but what is the complexity of the repeated looking up of each key in the summarizing dict? It's not O(1).
Paul McGuire
@Paul, Looking up dict keys is O(1).
Mike Graham
So it is! Good tip!
Paul McGuire
This solution (without the sorted call, of course) is ok when your iterable is already sorted, otherwise it's too expensive, as Mike stated.
tokland
+1  A: 

It's kinda cheaty to import modules for oneliners, so here's a oneliner that is O(n) and works at least as far back as Python2.4

>>> f=lambda s,d={}:([d.__setitem__(i,d.get(i,0)+1) for i in s],d)[-1]
>>> f("ABRACADABRA")
{'A': 5, 'R': 2, 'B': 2, 'C': 1, 'D': 1}

And if you think __ methods are hacky, you can always do this

>>> f=lambda s,d=lambda:0:vars(([setattr(d,i,getattr(d,i,0)+1) for i in s],d)[-1])
>>> f("ABRACADABRA")
{'A': 5, 'R': 2, 'B': 2, 'C': 1, 'D': 1}

:)

gnibbler
cool, i've never seen default arguments in lambda before..
mykhal
+1  A: 

Your on-liner using reduce was almost ok, you only needed to tweak it a little:

>>> reduce(lambda d,x: dict(d, **{x: d.get(x,0)+1}), L, {}). 
{'a': 5, 'b': 2, 'c': 1, 'd': 1, 'r': 2}

Of course, this won't beat in-place solutions (nor in speed, nor in pythonicity), but in exchange you've got yourself a nice purely functional snippet. BTW, this would be somewhat prettier if Python had a method dict.merge().

tokland