ansaurus

Question

How do I use Python's itertools.groupby()?

Answer 1

+7 A:

Can you show us your code?

The example in the Python docs is quite straightforward:

groups = []
uniquekeys = []
for k, g in groupby(data, keyfunc):
    groups.append(list(g))      # Store group iterator as a list
    uniquekeys.append(k)

So in your case, data is a list of nodes, keyfunc is where the logic of your criteria function goes, and groupby() then groups the data. Be careful to sort the data by the same criteria before you call groupby(), or items with equal keys that are not next to each other will end up in separate groups. groupby() simply iterates through the data and starts a new group whenever the key changes.
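
To make the sorting point concrete, here is a minimal sketch. The sample words and the first_letter key function are made up for illustration; the only point is that the same key function is used for sorting and for grouping.

```python
from itertools import groupby

# Hypothetical sample data and key function, for illustration only.
words = ["apple", "bob", "avocado", "banana", "cherry"]

def first_letter(word):
    return word[0]

# Unsorted input: a new group starts every time the key changes,
# so "a" and "b" each show up twice.
unsorted_keys = [k for k, g in groupby(words, first_letter)]

# Sorting by the same key function first puts equal keys next to each other.
sorted_groups = [(k, list(g))
                 for k, g in groupby(sorted(words, key=first_letter), first_letter)]

print(unsorted_keys)
print(sorted_groups)
```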

Sebastjan Trepča 2008-08-03 18:40:09

Can you remove the chars in your answer, please? TIA!

Torsten Marek 2008-09-28 09:26:34

Answer 2

+34 A:

After some experimentation, I've overcome my mental block. In retrospect, it's all obvious, but in the spirit of Stack Overflow, here's what I learned.

As Sebastjan said, you first have to sort your data. This is important.

The part I didn't get is that in the example construction

groups = []
uniquekeys = []
for k, g in groupby(data, keyfunc):
   groups.append(list(g))    # Store group iterator as a list
   uniquekeys.append(k)

"k" is the current grouping key, and "g" is an iterator that you can use to iterate over the group defined by that grouping key. In other words, the groupby iterator itself returns iterators. Here's an example of that, using clearer variable names:

from itertools import groupby

things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]

for key, group in groupby(things, lambda x: x[0]):
    for thing in group:
        print("A %s is a %s." % (thing[1], key))
    print(" ")

This will give you the output:

A bear is a animal.
A duck is a animal.

A cactus is a plant.

A speed boat is a vehicle.
A school bus is a vehicle.

In this example, "things" is a list of tuples where the first item in each tuple is the group the second item belongs to. The groupby() function takes two arguments: (1) the data to group and (2) the function to group it with. Here, "lambda x: x[0]" tells groupby() to use the first item in each tuple as the grouping key.

In the above "for" statement, groupby returns three (key, group iterator) pairs, one for each unique key. You can use the returned iterator to iterate over each individual item in that group.
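
One consequence of the groups being iterators is worth a quick sketch: each group shares the underlying data with groupby(), so it should be consumed (or copied into a list) before moving on to the next key. The helper first_item and the variable names below are only for illustration.

```python
from itertools import groupby

things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"),
          ("vehicle", "speed boat"), ("vehicle", "school bus")]

def first_item(pair):
    return pair[0]

# Keeping the raw group iterators and reading them later does not work:
# once groupby() has moved on, the earlier groups come back empty.
lazy = [(key, group) for key, group in groupby(things, first_item)]
late_counts = {key: len(list(group)) for key, group in lazy}

# Copying each group into a list while it is current keeps the items.
eager = {key: [name for _, name in group]
         for key, group in groupby(things, first_item)}

print(late_counts)
print(eager)
```

This is why the docs example calls list(g) inside the loop rather than storing g itself.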

Here's a slightly different example with the same data, using a list comprehension:

for key, group in groupby(things, lambda x: x[0]):
    listOfThings = " and ".join([thing[1] for thing in group])
    print(key + "s: " + listOfThings + ".")

This will give you the output:

animals: bear and duck.
plants: cactus.
vehicles: speed boat and school bus.

Python's pretty cool, no?

James Sulak 2008-08-10 18:45:32

Thanks for that example. I must be dumb, but unlike Sebastjan Trepča, I didn't think the doc example was so "straightforward".

e-satis 2009-10-27 14:18:07

I agree, I didn't think the doc was straightforward either.

Neil Kodner 2010-06-25 20:22:59

Answer 3

+2 A:

A neato trick with groupby is run length encoding in one line:

[(c,len(list(cgen))) for c,cs in groupby(some_string)]

will give you a list of 2-tuples where the first element is the char and the 2nd is the number of repetitions.

nt 2008-08-31 23:27:16

Answer 4

+2 A:

A correction to the answer by @nt:

[(c,len(list(cs))) for c,cs in groupby(some_string)]

Thanks, the list(cs) part was what I was missing when trying to make use of the _grouper() objects returned by groupby(). This example can be an elegant solution for generating the Morris sequence.
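
As a rough sketch of that idea: the Morris (look-and-say) sequence reads each term aloud as runs of digits, which is exactly run-length encoding. The function name next_term and the starting term "1" are choices made here for illustration.

```python
from itertools import groupby

def next_term(term):
    # Run-length encode the digits, then write each run as "<count><digit>".
    return "".join(str(len(list(run))) + digit for digit, run in groupby(term))

# Build the first few terms, starting from "1".
term = "1"
terms = [term]
for _ in range(5):
    term = next_term(term)
    terms.append(term)

print(terms)
```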

CaptSolo 2008-09-28 01:05:12

You could have answered that in a comment.

e-satis 2009-10-27 14:18:43

Answer 5

A:

@CaptSolo, I tried your example, but it didn't work.

from itertools import groupby
[(c,len(list(cs))) for c,cs in groupby('Pedro Manoel')]

Output:

[('P', 1), ('e', 1), ('d', 1), ('r', 1), ('o', 1), (' ', 1), ('M', 1), ('a', 1), ('n', 1), ('o', 1), ('e', 1), ('l', 1)]

As you can see, there are two o's and two e's, but they got into separate groups. That's when I realized you need to sort the list passed to the groupby function. So, the correct usage would be:

name = list('Pedro Manoel')
name.sort()
[(c,len(list(cs))) for c,cs in groupby(name)]

Output:

[(' ', 1), ('M', 1), ('P', 1), ('a', 1), ('d', 1), ('e', 2), ('l', 1), ('n', 1), ('o', 2), ('r', 1)]

Just remember: if the list is not sorted, the groupby function will not work!

pedromanoel 2009-10-15 15:41:51

Actually it works. You might think of this behavior as broken, but it's useful in some cases. See the answers to this question for an example: http://stackoverflow.com/questions/1553275/how-to-strip-a-list-of-tuple-with-python
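
A small sketch of a case where the unsorted behavior is what you want (this is an illustration made up here, not the code from the linked question): collapsing runs of consecutive duplicates while keeping the original order.

```python
from itertools import groupby

# Hypothetical readings; repeated values next to each other are redundant.
readings = [1, 1, 2, 2, 2, 1, 3, 3, 1]

# Without sorting, each group is one consecutive run, so taking just the keys
# drops adjacent repeats but keeps later reappearances of the same value.
collapsed = [value for value, run in groupby(readings)]

print(collapsed)
```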

Denis Otkidach 2009-10-15 16:29:52

Answer 6

A:

I have the same problem.

I want to group by a dict key:

>>> x
[{'a': 10, 'b': 90}, {'a': 20}, {'a': 30}, {'a': 10}]
>>> [(name, list(group)) for name, group in groupby(x, lambda p:p['a'])]
[(10, [{'a': 10, 'b': 90}]), (20, [{'a': 20}]), (30, [{'a': 30}]), (10, [{'a': 10}])]

This should group on key 10 :(
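
Following the advice in the earlier answers, a sketch of a fix for this case is to sort by the same key before grouping. It assumes every dict has an 'a' key, as in the data above; itemgetter is just a convenience used here in place of the lambda.

```python
from itertools import groupby
from operator import itemgetter

x = [{'a': 10, 'b': 90}, {'a': 20}, {'a': 30}, {'a': 10}]

# Same key function for sorting and for grouping.
by_a = itemgetter('a')

# sorted() is stable, so dicts with the same 'a' keep their original order.
grouped = [(name, list(group))
           for name, group in groupby(sorted(x, key=by_a), by_a)]

print(grouped)
```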

Nil 2010-04-07 10:27:20
