ansaurus

Question

Sorting and Grouping Nested Lists in Python

Answer 1

+2 A:

If I understand your question correctly, the following code should do the job:

l = [
 ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
 ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
 ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
 ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]

def compareField(field):
   def c(l1,l2):
      return cmp(l1[field], l2[field])
   return c

# Use compareField(1) as the ordering criterion, i.e. sort only with
# respect to the 2nd field
l.sort(compareField(1))
for row in l: print row

print
# Select only those sublists for which 4th field=='2somename'
l2somename = [row for row in l if row[3]=='2somename']
for row in l2somename: print row

Output:

['4', '21', '1', '14', '2008-10-24 15:42:58']
['5', '21', '3', '19', '2008-10-24 15:45:45']
['6', '21', '1', '1somename', '2008-10-24 15:45:49']
['3', '22', '4', '2somename', '2008-10-24 15:22:03']
['7', '22', '3', '2somename', '2008-10-24 15:45:51']

['3', '22', '4', '2somename', '2008-10-24 15:22:03']
['7', '22', '3', '2somename', '2008-10-24 15:45:51']

Federico Ramponi 2009-01-03 17:17:32

The 'cmp' argument to sort is removed in Python 3.0, so it is preferable to use the 'key' parameter, which extracts a sort key. Otherwise, +1.

Aaron Maenpaa 2009-01-03 17:29:18

Removed 'cmp='; it should be the first argument anyway. By the way, I'm using Python 2.6.1 and everything works fine...

Federico Ramponi 2009-01-03 17:36:49
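In Python 3 both the built-in `cmp` and the `cmp` argument to `sort` are gone, as the comments above anticipate. The following is a minimal Python 3 sketch of the same approach. It assumes you want to keep a comparison function: wrap it with `functools.cmp_to_key` and replace `cmp(a, b)` with the `(a > b) - (a < b)` idiom. The name `compare_field` is a renamed version of the answer's `compareField`. The plain `key` version is shown alongside for comparison.

```python
from functools import cmp_to_key
from operator import itemgetter

rows = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

def compare_field(field):
    """Return a comparator that orders rows by a single field."""
    def compare(r1, r2):
        a, b = r1[field], r2[field]
        # Python 3 replacement for the removed built-in cmp()
        return (a > b) - (a < b)
    return compare

# Comparator style, adapted to Python 3 via cmp_to_key
by_cmp = sorted(rows, key=cmp_to_key(compare_field(1)))

# Key style, as suggested in the comments: the key is computed once per row
by_key = sorted(rows, key=itemgetter(1))

for row in by_key:
    print(row)

# Select only the rows whose 4th field is '2somename'
only_2somename = [row for row in by_key if row[3] == '2somename']
```

Both versions produce the same order, because Python's sort is stable either way. The `key` form is usually simpler and faster.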

Answer 2

+3 A:

If you assign the list to a variable "a":

#1:

a.sort(lambda x,y: cmp(x[1], y[1]))

#2:

filter(lambda x: x[3]=="2somename", a)

Jimmy2Times 2009-01-03 17:26:37
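Both one-liners rely on Python 2 behaviour. `cmp` no longer exists in Python 3, and `filter` now returns a lazy iterator rather than a list. Here is a minimal Python 3 sketch of the same two steps, assuming the list is still called `a`:

```python
a = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# 1: sort in place by the second field; the lambda extracts the sort key
a.sort(key=lambda x: x[1])

# 2: filter() returns an iterator in Python 3, so wrap it in list()
matches = list(filter(lambda x: x[3] == "2somename", a))
print(matches)
```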

Answer 3

+1 A:

Use a function to reorder the list so that I can group by each item in the list. For example I'd like to be able to group by the second column (so that all the 21's are together)

Lists have a built in sort method and you can provide a function that extracts the sort key.

>>> import pprint
>>> l.sort(key = lambda ll: ll[1])
>>> pprint.pprint(l)
[['4', '21', '1', '14', '2008-10-24 15:42:58'],
 ['5', '21', '3', '19', '2008-10-24 15:45:45'],
 ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
 ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]

Use a function to only display certain values from each inner list. For example I'd like to reduce this list to only contain the 4th field value of '2somename'

This looks like a job for list comprehensions:

>>> [ll for ll in l if ll[3] == '2somename']
[['3', '22', '4', '2somename', '2008-10-24 15:22:03'], ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]

Aaron Maenpaa 2009-01-03 17:27:29

Replace `[ll[3] for ll in l]` by `[ll for ll in l if ll[3] == '2somename']` and fix the output.

J.F. Sebastian 2009-01-03 18:39:05
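In the sorted output above, rows that share a key (the three '21' rows) keep their original relative order. This happens because Python's sort is guaranteed to be stable. The sketch below relies on that property, assuming you also want the rows ordered by the third field within each group. It sorts by the secondary key first and then by the primary key. A tuple key via `itemgetter(1, 2)` gives the same result in one pass.

```python
from operator import itemgetter

l = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# Two passes: secondary key first, then primary key. Stability keeps
# the third-field order inside each group of equal second fields.
two_pass = sorted(l, key=itemgetter(2))
two_pass.sort(key=itemgetter(1))

# One pass: itemgetter(1, 2) returns a (second, third) tuple per row.
one_pass = sorted(l, key=itemgetter(1, 2))

for row in one_pass:
    print(row)
```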

Answer 4

+7 A:

For the first question, the first thing you should do is sort the list by the second field:

x = [
 ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
 ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
 ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
 ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]

from operator import itemgetter

x.sort(key=itemgetter(1))

Then you can use itertools' groupby function:

from itertools import groupby
y = groupby(x, itemgetter(1))

Now y is an iterator that yields (key, group) pairs, where each group is itself an iterator over the consecutive rows sharing that key. It's easier to show these pairs in code than to explain them:

for elt, items in groupby(x, itemgetter(1)):
    print elt, items
    for i in items:
        print i

Which prints:

21 <itertools._grouper object at 0x511a0>
['4', '21', '1', '14', '2008-10-24 15:42:58']
['5', '21', '3', '19', '2008-10-24 15:45:45']
['6', '21', '1', '1somename', '2008-10-24 15:45:49']
22 <itertools._grouper object at 0x51170>
['3', '22', '4', '2somename', '2008-10-24 15:22:03']
['7', '22', '3', '2somename', '2008-10-24 15:45:51']
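`groupby` only groups *consecutive* items with equal keys, which is why the list has to be sorted first. Each group iterator also shares the underlying source, so once the outer loop advances the previous group is no longer available. Copy a group into a list if you need it later. The following minimal Python 3 sketch shows what happens without sorting and then collects the sorted groups into a dict:

```python
from itertools import groupby
from operator import itemgetter

x = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# Without sorting, equal keys that are not adjacent form separate groups.
unsorted_keys = [k for k, _ in groupby(x, itemgetter(1))]
print(unsorted_keys)  # ['21', '22', '21', '22']

# Sort first, then materialise each group with list() before moving on.
x_sorted = sorted(x, key=itemgetter(1))
groups = {k: list(g) for k, g in groupby(x_sorted, itemgetter(1))}
for key, rows in groups.items():
    print(key, len(rows))
```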

For the second part, use a list comprehension, as already mentioned here:

from pprint import pprint as pp
pp([y for y in x if y[3] == '2somename'])

Which prints:

[['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]

llimllib 2009-01-03 17:29:07

I've added the list comprehension example.

J.F. Sebastian 2009-01-03 18:34:11

Answer 5

+1 A:

If you'll be doing a lot of sorting and filtering, you may like some helper functions.

m = [
 ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
 ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
 ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
 ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]

# Sort and filter helpers.
sort_on   = lambda pos:     lambda x: x[pos]
filter_on = lambda pos,val: lambda l: l[pos] == val

# Sort by second column
m = sorted(m, key=sort_on(1))

# Filter on 4th column, where value = '2somename'
m = filter(filter_on(3,'2somename'),m)

Triptych 2009-01-03 17:37:59

sort_on == operator.itemgetter

J.F. Sebastian 2009-01-03 18:41:02

Please use defs instead of lambdas.

S.Lott 2009-01-03 19:33:39

@S.Lott - why defs over lambdas here?

Triptych 2009-01-03 23:56:54

@Triptych: because lambdas with a name are just like defs but more confusing for absolutely no benefit.

nosklo 2009-01-05 15:01:54

Meh. In this case, I think lambdas are more readable. And "more confusing" is certainly subjective!

Triptych 2009-01-05 15:17:26
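For reference, here is one way the `def`-based version suggested in the comments could look in Python 3. It keeps the answer's helper names, and the docstrings note that `sort_on(pos)` behaves like `operator.itemgetter(pos)`, as J.F. Sebastian points out. Because `filter` returns an iterator in Python 3, the result is wrapped in `list()`.

```python
from operator import itemgetter

m = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

def sort_on(pos):
    """Key-function factory; behaves like operator.itemgetter(pos)."""
    def key(row):
        return row[pos]
    return key

def filter_on(pos, val):
    """Predicate factory: true for rows whose field at pos equals val."""
    def predicate(row):
        return row[pos] == val
    return predicate

# Sort by second column; key=itemgetter(1) would do the same job
m = sorted(m, key=sort_on(1))

# Filter on 4th column, where value = '2somename'
m = list(filter(filter_on(3, '2somename'), m))
print(m)
```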

Answer 6

A:

It looks a lot like you're trying to use a list as a database.

Python now ships the sqlite3 module in its standard library. If you don't need persistence, it's easy to create an in-memory SQLite database (see http://stackoverflow.com/questions/304393/how-do-i-create-a-sqllite3-in-memory-database).

Then you can use SQL statements to do all this sorting and filtering without having to reinvent the wheel.

Kamil Kisiel 2009-01-03 17:56:59

Kamil, you are correct. However, I am learning Python and wanted to do things using lists so that I can learn more about them. I will check this out, though. Thanks.

m3clov3n 2009-01-03 19:16:37
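Here is a minimal sketch of the in-memory SQLite approach using the standard `sqlite3` module. The table name `entries` and the column names are hypothetical, since the question does not say what the fields mean. `rowid` is used as a tie-breaker because SQL does not otherwise guarantee the order of rows with equal sort keys.

```python
import sqlite3

rows = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# ":memory:" creates a throwaway database that lives only in RAM.
conn = sqlite3.connect(":memory:")
# Hypothetical column names for the five fields.
conn.execute(
    "CREATE TABLE entries (id TEXT, grp TEXT, code TEXT, name TEXT, stamp TEXT)"
)
conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?, ?)", rows)

# Group by the second column; rowid keeps insertion order within a group.
ordered = conn.execute("SELECT * FROM entries ORDER BY grp, rowid").fetchall()

# Keep only rows whose fourth column is '2somename'.
matches = conn.execute(
    "SELECT * FROM entries WHERE name = ? ORDER BY rowid", ("2somename",)
).fetchall()
conn.close()

for row in matches:
    print(row)
```

Rows come back as tuples rather than lists.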

Answer 7

+1 A:

For part (2), with x being your list, I think you want:

[y for y in x if y[3] == '2somename']

This returns a list of only those data lists whose fourth value is '2somename'. That said, Kamil's suggestion of going for SQL may be the best advice...

2009-01-03 18:15:03

Answer 8

+1 A:

You're simply creating indexes on your structure, right?

>>> from collections import defaultdict
>>> def indexOn(things, pos):
...     inx = defaultdict(list)
...     for t in things:
...         inx[t[pos]].append(t)
...     return inx
... 
>>> a=[
...  ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
...  ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
...  ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
...  ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
...  ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
... ]

Here's your first request, grouped by position 1.

>>> import pprint
>>> pprint.pprint( dict(indexOn(a,1)) )
{'21': [['4', '21', '1', '14', '2008-10-24 15:42:58'],
        ['5', '21', '3', '19', '2008-10-24 15:45:45'],
        ['6', '21', '1', '1somename', '2008-10-24 15:45:49']],
 '22': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
        ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]}

Here's your second request, grouped by position 3. The '2somename' entry holds the rows you asked for.

>>> dict(indexOn(a,3))
{'19': [['5', '21', '3', '19', '2008-10-24 15:45:45']], '14': [['4', '21', '1', '14', '2008-10-24 15:42:58']], '2somename': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'], ['7', '22', '3', '2somename', '2008-10-24 15:45:51']], '1somename': [['6', '21', '1', '1somename', '2008-10-24 15:45:49']]}
>>> pprint.pprint(_)
{'14': [['4', '21', '1', '14', '2008-10-24 15:42:58']],
 '19': [['5', '21', '3', '19', '2008-10-24 15:45:45']],
 '1somename': [['6', '21', '1', '1somename', '2008-10-24 15:45:49']],
 '2somename': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
               ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]}

S.Lott 2009-01-03 19:39:45
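Once the index is built, the second request becomes a plain dictionary lookup. The following minimal Python 3 sketch shows the same `indexOn` idea, renamed `index_on` here, and looks up the '2somename' rows directly. Indexing a `defaultdict` with a missing key silently inserts an empty list, so `.get()` is used for keys that may not exist.

```python
from collections import defaultdict

def index_on(things, pos):
    """Map each distinct value at pos to the list of rows holding it."""
    inx = defaultdict(list)
    for t in things:
        inx[t[pos]].append(t)
    return inx

a = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

by_group = index_on(a, 1)
by_name = index_on(a, 3)

# Second request: all rows whose 4th field is '2somename'
print(by_name['2somename'])

# .get() does not insert missing keys, unlike by_name['...']
print(by_name.get('no-such-name', []))
```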
