Hello,

I have a list of data that looks like the following:

// timestep,x_position,y_position
0,4,7
0,2,7
0,9,5
0,6,7
1,2,5
1,4,7
1,9,0
1,6,8

... and I want to make this look like:

0, (4,7), (2,7), (9,5), (6,7)
1, (2,5), (4,7), (9,0), (6,8)

My plan was to use a dictionary, where the value of t is the key for the dictionary, and the value against the key would be a list. I could then append each (x,y) to the list. Something like:

# where t = 0, c = (4,7), d = {}

# code 1
d[t].append(c)

Now this causes IDLE to fail. However, if I do:

# code 2
d[t] = []
d[t].append(c)

... this works.

So the question is: why does code 2 work, but code 1 doesn't?

PS Any improvement on what I'm planning on doing would be of great interest!! I think I will have to check the dictionary on each loop through the input to see if the dictionary key already exists, I guess by using something like max(d.keys()): if it is there, append data, if not create the empty list as the dictionary value, and then append data on the next loop through.

A: 

Let's look at

d[t].append(c)

What is the value of d[t]? Try it.

d = {}
t = 0
d[t]

What do you get? A KeyError. There's nothing in d that has a key of t.

Now try this.

d[t] = []
d[t]

Ahh. Now there's something in d with a key of t.
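To make the difference concrete, here is a minimal sketch using the names from the question. The try/except and the failed flag are only there for illustration, so the script keeps running after the error. It shows that code 1 fails on the lookup d[t], before append is ever called:

```python
d = {}
t = 0
c = (4, 7)

# code 1: d[t] is evaluated first, and the key does not exist yet
try:
    d[t].append(c)
    failed = False
except KeyError:
    failed = True  # the lookup d[t] raised before .append() was reached

# code 2: bind an empty list to the key first, then append to it
d[t] = []
d[t].append(c)

print(failed)  # True
print(d)       # {0: [(4, 7)]}
```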

There are several things you can do.

  1. Use example 2.
  2. Use setdefault. d.setdefault(t,[]).append(c).
  3. Use collections.defaultdict. You'd use a defaultdict(list) instead of a simple dictionary, {}.


Edit 1. Optimization

Given input lines from a file in the above form: ts, x, y, the grouping process is needless. There's no reason to go from a simple list of ( ts, x, y ) to a more complex list of ( ts, (x,y), (x,y), (x,y), ... ). The original list can be processed exactly as it arrived.

import collections

d = collections.defaultdict(list)
for ts, x, y in someFileOrListOrQueryOrWhatever:  # any iterable of (ts, x, y) triples
    d[ts].append((x, y))
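As a rough end-to-end sketch of the same idea, the following parses the sample lines from the question and prints them in the layout that was asked for. The lines list stands in for the real input file, and format_row is a hypothetical helper introduced only for this illustration:

```python
import collections

# Sample input in the question's format: timestep,x_position,y_position
lines = ["0,4,7", "0,2,7", "0,9,5", "0,6,7",
         "1,2,5", "1,4,7", "1,9,0", "1,6,8"]

d = collections.defaultdict(list)
for line in lines:
    ts, x, y = (int(field) for field in line.split(","))
    d[ts].append((x, y))


def format_row(ts, points):
    # Hypothetical helper: renders one timestep as "0, (4,7), (2,7), ..."
    return ", ".join([str(ts)] + ["(%d,%d)" % p for p in points])


rows = [format_row(ts, d[ts]) for ts in sorted(d)]
for row in rows:
    print(row)
# 0, (4,7), (2,7), (9,5), (6,7)
# 1, (2,5), (4,7), (9,0), (6,8)
```

Iterating over sorted(d) keeps the timesteps in order even if the input lines arrive shuffled.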


Edit 2. Answer Question

"when initialising a dictionary, you need to tell the dictionary what the key-value data structure will look like?"

I'm not sure what the question means. Since all dictionaries are key-value structures, the question isn't very clear. So I'll review the three alternatives, which may answer it.

Example 2.

Initialization

d= {}

Use

if t not in d:
    d[t] = list()
d[t].append( c )

Each dictionary value must be initialized to some useful structure. In this case, we check to see if the key is present; when the key is missing, we create the key and assign an empty list.

Setdefault

Initialization

d= {}

Use

d.setdefault(t,list()).append( c )

In this case, we exploit the setdefault method to either fetch a value associated with a key or create a new value associated with a missing key.

defaultdict

Initialization

import collections
d = collections.defaultdict(list)

Use

d[t].append( c )

The defaultdict uses an initializer function for missing keys. In this case, we provide the list function so that a new, empty list is created for a missing key.
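One side effect worth knowing about, sketched below with made-up keys: merely looking up a missing key in a defaultdict inserts it, whereas a membership test or .get() does not.

```python
import collections

d = collections.defaultdict(list)
d[0].append((4, 7))

# Membership tests and .get() do not call the factory
has_one = 1 in d          # False
got = d.get(1)            # None
keys_before = sorted(d)   # [0]

# A plain lookup of a missing key does: it inserts an empty list
value = d[1]              # []
keys_after = sorted(d)    # [0, 1]

print(keys_before, keys_after)
```

So if you only want to check whether a timestep has been seen, use "in" rather than indexing.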

S.Lott
So does this mean that when initialising a dictionary, you need to tell the dictionary what the key-value data structure will look like? Sorry, I'm coming from a Perl background, which I have not used in anger in years, so I may be going on broken memories, but I was sure you could do this anonymously.
A: 
lst = []  # this is a list, not a dict; an empty dictionary would be {}
elem = [1, 2, 3]
lst.append(elem)

You can access a single element this way:

print(lst[0])  # 0 is the index

the output will be:

[1, 2, 3]
Giancarlo
A: 

I think you want to use setdefault. It's a bit weird to use but does exactly what you need.

d.setdefault(t, []).append(c)

The .setdefault method will return the element (in our case, a list) that's bound to the dict's key t if that key exists. If it doesn't, it will bind an empty list to the key t and return it. So either way, a list will be there that the .append method can then append the tuple c to.
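A small sketch of both branches, using sample tuples from the question (the names first and second are just for illustration):

```python
d = {}
c1 = (4, 7)
c2 = (2, 7)

# Key 0 is missing: setdefault binds a new empty list to it and returns that list
first = d.setdefault(0, [])
first.append(c1)

# Key 0 now exists: setdefault returns the stored list and ignores the new []
second = d.setdefault(0, [])
second.append(c2)

print(first is second)  # True
print(d)                # {0: [(4, 7), (2, 7)]}
```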

Tim Pietzcker
A: 

If your data is not already sorted by the grouping key, here is some code that may help group it:

#!/usr/bin/env python
"""
$ cat data_shuffled.txt
0,2,7
1,4,7
0,4,7
1,9,0
1,2,5
0,6,7
1,6,8
0,9,5
"""
from itertools   import groupby
from operator    import itemgetter

# load the data and make sure it is sorted by the first column
sortby_key = itemgetter(0)
with open('data_shuffled.txt') as f:
    # build lists rather than map(): map objects cannot be indexed in Python 3
    data = sorted(([int(v) for v in line.split(',')] for line in f),
                  key=sortby_key)

# group by the first column
grouped_data = []
for key, group in groupby(data, key=sortby_key):
    assert key == len(grouped_data) # assume the first column is 0,1, ...
    grouped_data.append([trio[1:] for trio in group])

# print the data (Python 3 print function)
for i, pairs in enumerate(grouped_data):
    print(i, pairs)

Output:

0 [[2, 7], [4, 7], [6, 7], [9, 5]]
1 [[4, 7], [9, 0], [2, 5], [6, 8]]
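The sort matters because groupby only merges consecutive rows that share a key. A short sketch with a few of the shuffled rows (the rows list is made up from the sample data) shows the difference:

```python
from itertools import groupby
from operator import itemgetter

first_column = itemgetter(0)
rows = [[0, 2, 7], [1, 4, 7], [0, 4, 7], [1, 9, 0]]

# Without sorting, each run of equal keys becomes its own group
unsorted_keys = [key for key, _ in groupby(rows, key=first_column)]

# After sorting by the key, every timestep appears exactly once
sorted_rows = sorted(rows, key=first_column)
sorted_keys = [key for key, _ in groupby(sorted_rows, key=first_column)]

print(unsorted_keys)  # [0, 1, 0, 1]
print(sorted_keys)    # [0, 1]
```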
J.F. Sebastian