views:

481

answers:

6

I've got a list of tuples extracted from a table in a DB which looks like (key , foreignkey , value). There is a many to one relationship between the key and foreignkeys and I'd like to convert it into a dict indexed by the foreignkey containing the sum of all values with that foreignkey, i.e. { foreignkey , sumof( value ) }. I wrote something that's rather verbose:

myDict = {}
for item in myTupleList:
    if item[1] in myDict:
        myDict [ item[1] ] += item[2]
    else:
        myDict [ item[1] ] = item[2]

but after seeing this question's answer or these two there's got to be a more concise way of expressing what I'd like to do. And if this is a repeat, I missed it and will remove the question if you can provide the link.

+5  A: 

Assuming all your values are ints, you could use a defaultdict to make this easier:

from collections import defaultdict

myDict = defaultdict(int)

for item in myTupleList:
    myDict[item[1]] += item[2]

defaultdict is like a dictionary, except if you try to get a key that isn't there it fills in the value returned by the callable - in this case, int, which returns 0 when called with no arguments.

UPDATE: Thanks to @gnibbler for reminding me, but tuples can be unpacked in a for loop:

from collections import defaultdict

myDict = defaultdict(int)

for _, key, val in myTupleList:
    myDict[key] += val

Here, the 3-item tuple gets unpacked into the variables _, key, and val. _ is a common placeholder name in Python, used to indicate that the value isn't really important. Using this, we can avoid the hairy item[1] and item[2] indexing. We can't rely on this if the tuples in myTupleList aren't all the same size, but I bet they are.

(We also avoid the situation of someone looking at the code and thinking it's broken because the writer thought arrays were 1-indexed, which is what I thought when I first read the code. I wasn't alleviated of this until I read the question. In the above loop, however, it's obvious that myTupleList is a tuple of three elements, and we just don't need the first one.)

Chris Lutz
Which, BTW, was a suggestion in one of the SO posting cited in the question...
mjv
Thanks. That's very easy to read.
wheaties
`myDict = reduce(lambda d, t: (d[t[1]] += t[2], d)[1], myTupleList, defaultdict(int))` - You know, everybody should use reduce! *big grin*
Omnifarious
Omnifarious, can you post that as an answer and explain it? That looks very interesting. I'm looking for anything that can teach me something new, especially a functionally flavored answer.
wheaties
@wheaties, alright, since you asked. *laugh* I prefer Chris' solution much more though. In this case `reduce` obscures. And my solution is slightly wrong anyway, but I'll post the correct version.
Omnifarious
A: 

Maybe not exactly readable but it should work:

fks = dict([ (v[1], True) for v in myTupleList ]).keys()
myDict = dict([ (fk, sum([ v[2] for v in myTupleList if v[1] == fk ])) for fk in fks ])

The first line finds all unique foreign keys. The second line builds your dictionary by first constructing a list of (fk, sum(all values for this fk))-pairs and turning that into a dictionary.

Wim
+1  A: 

Look at SQLAlchemy and see if that does all the mapping you need and perhaps more

Mark
A: 

If at all possible, it's usually more efficient to move this into the database, and run an SQL query such as:

SELECT SUM(value) FROM table GROUP BY foreignkey
Wim
+4  A: 
from collections import defaultdict

myDict = defaultdict(int)

for _, key, value in myTupleList:
    myDict[key] += value
gnibbler
There's three values in the tuple list, the first one just isn't used. But the simple fix is to put `_, ` before `key`, so +1.
Chris Lutz
This won't work because `myTupleList` is a three-element tuple. You'll get `ValueError: too many values to unpack`.
Troy J. Farrell
@Chris Lutz, I wouldn't have minded if you had fixed it ... :)
gnibbler
@gnibbler - Some people get a bit uppidy if people edit their answers, but thanks.
Chris Lutz
I think this one is likely a little more efficient than Chris' original version.
Omnifarious
I like this answer a lot too. Very succinct. Very clear on the meaning.
wheaties
+3  A: 

Here's my (tongue in cheek) answer:

myDict = reduce(lambda d, t: (d.__setitem__(t[1], d.get(t[1], 0) + t[2]), d)[1], myTupleList, {})

It is ugly and bad, but here is how it works.

The first argument to reduce (because it isn't clear there) is lambda d, t: (d.__setitem__(t[1], d.get(t[1], 0) + t[2]), d)[1]. I will talk about this later, but for now, I'll just call it joe (no offense to any people named Joe intended). The reduce function basically works like this:

 joe(joe(joe({}, myTupleList[0]), myTupleList[1]), myTupleList[2])

And that's for a three element list. As you can see, it basically uses its first argument to sort of accumulate each result into the final answer. In this case, the final answer is the dictionary you wanted.

Now for joe itself. Here is joe as a def:

def joe(myDict, tupleItem):
   myDict[tupleItem[1]] = myDict.get(tupleItem[1], 0) + tupleItem[2]
   return myDict

Unfortunately, no form of = or return is allowed in a Python lambda so that has to be gotten around. I get around the lack of = by calling the dicts __setitem__ function directly. I get around the lack of return in by creating a tuple with the return value of __setitem__ and the dictionary and then return the tuple element containing the dictionary. I will slowly alter joe so you can see how I accomplished this.

First, remove the =:

def joe(myDict, tupleItem):
   # Using __setitem__ to avoid using '='
   myDict.__setitem__(tupleItem[1], myDict.get(tupleItem[1], 0) + tupleItem[2])
   return myDict

Next, make the entire expression evaluate to the value we want to return:

def joe(myDict, tupleItem):
   return (myDict.__setitem__(tupleItem[1], myDict.get(tupleItem[1], 0) + tupleItem[2]),
           myDict)[1]

I have run across this use-case for reduce and dict many times in my Python programming. In my opinion, dict could use a member function reduceto(keyfunc, reduce_func, iterable, default_val=None). keyfunc would take the current value from the iterable and return the key. reduce_func would take the existing value in the dictionary and the value from the iterable and return the new value for the dictionary. default_val would be what was passed into reduce_func if the dictionary was missing a key. The return value should be the dictionary itself so you could do things like:

myDict = dict().reduceto(lambda t: t[1], lambda o, t: o + t, myTupleList, 0)
Omnifarious
Thanks for posting it. That is ugly and indecipherable but at least it's educational. I'm so fond of list comprehensions that I was hoping there might be some sort of dictionary comprehension I could utilize. I've gone with his answer. It is as concise as you could get.
wheaties
@wheaties, newer versions of Python (though I'm not sure how new they have to be) do have dictionary comprehensions. I'm not positive you could make them work for you here in the manner you would like. I did update my answer to add a nice function that could be added to the `dict` interface that would make situations like this one much easier to handle.
Omnifarious
python 3 has dict comprehensions http://docs.python.org/dev/3.0/whatsnew/3.0.html
Corey Goldberg
@Corey Goldberg, thanks, I thought it might be a Py3k thing. And, as I suspected, they do not help you solve this problem efficiently. Concisely perhaps, but not efficiently because there's no good way to collect the results by dictionary key.
Omnifarious