views:

188

answers:

8

(I apologize that previous versions of this question displayed the wrong function that I need to fix, this has been remedied and I hope the question makes a little more sense now.)

I have a list of objects with scores, and I'm attempting to assign rank to them based on those scores. Below is basically how I output my data.

sorted_scores = [
    ('Apolo Ohno', 0),
    ('Shanie Davis', -1),
    ('Bodie Miller', -2),
    ('Lindsay Vohn', -3),  
    ('Shawn White', -3),
    ('Bryan Veloso', -4)
]

I have a tie. Now, the function that assigns positions to the objects above right now is a simple for loop that just assigns the value of i as the object's final position.

positions = {}

i = 1
for key, value in sorted_list:
    # Since in my codebase the strings are IDs, I use the key to fetch the object.
    if value is not None:
        positions[key] = i
        i += 1

So that'll obviously return:

positions = {
    'Apolo Ohno': 1,
    'Shanie Davis': 2,
    'Bodie Miller': 3,
    'Lindsay Vohn': 4,        
    'Shawn White': 5,
    'Bryan Veloso': 6
}

Hopefully that makes some sense. The meat of the question is that loop. What makes more sense is if it returns them like so:

positions = {
    'Apolo Ohno': 1,
    'Shanie Davis': 2,
    'Bodie Miller': 3,
    'Lindsay Vohn': 4, # Same value.
    'Shawn White': 4, # Same value.
    'Bryan Veloso': 6
}

How would I edit the function above to do that, keeping in mind that I could have any number of ties at any given time depending on how many of my members ranked said object? The highest rank should be 1, so it can be displayed as such: <rank>/<total # of people>

Thanks in advance. :)

+2  A: 

The way to do this is not to calculate the element's position is some arbitrary sequence, but rather to calculate how many other elements have a better score.

EDIT:

By popular demand, O(n)'ed and everything:

positions = {}
cur_score = None # Score we're examining
cur_count = 0 # Number of others that we've seen with this score

for ix, (name, score) in enumerate(sorted_scores):
  if score == cur_score: # Same score for this player as previous
    cur_count += 1
  else: # Different score from before
    cur_score = score
    cur_count = 0
  positions[name] = ix - cur_count + 1 # Add 1 because ix is 0-based

print positions
Ignacio Vazquez-Abrams
I don't know why this was voted down, as this suggestion is extremely reasonable and correct. It's an invariant that every rank value in a list of "ranks with ties" is exactly the number of better scores (+1, usually). Depending on the strategy for solving the problem, Ignacio's suggestion has extra value as it is perfectly parallelizable and--as such--wonderfully suited to multi-threaded/multi-process solutions.
Travis Bradshaw
Well it wasn't me ... but the reasons may well have included "no code".
John Machin
A: 

I'm having to make a bunch of assumptions about what it is you want to do, but here's an attempt:

scores = {
    'lorem': 100,
    'ipsum': 200,
    'dolor': 300,
    'sit': 300,
    'amet': 300,
    'quia': 400,
    'consectetur': 500,
    'adipiscing': 500,
    'elit': 600,
    }

groups = {}
for (member, score) in scores.items():
    if score not in groups:
        groups[score] = [member]
    else:
        groups[score].append(member)

positions = {}
for (rank, (score, members)) in enumerate(groups.items()):
    for member in members:
        positions[member] = rank

Showing my working:

>>> import pprint
>>> scores = {
...     'lorem': 100,
...     'ipsum': 200,
...     'dolor': 300,
...     'sit': 300,
...     'amet': 300,
...     'quia': 400,
...     'consectetur': 500,
...     'adipiscing': 500,
...     'elit': 600,
...     }
>>> groups = {}
>>> for (member, score) in scores.items():
...     if score not in groups:
...         groups[score] = [member]
...     else:
...         groups[score].append(member)
...
>>> pprint.pprint(groups)
{100: ['lorem'],
 200: ['ipsum'],
 300: ['sit', 'dolor', 'amet'],
 400: ['quia'],
 500: ['consectetur', 'adipiscing'],
 600: ['elit']}
>>> positions = {}
>>> for (rank, (score, members)) in enumerate(groups.items()):
...     for member in members:
...         positions[member] = rank
...
>>> pprint.pprint(positions)
{'adipiscing': 4,
 'amet': 2,
 'consectetur': 4,
 'dolor': 2,
 'elit': 5,
 'ipsum': 1,
 'lorem': 0,
 'quia': 3,
 'sit': 2}
>>> pprint.pprint(sorted(positions.items(), key=lambda i: i[1]))
[('lorem', 0),
 ('ipsum', 1),
 ('sit', 2),
 ('dolor', 2),
 ('amet', 2),
 ('quia', 3),
 ('consectetur', 4),
 ('adipiscing', 4),
 ('elit', 5)]
bignose
You import `itertools` but never use it. Try putting stand-alone code in a file and running it from the command line.
John Machin
Much worse: It doesn't work. Your single correct result is an accident. Change the score of elit from 600 to 6000 (still running last on the low-score-is-better regime) and see what happens. You are assuming incorrectly that groups.items() is sorted on score.
John Machin
+2  A: 

=== Update after change/clarification of specs ===

# coding: ascii

def ranks_from_scores(sorted_scores):
    """sorted_scores: a list of tuples (object_id, score), sorted by score DESCENDING
       return a mapping of object IDs to ranks
    """
    ranks = {}
    previous_score = object()
    for index, (obj_id, score) in enumerate(sorted_scores):
        if score != previous_score:
            previous_score = score
            rank = index + 1
        ranks[obj_id] = rank
    return ranks

from operator import itemgetter
import pprint

scores0 = dict([
    ('Apolo Ohno', 0),
    ('Shanie Davis', -1),
    ('Bodie Miller', -2),
    ('Lindsay Vohn', -3),
    ('Shawn White', -3)
    ])

scores1 = {
    'lorem': 100,
    'ipsum': 200,
    'dolor': 300,
    'sit': 300,
    'amet': 300,
    'quia': 400,
    'consectetur': 500,
    'adipiscing': 500,
    'elit': 600,
    }

scores2 = {
    'lorem': 100,
    'ipsum': 200,
    'dolor': 300,
    'sit': 300,
    'amet': 300,
    'quia': 400,
    'consectetur': 500,
    'adipiscing': 500,
    'elit': 6000,
    }

import pprint
funcs = (ranks_from_scores, ) # Watch this space!
tests = (scores0, scores1, scores2)

for test in tests:
    print
    test_list = sorted(test.items(), key=itemgetter(1), reverse=True)
    print "Input:", test_list
    for func in funcs:
        result = func(test_list)
        print "%s ->" % func.__name__
        pprint.pprint(result)

Results:

Input: [('Apolo Ohno', 0), ('Shanie Davis', -1), ('Bodie Miller', -2), ('Lindsay
 Vohn', -3), ('Shawn White', -3)]
ranks_from_scores ->
{'Apolo Ohno': 1,
 'Bodie Miller': 3,
 'Lindsay Vohn': 4,
 'Shanie Davis': 2,
 'Shawn White': 4}

Input: [('elit', 600), ('consectetur', 500), ('adipiscing', 500), ('quia', 400),
 ('dolor', 300), ('sit', 300), ('amet', 300), ('ipsum', 200), ('lorem', 100)]
ranks_from_scores ->
{'adipiscing': 2,
 'amet': 5,
 'consectetur': 2,
 'dolor': 5,
 'elit': 1,
 'ipsum': 8,
 'lorem': 9,
 'quia': 4,
 'sit': 5}

Input: [('elit', 6000), ('consectetur', 500), ('adipiscing', 500), ('quia', 400)
, ('dolor', 300), ('sit', 300), ('amet', 300), ('ipsum', 200), ('lorem', 100)]
ranks_from_scores ->
{'adipiscing': 2,
 'amet': 5,
 'consectetur': 2,
 'dolor': 5,
 'elit': 1,
 'ipsum': 8,
 'lorem': 9,
 'quia': 4,
 'sit': 5}

=== original submission ===

This code assumes that you really want the highest score to get rank 1, not the lowest score getting rank 1 (or 0!).

# coding: ascii

def ranks_from_scores(scores, debug=False):
    """scores (a mapping of object IDs to scores)
       return a mapping of object IDs to ranks
    """
    alist = [(v, k) for k, v in scores.items()]
    alist.sort(reverse=True)
    if debug: print 'alist:', alist
    bdict = {}
    previous_score = object()
    for posn, (score, obj_id) in enumerate(alist):
        if score != previous_score:
            previous_score = score
            rank = posn + 1
        bdict[obj_id] = rank
    if debug:
        print 'bdict:', bdict
        blist = [(v, k) for k, v in bdict.items()]
        print 'blist:', sorted(blist)
    return bdict

ranks_from_scores(
    {'q': 10, 'w': 20, 'e': 20, 'r': 20, 't': 30},
    debug=True,
    )

Output:

alist: [(30, 't'), (20, 'w'), (20, 'r'), (20, 'e'), (10, 'q')]
bdict: {'q': 5, 'r': 2, 'e': 2, 't': 1, 'w': 2}
blist: [(1, 't'), (2, 'e'), (2, 'r'), (2, 'w'), (5, 'q')]
John Machin
+1  A: 

Looks like you can use the sorted and enumerate builtins, the groupby method from itertools and the itemgetter method from operator. Assumes higher scores are better... (if lower scores are better, change reverse=True to reverse=False)

>>> from itertools import groupby
>>> from operator import itemgetter
>>> scores = {
...     'lorem': 100,
...     'ipsum': 200,
...     'dolor': 300,
...     'sit': 300,
...     'amet': 300,
...     'quia': 400,
...     'consectetur': 500,
...     'adipiscing': 500,
...     'elit': 600,
...     }
>>> sorted_items = sorted(scores.items(), key=itemgetter(1), reverse=True)
>>> groups = groupby(sorted_items, itemgetter(1))
>>> for rank, (score, items) in enumerate(groups):
...     print rank+1, map(itemgetter(0), items)
... 
1 ['elit']
2 ['consectetur', 'adipiscing']
3 ['quia']
4 ['dolor', 'sit', 'amet']
5 ['ipsum']
6 ['lorem']
jgeewax
Doesn't appear to do what the OP wants: note his example of "more sensible" output `positions = {'1': 1, '2': 2, '3': 2, '4': 2, '5': 5, '6': 6}` i.e. one gets first prize, three share second prize, the fifth is rank FIVE; you would have the fifth as rank THREE.
John Machin
A: 

Here is a simple way to do it

last = None
position = 0
delta = 1
for key, value in sorted_list:
    if value is not None:
        if value != last:
            position += delta
            delta = 1
        else:
            delta += 1
        # i believe this is supposed to be [key] not [value] in OP's code
        positions[key] = position
        last = value
vishvananda
If there is a two way tie for 4th place, I think the next place should be 6th
gnibbler
@gnibbler: What you say has a serendipitous combination of two attributes (a) it's sensible (b) it's what the OP expressly wants.
John Machin
@gnibbler: Good point. I fixed the code according to your suggestion
vishvananda
A: 
>>> sorted_scores = [
...     ('Apolo Ohno', 0),
...     ('Shanie Davis', -1),
...     ('Bodie Miller', -2),
...     ('Lindsay Vohn', -3),  
...     ('Shawn White', -3),
...     ('Bryan Veloso',-4)
... ]
>>> 
>>> from itertools import groupby
>>> from operator import itemgetter
>>> 
>>> place=1
>>> res={}
>>> for _,items in groupby(sorted_scores,key=itemgetter(1)):
...     for i,item in enumerate(items):
...         res[item[0]]= place
...     place+=i+1
... 
>>> print res
{'Apolo Ohno': 1, 'Bryan Veloso': 6, 'Shanie Davis': 2, 'Lindsay Vohn': 4, 'Bodie Miller': 3, 'Shawn White': 4}

Remember that dicts are unordered, so to iterate in order of place, you need to do this

>>> print sorted(res.items(),key=itemgetter(1))
[('Apolo Ohno', 1), ('Shanie Davis', 2), ('Bodie Miller', 3), ('Lindsay Vohn', 4), ('Shawn White', 4), ('Bryan Veloso', 6)]
gnibbler
+1  A: 

Solution

Here's one simple way to do it by modifying your code a little rather than importing modules:

prev = None
rank = 0
incr = 1
positions = {}
for key, value in sorted_list:
    if value is not None:
        if value != prev:
            rank += incr
            incr = 1
        else:
            incr += 1
        positions[key] = rank
        prev = value

A Test

For

sorted_list = [
    ('Apolo Ohno', 0),
    ('Shanie Davis', -1),
    ('Bodie Miller', -2),
    ('Lindsay Vohn', -3),  
    ('Shawn White', -3),
    ('Bryan Veloso',-4)
]

I get positions as:

{'Apolo Ohno': 1, 
'Shanie Davis': 2,
 'Bodie Miller': 3,
 'Lindsay Vohn': 4,
 'Shawn White': 4,
 'Bryan Veloso': 6}

which I think is what you are going for even though you aren't quite clear about whether there should be a 6 after the two 4's.

Justin Peel
Indeed, there should be a 6. I edited my question to reflect that.
Bryan Veloso
This appears to be an exact duplicate of my answer. I guess great minds think alike?
vishvananda
@vishvananda: It seems to be endemic :-(
John Machin
@vishvananda Yours was incorrect when I posted mine, but then you edited yours.
Justin Peel
@Justin: I see. I guess I missed it. :)
vishvananda
+3  A: 
>>> sorted_scores = [
...     ('Apolo Ohno', 0),
...     ('Shanie Davis', -1),
...     ('Bodie Miller', -2),
...     ('Lindsay Vohn', -3),  
...     ('Shawn White', -3),
...     ('Bryan Veloso',-4)
... ]
>>> 
>>> res = {}
>>> prev = None
>>> for i,(k,v) in enumerate(sorted_scores):
...     if v!=prev:
...         place,prev = i+1,v
...     res[k] = place
... 
>>> print res
{'Apolo Ohno': 1, 'Bryan Veloso': 6, 'Shanie Davis': 2, 'Lindsay Vohn': 4, 'Bodie Miller': 3, 'Shawn White': 4}

Remember that dicts are unordered, so to iterate in order of place, you need to do this

>>> from operator import itemgetter
>>> print sorted(res.items(),key=itemgetter(1))
[('Apolo Ohno', 1), ('Shanie Davis', 2), ('Bodie Miller', 3), ('Lindsay Vohn', 4), ('Shawn White', 4), ('Bryan Veloso', 6)]
gnibbler
much shorter than my solution. Good call using enumerate.
vishvananda
@vishvananda: Please also do a critique on my (earlier) answer :-)
John Machin
@John: Sorry john, I missed the fact that your answer uses the same technique.
vishvananda