views:

225

answers:

3

Hello

I'm having some trouble developing an algorithm to determine the minimum of a list of n elements. It's not the case of finding the minimum of an array of length n, that's simple:

min = A[0]
for i in range(1, len(A)):
    if min > A[i]: min = A[i]
print min

But my list contains objects:

class Object:
    def __init__(self, somelist):
        self.classification = somelist[0] # String
        self.type           = somelist[1] # String
        self.first          = somelist[2] # Integer
        self.last           = somelist[3] # Integer

And for the same 'classification | type' objects I have m elements and I want to find the minimum element of the same 'classification | type' by comparing the difference between first and last.

Example:

obj1 = Object(['A', 'x', 4, 17])
obj2 = Object(['A', 'y', 5, 20])
obj3 = Object(['B', 'z', 10, 27])
obj4 = Object(['B', 'z', 2, 15])
obj5 = Object(['B', 'z', 20, 40])
obj6 = Object(['A', 'x', 6, 10])
obj7 = Object(['A', 'x', 2, 9])
list = [obj1, obj2, obj3, obj4, obj5, obj6, obj7]

So I need an algorithm to determine the minimums of the list:

A | x --> Object(['A', 'x', 6, 10])

B | z --> Object(['B', 'z', 2, 15])

A | y --> Object(['A', 'y', 5, 20])

Thanks!

+7  A: 
filtered = [obj for obj in lst if obj.classification == 'A' and obj.type = 'x']
min(filtered, key=lambda x: x.last - x.first)

Note: don't name your variable list: it shadows built-in.

SilentGhost
In practice I'd consider making `filtered` a generator.
David Zaslavsky
That looks good but I have 1000 classifications and 400 types. So I need a more 'dynamic' solution. Thanks!
ccarpenterg
@ccarpenterg: just put it in a function and make `'A'` and `'x'` variables.
Chris B.
@ChrisB: He's just said he has K = 400,000 key combinations; so you want him to call what's already an O(N) function 400,000 times where N could be about a million?
John Machin
@John: I don't think he needs them all at once. If performance is really a concern, you could always refactor the list into a container that tracks the minimum value, or keeps the values sorted, or some other trickery.
Chris B.
@ChrisB: He wants to "determine the minimums of the list" which means at the very least to me that he wants them all but one at a time which is HUGE using your approach. "refactor the list into a container that tracks the minimum value" ... oh, you mean something like the dictionary used in my answer? This is "trickery"?
John Machin
@John: I was talking about subclassing a collection, and when an item got added to it adjusting the precomputed minimum. That won't work if items are removed, of course, but it could fall back on recomputing everything if it needed to.
Chris B.
+1  A: 
import itertools 
group_func = lambda o: (o.classification, o.type)
map(lambda pair: (pair[0], min(pair[1], key=lambda o: o.last - o.first)),
    itertools.groupby(sorted(l, key=group_func), group_func))

group_func returns a tuple key containing the object's classification, then type (e.g. ('A', 'x')). This is first used to sort the list l (sorted call). We then call groupby on the sorted list, using group_func to group into sublists. Every time the key changes, we have a new sublist. Unlike SQL, groupby requires the list be pre-sorted on the same key. map takes the output of the groupby function. For each group, map returns a tuple. The first element is pair[0], which is the key ('A', 'x'). The second is the minimum of the group (pair[1]), as determined by the last - first key.

Matthew Flaschen
Could you describe what's hapenning in that algorithm?
ccarpenterg
@ccarpenterg, I've added an explanation. Let me know if you have questions.
Matthew Flaschen
+2  A: 

Here's a simple understandable dynamic procedural way of going about it:

class Object:
    def __init__(self, somelist):
        self.classification = somelist[0] # String
        self.type           = somelist[1] # String
        self.first          = somelist[2] # Integer
        self.last           = somelist[3] # Integer
    def weight(self):
        return self.last - self.first
    def __str__(self):
        return "Object(%r, %r, %r, %r)" % (self.classification, self.type, self.first, self.last)
    __repr__ = __str__

obj1 = Object(['A', 'x', 4, 17])
obj2 = Object(['A', 'y', 5, 20])
obj3 = Object(['B', 'z', 10, 27])
obj4 = Object(['B', 'z', 2, 15])
obj5 = Object(['B', 'z', 20, 40])
obj6 = Object(['A', 'x', 6, 10])
obj7 = Object(['A', 'x', 2, 9])
olist = [obj1, obj2, obj3, obj4, obj5, obj6, obj7]

mindict = {}
for o in olist:
    key = (o.classification, o.type)
    if key in mindict:
        if o.weight() >= mindict[key].weight():
            continue
    mindict[key] = o
from pprint import pprint
pprint(mindict)

and here's the output:

{('A', 'x'): Object('A', 'x', 6, 10),
 ('A', 'y'): Object('A', 'y', 5, 20),
 ('B', 'z'): Object('B', 'z', 2, 15)}

Note: the __str__, __repr__ and pprint stuff is just to get the fancy printout, it's not essential. Also the above code runs unchanged on Python 2.2 to 2.7.

Running time: O(N) where N is the number of objects in the list. Solutions which sort the objects are O(N * log(N)) on average. Another solution appears to be O(K * N) where K <= N is the number of unique (classification, type) keys derived from the objects.

Extra memory used: Only O(K). Other solutions appear to be O(N).

John Machin