views:

252

answers:

4

I am trying to write a script that will take a dictionary of items, each containing properties of values from 0 - 10, and add the various elements to select which combination of items achieve the desired totals. I also need the script to do this, using only items that have the same "slot" in common.

For example:

item_list = {
 'item_1': {'slot': 'top', 'prop_a': 2, 'prop_b': 0, 'prop_c': 2, 'prop_d': 1 },
 'item_2': {'slot': 'top', 'prop_a': 5, 'prop_b': 0, 'prop_c': 1, 'prop_d':-1 },
 'item_3': {'slot': 'top', 'prop_a': 2, 'prop_b': 5, 'prop_c': 2, 'prop_d':-2 },
 'item_4': {'slot': 'mid', 'prop_a': 5, 'prop_b': 5, 'prop_c':-5, 'prop_d': 0 },
 'item_5': {'slot': 'mid', 'prop_a':10, 'prop_b': 0, 'prop_c':-5, 'prop_d': 0 },
 'item_6': {'slot': 'mid', 'prop_a':-5, 'prop_b': 2, 'prop_c': 3, 'prop_d': 5 },
 'item_7': {'slot': 'bot', 'prop_a': 1, 'prop_b': 3, 'prop_c':-4, 'prop_d': 4 },
 'item_8': {'slot': 'bot', 'prop_a': 2, 'prop_b': 2, 'prop_c': 0, 'prop_d': 0 },
 'item_9': {'slot': 'bot', 'prop_a': 3, 'prop_b': 1, 'prop_c': 4, 'prop_d':-4 },
}

The script would then need to select which combinations from the "item_list" dict that using 1 item per "slot" that would achieve a desired result when added.

For example, if the desired result was: 'prop_a': 3, 'prop_b': 3, 'prop_c': 8, 'prop_d': 0, the script would select 'item_2', 'item_6', and 'item_9', along with any other combination that worked.

'item_2': {'slot': 'top', 'prop_a': 5, 'prop_b': 0, 'prop_c': 1, 'prop_d':-1 }
'item_6': {'slot': 'mid', 'prop_a':-5, 'prop_b': 2, 'prop_c': 3, 'prop_d': 5 }
'item_9': {'slot': 'bot', 'prop_a': 3, 'prop_b': 1, 'prop_c': 4, 'prop_d':-4 }
 'total':                 'prop_a': 3, 'prop_b': 3, 'prop_c': 8, 'prop_d': 0

Any ideas how to accomplish this? It does not need to be in python, or even a thorough script, but just an explanation on how to do this in theory would be enough for me. I have tried working out looping through every combination, but that seems to very quickly get our of hand and unmanageable. The actual script will need to do this for about 1,000 items using 20 different "slots", each with 8 properties.

Thanks for the help!

+3  A: 

This sounds like a variation of the Knapsack problem, which is commonly solved with dynamic programming.

But, you could probably write a fairly simple solution (but slower) using recursion:

def GetItemsForSlot(item_list, slot):
  return [ (k,v) for (k,v) in item_list.items() if v['slot'] == slot]

def SubtractWeights(current_weights, item_weights):
  remaining_weights = {}
  for (k,v) in current_weights.items():
      remaining_weights[k] = current_weights[k] - item_weights[k]
  return remaining_weights

def AllWeightsAreZero(remaining_weights):
  return not [v for v in remaining_weights.values() if v != 0]

def choose_items(item_list, remaining_weights, available_slots,
                 accumulated_items=[ ]):
    print "choose_items: ", remaining_weights, available_slots, \
      accumulated_items
    # Base case: we have no more available slots.
    if not available_slots:
      if AllWeightsAreZero(remaining_weights):
          # This is a solution.
          print "SOLUTION FOUND: ", accumulated_items
          return
      else:
          # This had remaining weight, not a solution.
          return

    # Pick the next available_slot
    slot = available_slots[0]
    # Iterate over each item for this slot, checking to see if they're in a
    # solution.
    for name, properties in GetItemsForSlot(item_list, slot):
        choose_items(item_list, # pass the items recursively
                     SubtractWeights(remaining_weights, properties),
                     available_slots[1:], # pass remaining slots
                     accumulated_items + [name]) # Add this item




if __name__ == "__main__":
    total_weights = {
       'prop_a': 3,
       'prop_b': 3,
       'prop_c': 8,
       'prop_d': 0
    }

    choose_items(item_list, total_weights, ["top", "mid", "bot"])

This was tested, and seemed to work. No promises though :)

Keeping slot & prop_a as properties of the same object made it a little harder to work with. I'd suggest using classes instead of a dictionary to make the code easier to understand.

Stephen
`if available_slots == []: ...` is usually written as `if not available_slots:` in Python.
J.F. Sebastian
`not any(total_weights.values())` is shorter but less readable alternative to `0 == len([ v for v in total_weights.values() if v != 0])`. btw, Python doesn't allow "assignment" in an `if`-condition therefore ugly construction such as `0 == some_var` is completely useless.
J.F. Sebastian
@JF, Thanks fixed empty check. It's been about 4 years since I've written any substantive python. I do disagree that 0 == some_var is such an ugly construction, but meh.
Stephen
You could write `for name, item in GetItemsForSlot(item_list, slot):` if `GetItemsForSlot()` returns an iterable of 2-element tuples. It is more readable to use `name, item` instead of `item[0], item[1]`.
J.F. Sebastian
@Stephen: I don't like `0 == some_var` even in C (any modern compiler is smart enough to warn you when it is necessary), but I agree it is a matter of taste.
J.F. Sebastian
@JF, agreed, done.
Stephen
+1  A: 

I have tried working out looping through every combination, but that seems to very quickly get our of hand and unmanageable. The actual script will need to do this for about 1,000 items using 20 different "slots", each with 8 properties.

It might help your thinking to load the structure in a nice object hierarchy first and then solve it piecewise.

Example:

class Items(dict):
    def find(self, **clauses):
        # TODO!

class Slots(dict):
    # TODO!

items = Items()
for item, slots in item_list.items():
    items[item] = Slots(slots)
    # consider abstracting out slot based on location (top, mid, bot) too

print items.find(prop_a=3, prop_b=3, prop_c=8, prop_d=0)
bebraw
+7  A: 

Since the properties can have both positive and negative values, and you need all satisfactory combinations, I believe there is no "essential" optimization possible -- that is, no polynomial-time solution (assuming P != NP...;-). All solutions will come down to enumerating all the one-per-slot combinations and checking the final results, with very minor tweaks possible that may save you some percent effort here or there, but nothing really big.

If you have 1000 items in 20 possible slots, say equally distributed at about 50 items per slot, there are around 50**20 possibilities overall, i.e, 9536743164062500000000000000000000 -- about 10**34 (a myriad billions of billions of billions...). You cannot, in general, "prune" any subtree from the "all-solutions search", because no matter the prop values when you have a hypothetical pick for the first 20-p slots, there still might be a pick of the remaining p slots that could satisfy the constraint (or, more than one).

If you could find an exact polynomial-time solution for this, a NP-complete problem, you'd basically have revolutionized modern mathematics and computer science -- Turing prizes and Field medals would only be the start of the consequent accolades. This is not very likely.

To get down to a feasible problem, you'll have to relax your requirements in some ways (accept the possibility of finding just a subset of the solutions, accept a probabilistic rather than deterministic approach, accept approximate solutions, ...).

Once you do, some small optimizations may make sense -- for example, start with summing constants (equal to one more than the smallest negative value of each propriety) to all the property values and targets, so that every prop value is > 0 -- now you can sort the slots by (e.g.) value for some property, or the sum of all properties, and do some pruning based on the knowledge that adding one more slot to a partial hypothetical solution will increase each cumulative prop value by at least X and the total by at least Y (so you can prune that branch if either condition makes the running totals exceed the target). This kind of heuristic approximation need not make the big-O behavior any better, in general, but it may reduce the expected multiplier value by enough to get the problem closer to being computationally feasible.

But it's not even worth looking for such clever little tricks if there's no requirement relaxation possible: in that case, the problem will stay computationally unfeasible, so looking for clever little optimizations would not be practically productive anyway.

Alex Martelli
+1 for properly explaining that this is NP-complete, and infeasible as described... the rest of us just wanted to write some code ;)
Stephen
While I am all for a proper understanding of complexity theory, this answer may not be helpful. A problem being NP-complete doesn't mean we cannot solve it, it only means there is no known algorithm that *exactly* solve the problem and scales polynomially as the input size grows *large*. It is usually possible to solve the problem exactly for quite a range of small input sizes, and whether the input sizes are small enough depends on the structure of the problem. Here, there is a pseudo-polynomial algorithm that runs in time polynomial in the magnitude of the properties, despite your arguments.
ShreevatsaR
@Shreevatsar, the size of the space of possibilities (`50**20`!) _is_ large, although the idea of restricting to the "property-values space", given integer-valued properties, is a good one (but "size" needs to be carefully defined, when negative numbers are in play -- I recommend normalizing as described in my A's next-to-last paragraph, so that normalized property values are naturals, for ease of explanation and reasoning). I _do_ recommend you post your solution anyway: whether it's directly useful to the OP or not, it can't hurt us all to learn about it!-)
Alex Martelli
Ok, posted it. I hadn't noticed the "0-10" in the first line (which evidently should be -10 to 10), which makes the "right" DP solution still infeasible (though "much better"), but I believe that with non-pathological data (i.e. not generated by an adversary :p), a variant ought to work. The important point though, from a theoretical perspective, is that its search space is entirely different from the 50^20, and not every problem with a large search space is NP-complete (and even if it is, it's not always a problem).
ShreevatsaR
+4  A: 
ShreevatsaR