views:

101

answers:

2

Problem description

  • There are different categories which contain an arbitrary amount of elements.
  • There are three different attributes A, B and C. Each element does have an other distribution of these attributes. This distribution is expressed through a positive integer value. For example, element 1 has the attributes A: 42 B: 1337 C: 18. The sum of these attributes is not consistent over the elements. Some elements have more than others.

Now the problem:

We want to choose exactly one element from each category so that

  • We hit a certain threshold on attributes A and B (going over it is also possible, but not necessary)
  • while getting a maximum amount of C.

Example: we want to hit at least 80 A and 150 B in sum over all chosen elements and want as many C as possible.


I've thought about this problem and cannot imagine an efficient solution. The sample sizes are about 15 categories from which each contains up to ~30 elements, so bruteforcing doesn't seem to be very effective since there are potentially 30^15 possibilities.

My model is that I think of it as a tree with depth number of categories. Each depth level represents a category and gives us the choice of choosing an element out of this category. When passing over a node, we add the attributes of the represented element to our sum which we want to optimize.

If we hit the same attribute combination multiple times on the same level, we merge them so that we can stripe away the multiple computation of already computed values. If we reach a level where one path has less value in all three attributes, we don't follow it anymore from there.

However, in the worst case this tree still has ~30^15 nodes in it.

Does anybody of you can think of an algorithm which may aid me to solve this problem? Or could you explain why you think that there doesn't exist an algorithm for this?

+1  A: 

This question is very similar to a variation of the knapsack problem. I would start by looking at solutions for this problem and see how well you can apply it to your stated problem.

Irwin M. Fletcher
The major difference for this is, that I don't have an upper bound for attributes A and B. I can exceed them as much as I want as long as C stays maximal. The dynamic programming approach therefore won't work. Additionaly, I have the restriction "one element per category" which the knapsack problem doesn't have.
Etan
Given the size of the set, a optimal solution is not going to be possible unless you have a lot of time. Therefore, creating a heuristics with an near upper bound for A and B is likely going to provide your near optimal answer, which in turn lends itself to a problem "like" the knapsack problem.
Irwin M. Fletcher
A: 

My first inclination to is try branch-and-bound. You can do it breadth-first or depth-first, and I prefer depth-first because I think it's cleaner.

To express it simply, you have a tree-walk procedure walk that can enumerate all possibilities (maybe it just has a 5-level nested loop). It is augmented with two things:

  • At every step of the way, it keeps track of the cost at that point, where the cost can only increase. (If the cost can also decrease, it becomes more like a minimax game tree search.)

  • The procedure has an argument budget, and it does not search any branches where the cost can exceed the budget.

Then you have an outer loop:

for (budget = 0; budget < ... ; budget++){
    walk(budget);
    // if walk finds a solution within the budget, halt
}

The amount of time it takes is exponential in the budget, so easier cases will take less time. The fact that you are re-doing the search doesn't matter much because each level of the budget takes as much or more time than all the previous levels combined.

Combine this with some sort of heuristic about the order in which you consider branches, and it may give you a workable solution for typical problems you give it.

IF that doesn't work, you can fall back on basic heuristic programming. That is, do some cases by hand, and pay attention to how you did it. Then program it the same way.

I hope that helps.

Mike Dunlavey
I understand this as `cost` is the sum of attributes A and B up to the current node of the graph and that `budget` is a reasonable amount over the threshold we want to reach. However, I don't see where the sum of attribute C is maximized in your approach.
Etan