views:

156

answers:

8

Given A set of numbers n[1], n[2], n[3], .... n[x] And a number M

I would like to find the best combination of

n[a] + n[b] + n[c] + ... + n[?] >= M

The combination should reach the minimum required to reach or go beyond M with no other combination giving a better result.

Will be doing this in PHP so usage of PHP libraries is ok. If not, just a general algorithm will do. Thanks!

+1  A: 

what do you mean by "best"? Least number of n[i]? smallest standard deviation in the n[i]? Impossible to say unless you define "best"?

Aidan
example: 1, 17, 5, 11, 6, 16, 4 and M = 14the answer should be 11, 4 to arrive at 15
Aves Liaw
and what is M? And what is the "best" way to chose the values from this list? Is it the least number, i.e. the smallest subset that adds up to M?
Aidan
Yes, its the set of numbers where the sum needs to reach or go beyond MIf two sets of numbers achieve the same result, then choosing the set with the least quantity of numbers
Aves Liaw
this is like a 0-1 knapsack problem except that if you have two solutions with the same number of elements, the one that adds up to M is better than the one that adds up to more than M, so inyour example {11,4} is better than {11,5}. I think your best bet is sorting the elements into a tree, where you can go down the branches to get all the candidtates with the minimum number of elements that add up to M, and then choose the best candidate. Alternatively, you can investigate the knapsack problem : http://en.wikipedia.org/wiki/Knapsack_problem
Aidan
The problem with using the tree is which number to put at the top of the tree... any suggestions? Or do I have to run through all permutations of the tree with each value on top?
Aves Liaw
+1  A: 

I think that a greedy algorithm approach will work. How many numbers do you expect to have in the set? If it is reasonably low you could try backtrack, but I would not recommend it for large sets.

Chris
Thanks. I will have a look. Currently, I'm looking at a set with maximum of 136 values (with possibility of growth in the future)
Aves Liaw
You should go for backtrack, but trying to improve the execution time using some tricks like first sorting the set and then getting in backtrack. Let me know if you need help
Chris
Currently, backtrack looks like its the best way to go... I need to study it a bit more to see if it'll work. Thanks
Aves Liaw
It will work just fine, giving you the right results but there is a problem with backtrack: is slooow.
Chris
A: 

I think this is NP-complete (it's not possible to find a fast way of doing it in general). The way you've phrased the question makes me think of solving successive Subset sum problems with a target integer that steadily increases from M.

In your example, there is no known method of determining whether M can be reached. Only a brute-force solution would tell you that it's not possible, and only then could you check M+1.

However, you may like to look at the DP solution, as this will at least give you an idea of how to solve the problem (albeit slowly). There is also an approximate solution which will be correct if your numbers are small. Use this. Finally, it's worth pointing out that the size of the problem is measured in the number of bits, not the size (or sum) of the set.

As mentioned above, this is related to the Knapsack problem.

Dijkstra
+1  A: 

This kind of problems is called binary linear programming (a special case of integer linear programming). It is famously NP-hard – i.e. not efficient to solve in general.

However, there exist good solvers, both commercial and free use, e.g. the Open Source solver lpsolve which you can call from your program.

/EDIT: Old answer was bunk. I confused the factors.

Konrad Rudolph
I don't think this is a linear programming problem, as you can have 0 or 1 of each number, with a linear programming problem you can have fractional multiples
Nick Fortescue
@Nick: damn, you’re right. It’s a BLP.
Konrad Rudolph
A: 

A Greedy solution would work A pseudo:

algo(input_set , target_sum, output_set )
   if input_set is NULL
       error "target_sum can never be reached with all input_set"
       return
   the_max = maximum(input_set)
   remove the_max from input_set
   if the_max > target_sum
       algo(input_set, target_sum, ouptput_set )
       return
   add the_max to output_set
   algo(input_set, target_sum - the_max, output_set )

call with output_set = NULL. sort intput_set for efficiency.

Vardhan Varma
Not the solution for op's problem.
Grozz
I think this solution only works if I want the values to sum up to exactly target_sum. However, for my case, the total is allowed beyond target_sum.
Aves Liaw
A: 

This problem is almost exactly the subset sum problem, which is related to the Knapsack problem but different, and is NP complete. However, there is a linear solution if all the numbers are smaller than some constant, and a Polynomial time approximation. See the wikipedia page referenced above.

Nick Fortescue
A: 
pseudocode:

list<set> results;
int m;
list<int> a;

// ...    

a.sort();

for each i in [0..a.size]
    f(i, empty_set);    

// ...

void f(int ind, set current_set)
{
    current_set.add(a[ind]);

    if (current_set.sum > m)
    {
        results.add(current_set);
    }
    else
    {
        for (int i=ind + 1; i<a.size; ++i)
        {
            f(i, current_set);  // pass set by value

            // if previous call reached solution, no need to continue
            if (a[ind] + current_set.sum) > m
                break;
        }
    }
}

// choose whatever "best" result you need from results, the one
// with the lowest sum or lowest number of elements
Grozz
This could work unfortunately does not return the optimal results. For eg M = 21, numbers are 22, 15, 5, 3, 2 ,2, 1. If I'm not wrong, the code would return 2 results [22] and [15, 5, 3, 2, 2]. But the best answer would be [15,5,3,2,1] which would not be considered? Correct me if I'm wrong here.
Aves Liaw
added a minor fix, but for another problem. that solution should work, it kinda goes over all possible solutions cutting out wrong branches asap. so for your inputs it will return [1, 2, 2, 3, 5, 15] [1, 2, 3, 5, 15] [1, 3, 5, 15] [1, 5, 15] [2, 2, 3, 5, 15] etc. Choose the set with the lowest sum or lowest number of elements or whatever your "best" criteria is.
Grozz
Managed to translate this to code. Worked well enough. Thanks
Aves Liaw
+2  A: 

This looks like a classic Dynamic Programming problem (also indicated by other answers mentioning its similarity to 0-1 Knapsack and Subset Sum problems). The whole thing boils down to to one simple choice: for each element in the list, do we use it in our sum or not. We can write up a simple recursive function to compute the answer:

f(index,target_sum)=
     0     if target_sum<=0 (i.e. we don't need to add anymore)
     infinity   if target_sum>0 and index is past the length of n (i.e. we have run out of numbers to add)
     min( f(index+1,target_sum), f(index+1,target_sum-n[index])+n[index] )    otherwise (i.e. we explore two choices -  1. take the current number 2. skip over the current number and take their minimum)

Since this function has overlapping subproblems (it explores the same sub-problems over and over again) , it is a good idea to memoize the function with a cache to hold values that were already computed before.

Here's the code in Python:

#! /usr/bin/env python

INF=10**9 # a large enough number of your choice

def min_sum(numbers,index,M, cache):
    if M<=0: # we have reached or gone past our target, no need to add any more
        return 0
    elif len(numbers)==index: # we have run out of numbers, solution not possible
        return INF
    elif (index,M) in cache: # have been here before, just return the value we found earlier
        return cache[(index,M)]
    else:
        answer=min(
            min_sum(numbers,index+1,M,cache), # skip over this value
            min_sum(numbers,index+1,M-numbers[index],cache)+numbers[index] # use this value
        )
        cache[(index,M)]=answer # store the answer so we can reuse it if needed 
        return answer 

if __name__=='__main__':
    data=[10,6,3,100]
    M=11

    print min_sum(data,0,M,{})

This solution only returns the minimum sum, not the actual elements used to make it. You can easily extend the idea to add that to your solution.

MAK