views:

94

answers:

5

I have a set of 18 values (it will always be 18) which I need to distribute into two sets, one of 10 items, and one of 8 items.

The rule for distribution is that the values of each set must be equal (or as close as possible) to a particular known value - so in the first set the sum of the values must be as close as possible to 1500000 and in the second set the sum iof the values must be as close as possible to 1000000.

What is the best (and that may mean simplest) algorithm to do this?

Further clarification, the values all range between 110000 and 200000. The values are always multiples of a 100 and are all positive integers, and there can be duplicates.

A: 

Looks like an optimization problem to me. Randomly separate the values into the two sets, and then start swapping values (use a good heuristic), and accept the change if the result is better.

tathagata
Will often get a "good enough" answer quickly, but you can easily run into local minima and never find the optimum doing that!
mikera
+1  A: 

Your problem, as stated, is np unless there is a pattern to the data.

The only way to achieve the best answer is find all permutations of 18 into 10 and 8 and associated sums. Weight according to your preference.

Adam Shiemke
@Adam - can you clarify what you mean by pattern to the data? All the values will range between 110000 and 200000, does this qualify as a pattern?
BeeBand
+6  A: 

There are only 43758 such selections. Go through each of them and find the best.

Pete Kirkham
Agreed. Although the problem is intractable, OP is saved by the fact that there's a limit of 18.
Lucas Oman
+4  A: 

It is an optimization problem. Here you have two optimization criteria, which should be combine to single one. For example like this:

F(A, B) = w1*abs(sum(A) - 1500000) + w2*abs(sum(B) - 1000000)

where A and B your sets, sum() is a sum of elements in a set, and w1 and w2 is weights.

Then you should find a strategy for iteration over possible combinations. The simpliest strategy is to find all 10-combinations of 18, and select that one which minimize F(A,B). There are C(18,10) = 43758 combinations.

Peter
+3  A: 

While brute force is probably best for this problem size, there are other tricks you can play if you're willing to get an approximate solution or if the brute force method is still too expensive. the basic idea is to snap the values to a small grid, and then do brute force on the (much smaller) set of entries in the grid.

in your case, (pretending I've already divided by 100), all numbers are between 1100 and 2000, so you can "snap" them to the 10 integers 1100, 1200 and so on. The maximum error in doing is at most 50/1100 which is less than 5%. Now you've halved the input size, which makes the brute force run a bit faster.

Again, I wouldn't recommend this unless (a) brute force is really slow right now or (a) the problem size increases beyond 18.

p.s the problem is called SUBSET SUM (or sometimes KNAPSACK depending on the formulation) and is NP-complete. Here's a reference for the approximation idea.

Suresh
fyi that link goes to a PDF, here are some Wikipedia links too: http://en.wikipedia.org/wiki/Subset_sum_problem http://en.wikipedia.org/wiki/Knapsack_problem
Kip
You are the geomblog guy? Good to see you here :-)
Moron
indeed. that's me :)
Suresh
@Suresh, thanks this is useful. I discovered that I could generate all 43578 combinations of 10, 8 in less than a second using python. So I guess weighting them will take seconds also. I doubt I will need to approximate, but i can see the benefit of 'snapping' like this.
BeeBand