views:

518

answers:

3

I'm in search of an algorithm, which can handle the problem described below. I already have written an algorithm (which is too specialised to post, I think), optimised as much as I could think of, but on larger sets of numbers it still is too slow (as the costs rise exponentially). The solution should take no longer than 5s on a decent computer.

You're given a set of numbers, e.g.:

M = { 1, 1, 1, 2, 2, 2, 5, 5, 5, 10, 10, 10, 10, 20, 50, 50, 50, ..., 10000, 10000, 20000, 20000 }

The do not have to have a special structure (although they have here).

You're given a set of "target points", also numbers, e.g.:

P = { 670, 2010, 5600, 10510, 15000}

The goal is, to take the least quantity of numbers out of M, where, when you add them in the decided order, you get intermediate results as close as possible to all of the points in P. You can only use each number in M once.

In our example, a possible solution would be (although I don't know if it's the best):

Y = ( 500, 100, 50; 1000, 200, 200; 2000, 1000, 500; 5000; 2000, 2000)

As you can see, the two criteria least and close for some sort of tradeoff. That's why my current algorithm uses scoring to find the "best" solution.

Here's how it currently works:

  1. sort M, sort P, ascending
  2. remove numbers too small to relevantly change score or numbers which are simply too large
  3. recursively:
  4. take the next point in P as the current "target", plus minus e.g. 10%
  5. add next number out of M, remove it out if M
  6. when near the target point, goto 4. If at the end point, calculate score of current distribution and possibly remember it
  7. else goto 5
  8. when coming back from trying number, take next higher number

It never tries two same numbers and only tries the ascending order, e.g.:

  • 100, 100, 100, 50, 50, 20, 10
  • 100, 100, 100, 50, 50, 20, 20
  • 100, 100, 100, 50, 50, 50, 10
  • 100, 100, 100, 50, 50, 50, 20
  • 100, 100, 100, 50, 50, 50, 50
  • 100, 100, 100, 100
  • 100, 100, 100, 100, 10
  • 100, 100, 100, 100, 20
  • ...

With about 5 of each number, and removing many of the smaller numbers, the algorithm is really fast and finds a good solution. But as I add more numbers or especially include smaller numbers, the runtime rises from 100ms to infinity.

Can you give me a hint, how to deal with this problem? Are there any similar algorithms in literature which can handle the problem or a part of it?

A: 

This is somewhat similar to the Knapsack Problem.

caf
It is composed of multiple knapsack problems, which interact:"is it better to take one number from interval A and put it to interval B?"I was hoping that some additional constraints could reduce the complexity.
Wikser
A: 

Lemme try, it may not be the best but should work nicely. I'm using PHP here -

1) Categorize the values and their counts in M: Lets call it C

$C = array_count_values( $M );

This gives us:
Array
(
    [1] => 3
    [2] => 3
     ...
    [20] => 1
     ...
)

2) Pick up the first number from P and apply binary search on M and get the number N1 nearest to P1 (N1 < P1). Subtract the respective count from C

    So say you get 500 which is nearest to 670. Now subtract $C[500] - 1.
You can validate if count is not 0 and if zero get the next lower number from M.

3) Get P1-N1 and again binary search for this number and return a nearest value. Add this to N1 and keep looping till you get the closest sum.

4) Repeat point 2 and point 3 for all members of P.

Binary search is the key part here, should be efficient enough.

Arpit Tambi
A: 

Similar to a coin change problem: http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/Greedy/greedyIntro.htm

The only differences are that you have a limited supply of "coins" (which can be easily solved by marking items in the array as "used") and that you don't need to reach the exact number - plus/minus 10% is good for you (so that you can throw away elements in M that are less than 10% of the target value)

DmitryK
Thanks for the hint. The "limited" number of coins is a main part of the problem, as I can't think of a way to use dynamic programming (memoization) on the possible "changes", as they depend on the coins left (which makes far too many combinations to remember and to compare). :/
Wikser