tags:

views:

230

answers:

6

I need an algorithm that identifies all possible combinations of a set of numbers that sum to some other number.

For example, given the set {2,3,4,7}, I need to know all possible subsets that sum to x. If x == 12, the answer is {2,3,7}; if x ==7 the answer is {{3,4},{7}} (ie, two possible answers); and if x==8 there is no answer. Note that, as these example imply, numbers in the set cannot be reused.

This question was asked on this site a couple years ago but the answer is in C# and I need to do it in Perl and don't know enough to translate the answer.

I know that this problem is hard (see other post for discussion), but I just need a brute-force solution because I am dealing with fairly small sets.

+1  A: 

You can make use of the Data::PowerSet module which generates all subsets of a list of elements:

codaddict
Thanks, that's helpful, but it's a bit too brute force. My sets are small-ish, but not so small that the powerset is generally going to be small. I think I need to do some sort of recursion that stops when all possible future sums are going to be too big.
itzy
A: 

The rough algorithm is as follows:

have a "solve" function that takes in a list of numbers already included and a list of those not yet included.

  • This function will loop through all the numbers not yet included.
  • If adding that number in hits the goal then record that set of numbers and move on,
  • if it is less than the target recursively call the function with the included/exluded lists modified with the number you are looking at.
  • else just go to the next step in the loop (since if you are over there is no point trying to add more numbers unless you allow negative ones)
  • You call this function initially with your included list empty and your yet to be included list with your full list of numbers.

There are optimisations you can do with this such as passing the sum around rather than recalculating each time. Also if you sort your list initially you can do optimisations based on the fact that if adding number k in the list makes you go over target then adding k+1 will also send you over target.

Hopefully that will give you a good enough start. My perl is unfortuantely quite rusty.

Pretty much though this is a brute force algorithm with a few shortcuts in it so its never going to be that efficient.

Chris
A: 

Is this a 'do my homework for me' question?

To do this deterministically would need an algorithm of order N! (i.e. (N-0) * (N-1) * (N-2)...) which is going to be very slow with large sets of inputs. But the algorithm is very simple: work out each possible sequence of the inputs in the set and try adding up the inputs in the sequence. If at any point the sum matches, you've got one of the answers, save the result and move on to the next sequence. If at any point the sum is greater than the target, abandon the current sequence and move on to the next.

You could optimize this a little by deleting any of the inputs greater than the target. Another approach for optimization would be to to take the first input I in the sequence and create a new sequence S1, deduct I from the target T to get a new target T1, then check if T exists in S1, if it does then you've got a match, otherwise repeat the process with S1 and T1. The order is still N! though.

If you needed to do this with a very large set of numbers then I'd suggest reading up on genetic algorithms.

C.

symcbean
Not a do-my-homework problem. An actual real-world problem. I do research in finance, and am not much of a programmer. But I suppose talk is cheap, so I could well be lying.
itzy
A: 

Someone posted a similar question a while ago and another person showed a neat shell trick to answer it. Here is a shell technique, but I don't think it is as neat a solution as the one I saw before (so I'm not taking credit for this approach). It's cute because it takes advantage of shell expansion:

for i in 0{,+2}{,+3}{,+4}{,+7}; do
  y=$(( $i )); # evaluate expression
  if [ $y -eq 7 ]; then
    echo $i = $y;
  fi;
done

Outputs:

0+7 = 7
0+3+4 = 7
PP
+1  A: 

Use Algorithm::Combinatorics. That way, you can decide ahead of time what size subsets you want to consider and keep memory use to a minimum. Apply some heuristics to return early.

#!/usr/bin/perl

use strict; use warnings;
use List::Util qw( sum );
use Algorithm::Combinatorics qw( combinations );

my @x = (1 .. 10);
my $target_sum = 12;

{
    use integer;
    for my $n ( 1 .. @x ) {
        my $iter = combinations(\@x, $n);
        while ( my $set = $iter->next ) {
            print "@$set\n" if $target_sum == sum @$set;
        }
    }
}

The numbers do blow up fairly rapidly: It would take thousands of days to go through all subsets of a 40 element set. So, you should decide on the interesting sizes of subsets.

Sinan Ünür
The problem with this approach is that it has no way to short-circuit when the sum becomes too large. My algorithm can execute `Solve(44, [1 .. 40])` (a 40 element set) in less than a second on a Core2 Duo E6750.
cjm
@cjm good point.
Sinan Ünür
+3  A: 
sub Solve
{
  my ($goal, $elements) = @_;

  # For extra speed, you can remove this next line
  # if @$elements is guaranteed to be already sorted:
  $elements = [ sort { $a <=> $b } @$elements ];

  my (@results, $RecursiveSolve, $nextValue);

  $RecursiveSolve = sub {
    my ($currentGoal, $included, $index) = @_;

    for ( ; $index < @$elements; ++$index) {
      $nextValue = $elements->[$index];
      # Since elements are sorted, there's no point in trying a
      # non-final element unless it's less than goal/2:
      if ($currentGoal > 2 * $nextValue) {
        $RecursiveSolve->($currentGoal - $nextValue,
                          [ @$included, $nextValue ],
                          $index + 1);
      } else {
        push @results, [ @$included, $nextValue ]
            if $currentGoal == $nextValue;
        return if $nextValue >= $currentGoal;
      }
    } # end for
  }; # end $RecursiveSolve

  $RecursiveSolve->($goal, [], 0);
  return @results;
} # end Solve


my @results = Solve(7, [2,3,4,7]);
print "@$_\n" for @results;

This started as a fairly direct translation of the C# version from the question you linked, but I simplified it a bit (and now a bit more, and also removed some unnecessary variable allocations, added some optimizations based on the list of elements being sorted, and rearranged the conditions to be slightly more efficient).

I've also now added another significant optimization. When considering whether to try using an element that doesn't complete the sum, there's no point if the element is greater than or equal to half the current goal. (The next number we add will be even bigger.) Depending on the set you're trying, this can short-circuit quite a bit more. (You could also try adding the next element instead of multiplying by 2, but then you have to worry about running off the end of the list.)

cjm
Thanks! I just used this with a set of 52 numbers and got a result in under a minute.
itzy
@itzy, you're welcome. I've made one more minor improvement by moving the conditions around. When you're testing mutually-exclusive conditions, it's generally faster to try the one most likely to be true first.
cjm