views:

354

answers:

9

I'm working through an interview question that goes like:

Given an array of integers and sum, check whether any combination adds up to the sum.

What programming technique does one use when they want to try all possible combinations of a set?

Even if that isn't the best solution to this problem, I come across problems where I need to either generate or do something with all combinations of a list, and I'd like to know how to handle that.

+8  A: 

One handy insight is to realize that the binary representation of all numbers from 0 to (2^N)-1 is actually a set of bit masks for the possible combinations out of N distinct items. For instance, for N=3 (3 items) and thus (2^3)-1 = 7:

0: 000 = none
1: 001 = third item
2: 010 = second item
3: 011 = second and third items
4: 100 = first item
5: 101 = first and third items
6: 110 = first and second items
7: 111 = all 3 items

This makes it very easy to loop through all possible selections in a set order (so that it's impossible to skip or double-visit any potential selection).

Amber
A: 

Recursively. Pseudo-code would be something like this:

function f(set,currentelement,selectedelements,sum,wantedsum)
{
for (thiselement=currentelement+1 to lastelement)
   {
   if (sum+thiselement==wantedsum) print out selectedelements+thiselement;
   if (sum+thiselement<wantedsum)
      {
      f(set,thiselement,selectedelements+thiselement,sum+thiselement,wantedsum);
      }
   }
Patrick
+1  A: 

Some care with terminology is needed here. Combinations is used to refer to picking k items from a set of n items, where the order of the k items does not matter. The related concept of picking k items from a set of n items, where the order of the k items does matter, is referred to as a permutation.

What you initially talk about, however:

Given an array of integers and sum, check whether any combination adds up to the sum.

is a different thing - here there is no fixed k: you are interested in any size subset of the original items.

The set of all subsets of a set S is called the power-set of S, and there is a very simple formula for the number of members it contains. I will leave that as an exercise - once you have worked it out, it should be relatively obvious how to enumerate through the members of a set's powerset.

(Hint: the power-set of { 1, 2 } is { {}, { 1 }, { 2 }, { 1, 2 } })

AakashM
A: 

That sounds like a classic recursion problem. You start with the first element and consider the rest of the array; for each element, either it is picked or it isn't. The base case is when the start index is greater than the length of the array. Something like

public static bool canSum(int start, int[] array, int sum)
{
      if (start >= array.Length)
           return sum == 0;
      return canSum(start + 1, array, sum - array[start]) || canSum(start + 1, array, sum);
}
isbadawi
A: 

If you have positive as well as negative integers, you are going to run into a combinatorial explosion, where whatever algorithm you choose will slow by a fixed percentage for every increase in length of your array. (If you only have positive integers, you can bound your search once the target sum is exceeded.)

A boundary question: are you allowed to reuse integers as well?

You should search for 'combinatorial algorithms'. Knuths' tome-in-progress could help you quite a lot if you want to dig deeper into the question.

Pontus Gagge
A: 

I see two options:

  1. Compute the Power Set of the input array and check the sum of each element in the power set (see http://en.wikipedia.org/wiki/Power_set). This is probably O(2^N) and not good for large N
  2. Try something with the 0-1 Knapsack problem (see http://en.wikipedia.org/wiki/Knapsack_problem). This should either find the greatest sum less than your desired value, a sum that is your desired value, or not find anything. Based on the output, you can answer your original question. 0-1 Knapsack is nice because it runs in polynomial time O(N^c) where c is constant. I don't remember if it works for negative numbers though.
Eric Perko
+2  A: 

This doesn't answer your "combination" question, but it's probably the optimal solution to the Question :P

This is the subset sum problem problem where you have to search N sums.

Subset sum has a pseudo polynomial algorithm using dynamic programming:

psuedocode from this link

Subset-Sum-Solver[S = w1,w2, . . . ,wn,B]
1 Initialize M[0..n, 0..B] everywhere False apart from M[0, 0] = True
2 for i  from 1 to n
  do
3    for w from  0 to B
     do
4        M[i,w] = M[i − 1,w] _M[i − 1,w − wi]
         (any reference outside the array returns false)
5 Output M[n,B]

where B is the sum, S is the set of numbers, n is the cardinality of S (number of elements in S), and M is a n by B matrix . This algorithm is O(nB)

In the case of the interview question, do this for each sum, and you get an algorithm that's O(nmB) where m is the number of sums that you have to test.

The question is a little ambiguous, is the array of integers used to get subsets also the same array of sums? i.e. do a subset of integers in array A also add up to one of the integers in array A? in that case, then the algorithm is O(n^2B) since n == m

Charles Ma
This is definitely not optimal. Both in theory and in practice we can do better by using less memory and in practice we can do better by using a randomized algorithm.
IVlad
A: 

If you do choose to calculate a powerset, it can be done quite easily in a functional way.

In Haskell, there is a subsequences functions which essentially returns the powerset of any set, as a list of lists.

Or you can write it yourself

powerSet :: [a] -> [[a]]
powerSet [] = [[]]
powerSet x:xs = map (:x) (powerSet xs) ++ (powerSet xs)
MrBones
The order of the items in a set is irrelevant. To calculate the power set you don't want permutations, you want all combinations (of all possible lengths).
humble coffee
I think I actually meant List.subsequences - edited to reflect.
MrBones
+2  A: 

There are multiple ways of solving this problem. One is the classical DP solution which others have posted. I'm going to post a solution that uses only O(S) memory, where S is the sum of all integers in the array (can be changed to mean the desired sum too) and another that uses a very efficient randomized algorithm that can be tested to be very fast for even hundreds of thousands of numbers of any size, and even rational and negative numbers.

DP solution in O(nS) time and O(S) memory:

let F[i] = 1 if we can get sum i and 0 otherwise
F[0] = 1; // we can always make sum 0
for ( int i = 1; i <= n; ++i )
  for ( int j = S; j >= numbers[i]; --j )
    F[j] |= F[j - numbers[i]]; // basically, if F[j - numbers[i]] == 1, then we can add numbers[i] to make F[j] 1, otherwise we can't. A bitwise or operation will save us an if/else structure basically.

Pseudocode for randomized algorithm: Let Used = list of numbers that you sum. Let Unused = list of numbers that you DON'T sum. Let tmpsum = 0. Let S = desired sum you want to reach.

for ( each number x you read )
  toss a coin:
    if it's heads and tmpsum < S
      add x to Used
    else
      add x to Unused

while ( tmpsum != S )
  if tmpsum < S 
    MOVE one random number from Unused to Used
  else
    MOVE one random number from Used to Unused

print the Used list, containing the numbers you need to add to get S

This will be much faster than the dynamic programming solution, especially for random inputs. The only problems are that you cannot reliably detect when there is no solution (you could let the algorithm run for a few seconds and if it doesn't finish, assume there is no solution) and that you cannot be sure you will get the solution with minimum number of elements chosen. Again, you could add some logic to make the algorithm keep going and trying to find a solution with less elements until certain stop conditions are met, but this will make it slower. However, if you are only interested in a solution that works and you have a LOT of numbers and the desired sum can be VERY big, this is probably better than the DP algorithm.

Another advantage of this approach is that it will also work for negative and rational numbers with no modifications, which is not true for the DP solution, because the DP solution involves using partial sums as array indexes, and indexes can only be natural numbers. You can of course use hashtables for example, but that will make the DP solution even slower.

To generate all combinations, you should look up backtracking: http://en.wikipedia.org/wiki/Backtracking

For this problem, you need to use something like this:

void back(int k)
{
  if ( k > numElements ) // add all the nums[i] for which st[i] == 1 and check if their sum is what you desire, then return;

  for ( int i = 0; i <= 1; ++i )
  {
    st[k] = i;
    back(k + 1);
  }
}

You should run it on paper for small number of elements to see how it works. You can optimize it by calculating the sum as you go, thus avoiding the final summation. This is the general idea.

IVlad