views:

359

answers:

4

Suppose you have the following list of numbers, {3,6,10,9,13,16,19}, not necessarily in that order. Now, not knowing that this is the set of the possible combination of the set {3,6,10}, is there an algorithm, in any programming language that can be used to find these combination efficiently. Basically, I want to recover the list from the total set - where all numbers are included. What is an efficient algorithm, I do not wish to reinvent the wheel if one already exists?

+4  A: 

For the general case where there can be any number of elements, here is an O(q * log(q)) algorithm where q is the size of the input list:

  1. Sort the list q in ascending order.
  2. Remove the smallest element, m, and add it to the result set. Remove it from q.
  3. Iterate over q. Keep a list of numbers that we've seen. If we see a number that is (a number we've already seen + m) then discard it. This should keep half the numbers (all those that don't involve m).
  4. Repeat from step 2 until all numbers are found.

Here's an implementation of this algorithm in Python:

def solve(q):
    q = sorted(q)
    x = []
    while q:
        x.append(q[0])

        s = [False]*len(q)
        s[0] = True
        j = 1

        for i in range(1, len(q)):
            if q[i] == q[0] + q[j]:
                s[i] = True
                j += 1
                while j < len(q) and s[j]:
                    j += 1

        q = [k for k, v in zip(q, s) if not v]
    return x

s = [1, 3, 7, 12, 13, 20, 25, 31, 32, 33, 62, 78, 80, 92, 99]
from itertools import combinations
q = list(sum(x) for r in range(1, len(s) + 1) for x in combinations(s, r))
print(solve(q))

Result:

[1, 3, 7, 12, 13, 20, 25, 31, 32, 33, 62, 78, 80, 92, 99]

Original answer:

Assuming that there are only 3 numbers in the list, and that no number can be negative:

Two of the numbers must be the smallest two numbers in the list. The largest number must be the sum of all three. By subtraction you can find the third number.

Mark Byers
Yes, that is a simple approach. However, I am assuming that I do not know how many numbers make up the base list. This is actually from data I have collected and averaged and sorted. All the numbers in the set are unique, that is, (3+3 = 6 is not allowed.) My actual set has more the 30 different clusters so that approach is implausible.
Programming Enthusiast
What about negative numbers? Is that a possibility? Why wouldn't you know how many numbers are in the base list? You just have to find a number such that 2**n - 1 == size of set.
Mark Byers
Indeed, in the best case, where I have accounted for all the combinations, it would be 2^n -1. Negative numbers are not allowed.
Programming Enthusiast
If the result has over 30 elements, then the input should have over 2**30 elements. This is a huge number!
Mark Byers
Well, like you said, there are elements in the set that are sums of 2 lower numbers and the are unique sums of another number themselves. My input is huge but not that huge and that is what must be going on. Thanks
Programming Enthusiast
12+1 = 13, 7+13 = 20, etc.
BlueRaja - Danny Pflughoeft
After a lot of thought, I was able to find a simple a correlation between the two sets that I had completely overlooked. In the bigger set, each 2 to the nth element makes the set subset of the lower set. Using this, I am now working on being able to recover a set from a partial set.
Programming Enthusiast
Programming Enthusiast: Are you sure that works? What about if the input is [3,4,5,7,8,9,12]? The output should be [3,4,5].
Mark Byers
In the case, the following simple check would suffice. I will check if the sum of the first two elements is equal to the 2 to the 3rd element, the I will go with the third, and so forth. I will add this and test. Thanks for your input.
Programming Enthusiast
+4  A: 

1) Find the smallest two numbers, these must be a part of the original list.

2) Find their sum, everything smaller in the list must be part of the original list.

3) Find the next smallest sum and repeat until all sums of two numbers are done.

Any time you add a number to the original list or find a sum, remove it from the big list.

4) Continue with 3 number sums, and keep increasing until the big list is empty.

Edit:

Finding the next smallest sum requires a sorted list of your numbers. If your list is A,B,C,D,E then then smallest sum is A+B and the next smallest sum is A+C.

The performance is as terrible as it gets: 2^N, but if I'm reading your question right, the list contains your original list and all possible sums, which would allow you to increase performance considerably.

For instance, you know how many numbers you are looking for, which means you know when you only need one more, and since you also know the largest number in the list, the last number is the largest number minus all numbers already added to your original list.

David Kanarek
I'm not sure how this works... Can you explain step 3 in more detail please? How do you find the next smallest sum?
Mark Byers
And what is the performance of this algorithm in O(n) notation?
Mark Byers
A: 

This is how you do it. Or at least its a naive solution.

Firstly you order the numbers in ascending order. Assuming A is the ordered result list and S is the set of minimal numbers from which you can construct A.

Loop through A. While there doesn't exist a subset of S that adds up to ai add a new number to S such that it does.

On the first iteration this will add min(A). The second number is probably going to be in S. This is quite computationally intensive because for each number you examine in A you will need to make sure that there exists a subset of S that adds to it and that you don't add a number that creates a subset of S that adds to something in A.

You can optimize this somewhat by, each time you add a number to S, you work out all possible sums including that new element and removing them from A. Keep going until you empty A.

It gets complicated if the numbers can be negative but you'll see this because there'll have to be a negative element of A for this to be possible.

cletus
Naively, if S is of size n, you must check 2**n possible sums to see if a number is the sum of some subset of this. You can optimize this substantially however.
GregS
A: 

As GMan has pointed out, this seems at least related to the subset sum problem. Various algorithms suggested here involve asking if a given integer x in the big list X is a sum of some subset of integers in the "current" spanning set S, and if not, then add x to S and remove it from X and continue. It seems like each of these steps is an instance of the subset sum problem, but I cannot prove it.

GregS