views:

416

answers:

8

Not sure how best to explain it, other than using an example...

Imagine having a client with 10 outstanding invoices, and one day they provide you with a cheque, but do not tell you which invoices it's for.

What would be the best way to return all the possible combination of values which can produce the required total?


My current thinking is a kind of brute force method, which involves using a self-calling function that runs though all the possibilities (see current version).

For example, with 3 numbers, there are 15 ways to add them together:

  1. A
  2. A + B
  3. A + B + C
  4. A + C
  5. A + C + B
  6. B
  7. B + A
  8. B + A + C
  9. B + C
  10. B + C + A
  11. C
  12. C + A
  13. C + A + B
  14. C + B
  15. C + B + A

Which, if you remove the duplicates, give you 7 unique ways to add them together:

  1. A
  2. A + B
  3. A + B + C
  4. A + C
  5. B
  6. B + C
  7. C

However, this kind of falls apart after you have:

  • 15 numbers (32,767 possibilities / ~2 seconds to calculate)
  • 16 numbers (65,535 possibilities / ~6 seconds to calculate)
  • 17 numbers (131,071 possibilities / ~9 seconds to calculate)
  • 18 numbers (262,143 possibilities / ~20 seconds to calculate)

Where, I would like this function to handle at least 100 numbers.

So, any ideas on how to improve it? (in any language)

+1  A: 

Sounds like a bin packing problem. Those are NP-complete, i.e. it's nearly impossible to find a perfect solution for large problem sets. But you can get pretty close using heuristics, which are probably applicable to your problem even if it's not strictly a bin packing problem.

Michael Borgwardt
I think your right... and unfortunately, as it needs to be an exact match, I don't think heuristics will be the solution (unless an algorithm can be created which does eventually test all possibilities, but starts with the most likely).
Craig Francis
I think this is the knapsack problem, which is known to be NP-complete. (Problem is, I don't remember whether the knapsack problem is NP-complete with constant value, and the xkcd cartoon isn't solid proof.) You might want to Google "knapsack problem" or "integral knapsack problem".
David Thornley
+1  A: 

This is a variant on a similar problem.

But you can solve this by creating a counter with n bits. Where n is the amount of numbers. Then you count from 000 to 111 (n 1's) and for each number a 1 is equivalent to an available number:

001 = A
010 = B
011 = A+B
100 = C
101 = A+C
110 = B+C
111 = A+B+C

(But that was not the question, ah well I leave it as a target).

Gamecat
Its an interesting way to identify the calculations that need to be performed (can't believe I didn't spot that earlier)... which from a programming perspective means that I might be able to remove the self calling function, and use a simple "for" loop.
Craig Francis
+1  A: 

It's not strictly a bin packing problem. It's a what combination of values could have produced another value.

It's more like the change making problem, which has a bunch of papers detailing how to solve it. Google pointed me here: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.57.3243

MSN

MSN
I don't think its a change making problem, as the values are unknown, and don't necessarily go down to an atomic unit (which can be used many times)... for example: 2, 4, 5, 9, 15 can only hit the goal 11 with "2 + 4 + 5" or "2 + 9".
Craig Francis
+2  A: 

This is a pretty common variation of the subset sum problem, and it is indeed quite hard. The section on the Pseudo-polynomial time dynamic programming solution on the page linked is what you're after.

Bill the Lizard
I think this pretty much describes the problem (unfortunately my Google skills were failing me yesterday)... so the answer seems to be no, its not possible to solve perfectly in a realistic time-frame.
Craig Francis
Some problems are hard to Google if you haven't heard of them by name before. I've never implemented the DP solution, so I can't say how it would perform with 100 inputs. It might be worth a try.
Bill the Lizard
Actually, only the dynamic programming is hard to implement. The di-binary solution is *very* easy to implement (I've done it in SQL) and extremely efficient up to about 20+ values.
RBarryYoung
+2  A: 

This is strictly for the number of possibilities and does not consider overlap. I am unsure what you want.

Consider the states that any single value could be at one time - it could either be included or excluded. That is two different states so the number of different states for all n items will be 2^n. However there is one state that is not wanted; that state is when none of the numbers are included.

And thus, for any n numbers, the number of combinations is equal to 2^n-1.

def setNumbers(n): return 2**n-1

print(setNumbers(15))

These findings are very closely related to combinations and permutations.


Instead, though, I think you may be after telling whether given a set of values any combination of them sum to a value k. For this Bill the Lizard pointed you in the right direction.

Following from that, and bearing in mind I haven't read the whole Wikipedia article, I propose this algorithm in Python:

def combs(arr):
    r = set()

    for i in range(len(arr)):
        v = arr[i]
        new = set()

        new.add(v)
        for a in r: new.add(a+v)
        r |= new

    return r


def subsetSum(arr, val):
    middle = len(arr)//2

    seta = combs(arr[:middle])
    setb = combs(arr[middle:])

    for a in seta:
        if (val-a) in setb:
            return True

    return False

print(subsetSum([2, 3, 5, 8, 9], 8))

Basically the algorithm works as this:

  1. Splits the list into 2 lists of approximately half the length. [O(n)]
  2. Finds the set of subset sums. [O(2n/2 n)]
  3. Loops through the first set of up to 2floor(n/2)-1 values seeing if the another value in the second set would total to k. [O(2n/2 n)]

So I think overall it runs in O(2n/2 n) - still pretty slow but much better.

Thanks PythonPower, your absolutely right about this... I didn't notice until Gamecat made this observation as well (woops). So from a programming point of view, it might be possible to make the code more efficient (one loop?), but it will still have allot of permutations to calculate.
Craig Francis
A: 

I don't know how often it would work in practice as there are many exceptions to this oversimplified case, but here's a thought:

In a perfect world, the invoices are going to be paid up to a certain point. People will pay A, or A+B, or A+B+C, but not A+C - if they've received invoice C then they've received invoice B already. In the perfect world the problem is not to find to a combination, it's to find a point along a line.

Rather than brute forcing every combination of invoice totals, you could iterate through the outstanding invoices in order of date issued, and simply add each invoice amount to a running total which you compare with the target figure.

Back in the real world, it's a trivially quick check you can do before launching into the heavy number-crunching, or chasing them up. Any hits it gets are a bonus :)

jTresidder
A: 

I need to solve the same problem exactly, but in Delphi, what would be the algorithm? , I reviewed the one of php and the truth I did not understand very well

eddy_rocha
A: 

Here is an optimized Object-Oriented version of the exact integer solution to the Subset Sums problem(Horowitz, Sahni 1974). On my laptop (which is nothing special) this vb.net Class solves 1900 subset sums a second (for 20 items):

Option Explicit On

Public Class SubsetSum
    'Class to solve exact integer Subset Sum problems'
    ''
    ' 06-sep-09 RBarryYoung Created.'
    Dim Power2() As Integer = {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32764}
    Public ForceMatch As Boolean
    Public watch As New Stopwatch
    Public w0 As Integer, w1 As Integer, w1a As Integer, w2 As Integer, w3 As Integer, w4 As Integer


    Public Function SolveMany(ByVal ItemCount As Integer, ByVal Range As Integer, ByVal Iterations As Integer) As Integer
        ' Solve many subset sum problems in sequence.'
        ''
        ' 06-sep-09 RBarryYoung Created.'
        Dim TotalFound As Integer
        Dim Items() As Integer
        ReDim Items(ItemCount - 1)

        'First create our list of selectable items:'
        Randomize()
        For item As Integer = 0 To Items.GetUpperBound(0)
            Items(item) = Rnd() * Range
        Next

        For iteration As Integer = 1 To Iterations
            Dim TargetSum As Integer
            If ForceMatch Then
                'Use a random value but make sure that it can be matched:'

                ' First, make a random bitmask to use:'
                Dim bits As Integer = Rnd() * (2 ^ (Items.GetUpperBound(0) + 1) - 1)

                ' Now enumerate the bits and match them to the Items:'
                Dim sum As Integer = 0
                For b As Integer = 0 To Items.GetUpperBound(0)
                    'build the sum from the corresponding items:'
                    If b < 16 Then
                        If Power2(b) = (bits And Power2(b)) Then
                            sum = sum + Items(b)
                        End If
                    Else
                        If Power2(b - 15) * Power2(15) = (bits And (Power2(b - 15) * Power2(15))) Then
                            sum = sum + Items(b)
                        End If
                    End If
                Next
                TargetSum = sum

            Else
                'Use a completely random Target Sum (low chance of matching): (Range / 2^ItemCount)'
                TargetSum = ((Rnd() * Range / 4) + Range * (3.0 / 8.0)) * ItemCount
            End If

            'Now see if there is a match'
            If SolveOne(TargetSum, ItemCount, Range, Items) Then TotalFound += 1
        Next

        Return TotalFound
    End Function

    Public Function SolveOne(ByVal TargetSum As Integer, ByVal ItemCount As Integer _
                            , ByVal Range As Integer, ByRef Items() As Integer) As Boolean
        ' Solve a single Subset Sum problem:  determine if the TargetSum can be made from'
        'the integer items.'

        'first split the items into two half-lists: [O(n)]'
        Dim H1() As Integer, H2() As Integer
        Dim hu1 As Integer, hu2 As Integer
        If ItemCount Mod 2 = 0 Then
            'even is easy:'
            hu1 = (ItemCount / 2) - 1 : hu2 = (ItemCount / 2) - 1
            ReDim H1((ItemCount / 2) - 1), H2((ItemCount / 2) - 1)
        Else
            'odd is a little harder, give the first half the extra item:'
            hu1 = ((ItemCount + 1) / 2) - 1 : hu2 = ((ItemCount - 1) / 2) - 1
            ReDim H1(hu1), H2(hu2)
        End If

        For i As Integer = 0 To ItemCount - 1 Step 2
            H1(i / 2) = Items(i)
            'make sure that H2 doesnt run over on the last item of an odd-numbered list:'
            If (i + 1) <= ItemCount - 1 Then
                H2(i / 2) = Items(i + 1)
            End If
        Next

        'Now generate all of the sums for each half-list:   [O( 2^(n/2) * n )]  **(this is the slowest step)'
        Dim S1() As Integer, S2() As Integer
        Dim sum1 As Integer, sum2 As Integer
        Dim su1 As Integer = 2 ^ (hu1 + 1) - 1, su2 As Integer = 2 ^ (hu2 + 1) - 1
        ReDim S1(su1), S2(su2)

        For i As Integer = 0 To su1
            ' Use the binary bitmask of our enumerator(i) to select items to use in our candidate sums:'
            sum1 = 0 : sum2 = 0
            For b As Integer = 0 To hu1
                If 0 < (i And Power2(b)) Then
                    sum1 += H1(b)
                    If i <= su2 Then sum2 += H2(b)
                End If
            Next
            S1(i) = sum1
            If i <= su2 Then S2(i) = sum2
        Next

        'Sort both lists:   [O( 2^(n/2) * n )]  **(this is the 2nd slowest step)'
        Array.Sort(S1)
        Array.Sort(S2)

        ' Start the first half-sums from lowest to highest,'
        'and the second half sums from highest to lowest.'
        Dim i1 As Integer = 0, i2 As Integer = su2

        ' Now do a merge-match on the lists (but reversing S2) and looking to '
        'match their sum to the target sum:     [O( 2^(n/2) )]'
        Dim sum As Integer
        Do While i1 <= su1 And i2 >= 0
            sum = S1(i1) + S2(i2)
            If sum < TargetSum Then
                'if the Sum is too low, then we need to increase the ascending side (S1):'
                i1 += 1
            ElseIf sum > TargetSum Then
                'if the Sum is too high, then we need to decrease the descending side (S2):'
                i2 -= 1
            Else
                'Sums match:'
                Return True
            End If
        Loop

        'if we got here, then there are no matches to the TargetSum'
        Return False
    End Function

End Class

Here is the Forms code to go along with it:

Public Class frmSubsetSum

    Dim ssm As New SubsetSum

    Private Sub btnGo_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnGo.Click
        Dim Total As Integer
        Dim datStart As Date, datEnd As Date
        Dim Iterations As Integer, Range As Integer, NumberCount As Integer

        Iterations = CInt(txtIterations.Text)
        Range = CInt(txtRange.Text)
        NumberCount = CInt(txtNumberCount.Text)

        ssm.ForceMatch = chkForceMatch.Checked

        datStart = Now

        Total = ssm.SolveMany(NumberCount, Range, Iterations)

        datEnd = Now()

        lblStart.Text = datStart.TimeOfDay.ToString
        lblEnd.Text = datEnd.TimeOfDay.ToString
        lblRate.Text = Format(Iterations / (datEnd - datStart).TotalMilliseconds * 1000, "####0.0")

        ListBox1.Items.Insert(0, "Found " & Total.ToString & " Matches out of " & Iterations.ToString & " tries.")
        ListBox1.Items.Insert(1, "Tics 0:" & ssm.w0 _
                                & " 1:" & Format(ssm.w1 - ssm.w0, "###,###,##0") _
                                & " 1a:" & Format(ssm.w1a - ssm.w1, "###,###,##0") _
                                & " 2:" & Format(ssm.w2 - ssm.w1a, "###,###,##0") _
                                & " 3:" & Format(ssm.w3 - ssm.w2, "###,###,##0") _
                                & " 4:" & Format(ssm.w4 - ssm.w3, "###,###,##0") _
                                & ", tics/sec = " & Stopwatch.Frequency)
    End Sub
End Class

Let me know if you have any questions.

RBarryYoung