Hello,

Given a set of n numbers (1 <= n <= 100), where each number is an integer between 1 and 450, we need to distribute this set of numbers into two sets A and B, such that the following two conditions hold:

  1. The number of elements in each set differs by at most 1.
  2. The sum of all the numbers in A is as nearly equal as possible to the sum of all the numbers in B, i.e. the distribution should be fair.

Can someone please suggest an efficient algorithm for solving the above problem?

Thank You.

+2  A: 

If the numbers are sequential then you just alternate assigning them between A and B.

I suspect they are not, in which case...

Assign the largest unassigned number to the group with the lower sum, unless the difference in size between the groups has reached the count of unassigned numbers (in which case assign all of the remaining numbers to the smaller group).

It won't find the best solution in all cases, but it's close and simple.

RichH
Unfortunately, alternation doesn't quite meet the specified goals. Imagine the sequence 1, 2, 3: assigning numbers alternately would give you A = { 1, 3 } and B = { 2 }, but a fairer distribution would be A = { 1, 2 } and B = { 3 }.
C Pirate
I did say it wouldn't find the best solution, but it is a really simple way to find a solution that is close.
RichH
+2  A: 

Never mind, I thought the numbers were sequential. This looks kind of like the Knapsack Problem, which is NP-hard.


If the numbers are sequential:

  1. Put the largest number in A
  2. Put the next largest number in B
  3. Put the next largest number in B
  4. Put the next largest number in A
  5. Repeat from step 1 until all the numbers are assigned.

Proof:

After every multiple of 4 numbers has been assigned, A and B both contain the same number of items, and the sum of the items in each group is the same, because

(n) + (n - 3) == (n - 1) + (n - 2)

In the last iteration we are at Step 1 above and we have either 0, 1 [1], 2 [1,2], or 3 [1,2,3] numbers remaining.

In case 0, we are done and the groups are equal in count and weight.

In case 1, we assign the number 1 to group A. Group A has one more item and one more weight. This is as fair as we can get in this situation.

In case 2, we assign the number 2 to group A and the number 1 to group B. Now the groups have the same number of items and group A has one extra weight. Again, this is as fair as we can get.

In case 3, assign the number 3 to group A, and assign numbers 2 and 1 to group B. Now the groups have the same weight (3 == 2 + 1) and group B has one extra item.

mobrule
Though the knapsack problem is definitely NP-hard, this problem (with its constraints on the input) can be solved in polynomial time.
Olexiy
This does not work in all cases; try with 10, 15, 20, 25, 30, 45, 75. The result is A = {75, 25, 20} and B = {45, 30, 15, 10}. Sum in A = 120, sum in B = 100, whereas it is possible to have 110 in each.
Patrice Bernassola
+1  A: 

Your requirement in #2 needs clarification, because "The sum of all the numbers in A is as nearly equal as possible to the sum of all the numbers in B" is clear, but then your statement "the distribution should be fair" makes everything unclear. What exactly does 'fair' mean? Does the process need a random element in it?

Jon Schoning
Thanks for your reply, Jon. By fair, I mean the same thing, i.e. the sum of the numbers in each set should be as nearly equal as possible, and the counts of the two sets should differ by at most 1.
A: 

I assume the numbers are not sequential, and you can't re-balance?

Because of constraint 1, you're going to need to switch buckets every other insertion, always. So every time you're not forced to pick a bucket, pick a logical bucket (where adding the number would make the sum closer to the other bucket). If this bucket isn't the same one as your previous bucket, you get another turn where you're not forced.

psychotik
A: 

Any dual knapsack algorithm will do (regardless of distribution of numbers).

Esteban Araya
An example or references would be great! :-)
fortran
+1  A: 

@ShreevatsaR notes that the algorithm below is known as the greedy algorithm. It does not do very well with certain inputs (I tried 10 different randomly generated input sets of size 100, and in all cases the sums were very close, which led me to think sorting the input was enough for this algorithm to succeed).

See also "The Easiest Hard Problem", American Scientist, March-April 2002, recommended by ShreevatsaR.

#!/usr/bin/perl

use strict;
use warnings;

use List::Util qw( sum );

my @numbers = generate_list();

print "@numbers\n\n";

my (@A, @B);
my $N = @numbers;

while ( @numbers ) {
    my $n = pop @numbers;
    printf "Step: %d\n", $N - @numbers;
    {
        no warnings 'uninitialized';

        if ( sum(@A) < sum(@B) ) {
            push @A, $n;
        }
        else {
            push @B, $n;
        }
        printf "A: %s\n\tsum: %d\n\tnum elements: %d\n",
            "@A", sum(@A), scalar @A;
        printf "B: %s\n\tsum: %d\n\tnum elements: %d\n\n",
            "@B", sum(@B), scalar @B;
    }
}

sub generate_list { grep { rand > 0.8 } 1 .. 450 }

Note that generate_list returns a list in ascending order.

Sinan Ünür
This algorithm doesn't guarantee the second constraint either - for example, for the list {1, 1, 3, 3, 4} it will produce something like {1, 3, 4}-{1, 3} instead of the correct answer {1, 1, 4}-{3, 3}
Olexiy
@Olexiy: You should try running the code before downvoting. You seem to have missed the fact that I am `pop`ping from the main list.
Sinan Ünür
I missed the fact that you pop from the end of the list - try {4, 3, 3, 1, 1} now
Olexiy
@Olexiy How is it not clear that I am assuming that the input is sorted?
Sinan Ünür
Use {2, 2, 2, 3, 3}. The optimal solution is to split into two sets {2, 2, 2} and {3, 3} both equal to 6, but your code (which I ran) produces {3, 2} and {3, 2, 2}, which are 5 and 7. Your idea is the differencing heuristic for the number-partitioning problem, but although it works for many cases, it is known that it can be very far from optimal for bad cases.
ShreevatsaR
Correction: Your idea is known as the greedy algorithm; "differencing" is a different idea due to Karmarkar and Karp (which does better, but is still not optimal). Further reading: Brian Hayes's article "The Easiest Hard Problem", *American Scientist*, March-April 2002. http://www.americanscientist.org/issues/pub/the-easiest-hard-problem/
ShreevatsaR
@ShreevatsaR Thanks for the constructive input and the link to a great article. Just out of curiosity, can you think of a counter-example where all numbers are unique?
Sinan Ünür
Yes, the same idea works: modify the example (20, 20, 20, 30, 30) slightly to get the new example (19, 20, 21, 29, 31). :-) (Again the optimal is {19, 20, 21} and {29, 31} with 60 each, but this algorithm gives ~50 and ~70... 51 and 69, I think.)
ShreevatsaR
@ShreevatsaR Got it. I should have seen it. (minor correction, 52 and 68).
Sinan Ünür
@ Sinan Ünür: The input is not sorted, your program does not sort it, and I can not read your thoughts - how can I guess that you sort it?
Olexiy
@Olexiy `sub generate_list { grep { rand > 0.8 } 1 .. 450 }` this will select each number from `1` to `450` with probability `0.2`. The list will be sorted because `1` thru `450` is sorted.
Sinan Ünür
Ok, sorry, didn't see that part
Olexiy
+3  A: 

This is a constrained version of the Number Partitioning Problem. Usually the goal is to find any 2 disjoint subsets that minimize the difference of the sums. Your problem is constrained in the sense that you only consider 1 possibility: 2 sets of size N/2 (or one set of size N/2 and one of size N/2+1 if the total count is odd). This dramatically reduces the search space, but I can't think of a good algorithm at the moment; I'll think about it.

Falaina
My calculation shows that there are 1.00891344545564e+29 ways to partition 100 numbers into 2 sets of 50 (the worst case), so any brute force approach is still going to fail! Still, as your link says, number partitioning is seen as the easiest of the NP-complete problems!
Jackson
The Number Partitioning Problem (or the knapsack problem, or the subset-sum problem) is NP-complete only when you define the input size as the "length" of the input, i.e., the number of bits needed to describe the numbers. But there are easy dynamic-programming algorithms that run in time polynomial in the *numbers* themselves (here 1 to 100); so the fact that it is NP-complete shouldn't be a deterrent. (In fact, these problems undergo a "phase transition" when the sizes of the numbers go from polynomial to exponential in n... but 100 is a very small number, so that's of no concern here.)
ShreevatsaR
In short: constrained versions of NP-hard problems aren't necessarily hard. Far from it. :-)
ShreevatsaR
+8  A: 

Since the numbers are small, it is not NP-complete.

To solve it you can use dynamic programming:

Make a three-dimensional table of booleans where true at t[s, n, i] means that the sum s can be reached with a subset of n elements below index i. To compute the value for t[s, n, i] check t[s, n, i-1] and t[s - a[i], n-1, i-1]. Then look through the table at second index n/2 to find the best solution.

Edit: You actually don't need the complete table at once. You can make a two dimensional table t_i[s, n] for each index i and compute the table for i from the table for i-1, so you only need two of these two-dimensional tables, which saves a lot of memory. (Thanks to Martin Hock.)

starblue
A problem is NP-complete regardless of the size of the input in this case! :-p Another question is whether this instance of the problem is feasible given its size.
fortran
The constraint on the input values (each of them <= 450) makes the problem solvable in polynomial time. The problem can be stupidly brute-forced in O(N^450) (which is clearly polynomial). A quick N^3 solution is described in another comment.
Olexiy
@fortran: NO! NP-completeness (or even notions like "polynomial time") are defined in terms of the input size. The DP algorithm described here is perfectly correct; its running time is a polynomial function of the numbers in the input (here 1 to 100). The only reason this is not a polynomial-time algorithm for the general Number Partitioning Problem is that *there*, the "input size" is the *number of bits* to describe the input, so the log of the numbers in the input.
ShreevatsaR
+1 Ah, nice solution. I knew there had to be a decent pseudopolynomial solution with DP, but I couldn't think of the table. Thanks for the insight.
Falaina
You can also do it with a 2D table. t[s, k] true means that the sum s can be reached with some subset of k elements. Your array should be of size [S, ceil(n/2)] where S is the sum of all elements and n is the number of elements. Then iterate through the elements i[1] ... i[n]. For each element i[x], scan through the table and if t[s, n] is true, set t[s+i[x], n+1] true (unless this would fall off the table). Also, set t[i[x], 1] true. Finally, scan through the last column (all x for [x, ceil(n/2)]) for the optimal value.
Martin Hock
+2  A: 

First, find a solution to the problem without the first constraint (i.e. making the sums as close as possible). This problem can be solved using a DP approach (you can read more about DP here, and the first problem - about coins - is very similar to yours).

Once you can solve it, you can add one more state to your DP - the number of elements already selected into the subset. This gives you an N^3 algorithm.

Olexiy
This is correct, but a minor point: the algorithm is O(N^2*K) where K is the maximum possible sum of a subset (here 100*N), not necessarily N^3 in general.
ShreevatsaR
450*N. Yes, sure, but without that constraint the problem is NP-complete.
Olexiy
Yeah, I know. Although 450 is a "constant", it is really more informative to treat it as a parameter and say that the algorithm is O(N^3*K), rather than O(N^3). :-)
ShreevatsaR
A: 

I would give genetic algorithms a try, as this seems a very nice problem to apply them to.

The encoding is just a binary string of length N, where 0 means the element is in the first group and 1 means it is in the second. Penalize the fitness when the number of elements in each group differs, and penalize the difference between the two sums... Something like:

fitness(gen) = (sum(gen) - n/2)^2 + (sum(values[i]*(-1)**gen[i] for i in 0..n-1))^2

(And minimize the fitness)

Of course this can give you a sub-optimal answer, but for large real world problems it's usually enough.

fortran
+2  A: 

I have an algorithm for you. It uses a mix of recursive and iterative ideas.

Assume you have n numbers X1, ..., Xn with 1 <= n <= 100 and 1 <= Xi <= 450.

  1. If n < 3, distribute the numbers and stop,

  2. If n > 2, sort the list of numbers in ascending order,

  3. Compute the total sum S of all numbers,
  4. Divide the total S by (n - n%2)/2 to obtain the value A,
  5. Now we will create pairs of numbers whose sum is as near as possible to A. Take the first number and find a second number so that their sum S1 is as near as possible to A. Put S1 in a new list of numbers, and remember how each sum was computed so the base numbers can be recovered later.
  6. Repeat step 5 until fewer than 2 numbers remain in the list. Then put any remaining number into the sum list and restart the algorithm from step 1 with the new list.

Example:

Assuming: n = 7 and numbers are 10, 75, 30, 45, 25, 15, 20

Pass 1:

Since n > 2, sort the list: 10, 15, 20, 25, 30, 45, 75

Sum S = 220

A = 220 / ((7-1)/2) = 73

Couples:

10 & 75 => 85

15 & 45 => 60

20 & 30 => 50

Fewer than 2 numbers remain, so add 25 to the sum list: 85(10,75), 60(15,45), 50(20,30), 25(25)

Pass 2:

n = 4 and numbers are 85, 60, 50, 25

The list count is > 2, so sort the list: 25(25), 50(20,30), 60(15,45), 85(10,75)

Sum S is still the same (S = 220) but A must be recomputed: A = 220 / ((4-0)/2) = 110

Couples:

25 & 85 => 110

50 & 60 => 110

The Sum list is : 110(25(25),85(10,75)), 110(50(20,30),60(15,45))

Pass 3:

n = 2 and numbers are 110, 110

n < 3 so distribute numbers:

A = 25, 10, 75

B = 20, 30, 15, 45

This works in every scenario I have tested.

Patrice Bernassola
This is a good attempt, but it cannot possibly work for all inputs -- because if this were correct, it would be possible to extend it and get a polynomial-time algorithm for the NP-complete number partitioning problem. This is bound to fail, but I cannot think of a counterexample right now. Do you have some code for this? Try with larger numbers (e.g. with a set of 10 numbers close to 450 that has exactly one good partition).
ShreevatsaR
Actually, the same counterexample that I gave for another solution works: take (19, 20, 21, 29, 31). The sum is 120, so your algorithm will try to find two pairs close to 120/((5-1)/2)=60 each: (19,31) and (20,29). It will then proceed with the list 21(21), 49(20,29), 50(19,31). Next it tries to find pairs with sum 120/((3-1)/2)=120: (21,50(19,31)). So it ends with partitions 71(21,19,31) and 49(20,29), but the optimal partition is 60(29,31) and 60(19,20,21). Your algorithm actually works for this example if the "pairing" is done from the largest number first, but that cannot be generalized :)
ShreevatsaR
A: 

Simulated annealing can quite quickly find better and better answers. You could keep constraint 1 satisfied while improving how nearly constraint 2 is met.

Stephen Denne
A: 

If you need the perfect answer then you have to generate and loop through all of the possible sets of answers. If a pretty good answer is all you need then a technique like simulated annealing is the way to go. Here's some C code that uses a very primitive cooling schedule to find an answer.

#include <stdio.h>
#include <stdlib.h>

#define MAXPAR 50
#define MAXTRIES 10000000

int data1[] = {192,130,446,328,40,174,218,31,59,234,26,365,253,11,198,98,
               279,6,276,72,219,15,192,289,289,191,244,62,443,431,363,10
              } ;
int data2[] = { 1,2,3,4,5,6,7,8,9 } ;

// What does the set sum to
int sumSet ( int data[], int len )
{
    int result = 0 ;
    for ( int i=0; i < len; ++i )
     result += data[i] ;
    return result ;
}

// Print out a set
void printSet ( int data[], int len )
{
    for ( int i=0; i < len; ++i )
     printf ( "%d ", data[i] ) ;
    printf ( " Sums to %d\n", sumSet ( data,len ) ) ;
}

// Partition the values using simulated annealing
void partition ( int data[], size_t len )
{
    int set1[MAXPAR] = {0} ; // Partition 1
    int set2[MAXPAR] = {0} ; // Partition 2
    int set1Pos, set2Pos, dataPos, set1Len, set2Len ;  // Data about the partitions
    int minDiff ; // The best solution found so far
    int sum1, sum2, diff ;
    int tries = MAXTRIES ; // Don't loop for ever

    set1Len = set2Len = -1 ;
    dataPos = 0 ;

    // Initialize the two partitions
    while ( dataPos < len )
    {
     set1[++set1Len] = data[dataPos++] ;
     if ( dataPos < len )
      set2[++set2Len] = data[dataPos++] ;
    }


    // Very primitive simulated annealing solution
    sum1 = sumSet ( set1, set1Len ) ;
    sum2 = sumSet ( set2, set2Len ) ;
    diff = sum1 - sum2 ; // The initial difference - we want to minimize this
    minDiff = sum1 + sum2 ;
    printf ( "Initial diff is %d\n", diff ) ;

    // Loop until a solution is found or all are tries are exhausted
    while ( diff != 0 && tries > 0 )
    {
     // Look for swaps that improves the difference
     int newDiff, newSum1, newSum2 ;
     set1Pos = rand() % set1Len ;
     set2Pos = rand() % set2Len ;

     newSum1 = sum1 - set1[set1Pos] + set2[set2Pos] ;
     newSum2 = sum2 + set1[set1Pos] - set2[set2Pos] ;
     newDiff = newSum1 - newSum2 ;
     if ( abs ( newDiff ) < abs ( diff ) ||  // Is this a better solution?
             tries/100 > rand() % MAXTRIES )  // Or shall we just swap anyway - chance of swap decreases as tries reduces
     {
      int tmp = set1[set1Pos] ;
      set1[set1Pos] = set2[set2Pos] ;
      set2[set2Pos] = tmp ;
      diff = newDiff ;
      sum1 = newSum1 ;
      sum2 = newSum2 ;

      // Print it out if its the best we have seen so far
      if ( abs ( diff ) < abs ( minDiff ) )
      {
       minDiff = diff ;
       printSet ( set1, set1Len ) ;
       printSet ( set2, set2Len ) ;
       printf ( "diff of %d\n\n", abs ( diff ) ) ;
      }
     }


     --tries ;
    }

    printf ( "done\n" ) ;
}


int main ( int argc, char **argv )
{
    // Change this to seed rand from the clock if you don't want the same
    // results reproduced every time!
    srand ( 12345 ) ;
    partition ( data1, sizeof(data1)/sizeof(data1[0]) ) ; // data1 has 32 elements, not 31
    partition ( data2, 9 ) ;
    return 0;
}
Jackson
Hahaha, solving an NP problem with Perl! I've seen everything now! ;-)
Eric M