views:

2589

answers:

14

There is a list of numbers.

The list is to be divided into 2 equal sized lists, with a minimal difference in sum. The sums have to be printed.

#Example:
>>>que = [2,3,10,5,8,9,7,3,5,2]
>>>make_teams(que)
27 27

Is there an error in the following code algorithm for some case?

How do I optimize and/or pythonize this?

def make_teams(que):
    que.sort()
    if len(que)%2: que.insert(0,0)
    t1,t2 = [],[]
    while que:
 val = (que.pop(), que.pop())
 if sum(t1)>sum(t2):
     t2.append(val[0])
     t1.append(val[1])
 else:
     t1.append(val[0])
     t2.append(val[1])
    print min(sum(t1),sum(t2)), max(sum(t1),sum(t2)), "\n"

Question is from http://www.codechef.com/problems/TEAMSEL/

+2  A: 

Well, you can find a solution to a percentage precision in polynomial time, but to actually find the optimal (absolute minimal difference) solution, the problem is NP-complete. This means that there is no polynomial time solution to the problem. As a result, even with a relatively small list of numbers, it is too compute intensive to solve. If you really need a solution, take a look at some of the approximation algorithms for this.

http://en.wikipedia.org/wiki/Subset_sum_problem

Sean Turner
This is wrong. It's the knapsack problem which can be solved with dynamic programming.
Georg
I'm thinking this isn't a Subset sum problem...though I'll freely admit that I've been away from this arena too long to say so with any assurance. I'm liking the dynamic programming approach outlined by GS. Can you explain why that wouldn't work?
Beska
@gs: It's not wrong. You can look at it as a subset-sum problem or a knapsack problem (technically, it's called the number partitioning problem), because all NP-complete problems are equivalent anyway. :-) And this problem illustrates why it's important not to get carried away by the term "NP-complete": there is no algorithm known that is polynomial in the *input size* (number of bits in the input) in the worst case, but as the dynamic programming algorithm shows, it can be done in time polynomial in the input *numbers* themselves. It's the same with knapsack: look up 'Pseudo-polynomial time'.
ShreevatsaR
A: 
class Team(object):
    def __init__(self):
     self.members = []
     self.total = 0

    def add(self, m):
     self.members.append(m)
     self.total += m

    def __cmp__(self, other):
     return cmp(self.total, other.total)


def make_teams(ns):
    ns.sort(reverse = True)
    t1, t2 = Team(), Team()

    for n in ns:
     t = t1 if t1 < t2 else t2
     t.add(n)

    return t1, t2

if __name__ == "__main__":
    import sys
    t1, t2 = make_teams([int(s) for s in sys.argv[1:]])
    print t1.members, sum(t1.members)
    print t2.members, sum(t2.members)

>python two_piles.py 1 50 50 100
[50, 50] 100
[100, 1] 101
Aaron Maenpaa
That looks a lot too complex to me.
odwl
+20  A: 

Dynamic programming is the solution you're looking for.

Example with {3,4,10,3,2,5}

X-Axis: Reachable sum of group.        max = sum(all numbers) / 2    (rounded up)
Y-Axis: Count elements in group.       max = count numbers / 2       (rounded up)

      1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  |  |  |  | 4|  |  |  |  |  |  |  |  |  |  |       //  4
 2  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
 3  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
      1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  |  |  | 3| 4|  |  |  |  |  |  |  |  |  |  |       //  3
 2  |  |  |  |  |  |  | 3|  |  |  |  |  |  |  |
 3  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
      1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  |  |  | 3| 4|  |  |  |  |  |10|  |  |  |  |       // 10
 2  |  |  |  |  |  |  | 3|  |  |  |  |  |10|10|
 3  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
      1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  |  |  | 3| 4|  |  |  |  |  |10|  |  |  |  |       //  3
 2  |  |  |  |  |  | 3| 3|  |  |  |  |  |10|10|
 3  |  |  |  |  |  |  |  |  |  | 3|  |  |  |  |
      1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  |  | 2| 3| 4|  |  |  |  |  |10|  |  |  |  |       //  2
 2  |  |  |  |  | 2| 3| 3|  |  |  |  | 2|10|10|
 3  |  |  |  |  |  |  |  | 2| 2| 3|  |  |  |  |
      1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  |  | 2| 3| 4| 5|  |  |  |  |10|  |  |  |  |       //  5
 2  |  |  |  |  | 2| 3| 3| 5| 5|  |  | 2|10|10|
 3  |  |  |  |  |  |  |  | 2| 2| 3| 5| 5|  |  |
                                       ^

12 is our lucky number! Backtracing to get the group:

12 - 5 = 7        {5}
 7 - 3 = 4        {5, 3}
 4 - 4 = 0        {5, 3, 4}

The other set can then be calculated: {4,3,10,3,2,5} - {5,3,4} = {10,3,2}

All fields with a number are possible solutions for one bag. Choose the one that is furthest in the bottom right corner.

BTW: It's called the knapsack-problem.

If all weights (w1, ..., wn and W) are nonnegative integers, the knapsack problem can be solved in pseudo-polynomial time using dynamic programming.

Georg
Okay. This makes sense to me...but then why are people claiming that this problem is NP-complete? Someone is wrong...and I can't figure out what (if anything) is wrong with this solution.
Beska
You would need a space of O(Sum(x[i]) in order to use the dynamic programming solution. In general, I believe that the problem is NP-Complete. (Think if each number is a floating point number, you can not easily use Dynamic programming)
Nathan Wong
That's true, it works only for a limited subset of the problem.
Georg
Pseudo-polynomial time (http://en.wikipedia.org/wiki/Pseudo-polynomial_time) means the time is polynomial in the size of the input numbers, but still non-polynomial in the length of the input. If the input number size is bounded, then you have a polynomial time algorithm. But if it's unbounded, then not. For example, if the n input numbers for knapsack are 2^0, 2^1, ..., 2^(n-1), then you have 2^n solutions to examine in the last step of the dynamic programming solution.
dewtell
Why is everyone voting this up? This solution doesn't even make sense. The closest to sum/2 is either 13 or 14 which, according to this answer, is valid. It is not possible to make 13 or 14 from that list without having team of 2 and 4 (which is not allowed in the problem description).
Unknown
Because it's essentially correct: there is a dynamic programming algorithm which works. (You just need to keep booleans for possible[nitems][sum], not merely one boolean for each sum.)
ShreevatsaR
@Unknown: I'm using my own set of numbers, not the one given in the question.
Georg
Once you have got a array of booleans which show what sums are possible for n elements, how will you extend it to n+1 elements? unless for each possible sum the elements it is composed of is also stored! to extend the solution, you have to know what elements have already been used for that sum.
Pranav
@Pranav: That's simple: Just add the n+1th element to all possible sums.
Georg
But that doesn't answer the op's question, that the required sum should me made of half of the elements in the set. Unless we keep track of all the possible sets of elements it can be composed of, for each boolean.
Pranav
@Pranav: I've changed my answer to a complete solution. It now uses a 2-dimensional array.
Georg
A: 

For performance you save computations by replacing append() and sum() with running totals.

Glenn
Sounds like premature optimization to me.
Georg
A: 

You could tighten your loop up a little by using the following:

def make_teams(que):
    que.sort()
    t1, t2 = []
    while que:
        t1.append(que.pop())
        if sum(t1) > sum(t2):
            t2, t1 = t1, t2
    print min(sum(t1),sum(t2)), max(sum(t1),sum(t2))
leif
+1  A: 

Note that it is also an heuristic and I moved the sort out of the function.

 def g(data):
   sums = [0, 0]
   for pair in zip(data[::2], data[1::2]):
     item1, item2 = sorted(pair)
     sums = sorted([sums[0] + item2, sums[1] + item1])
   print sums

data = sorted([2,3,10,5,8,9,7,3,5,2])
g(data)
odwl
+1  A: 

A test case where your method doesn't work is

que = [1, 1, 50, 50, 50, 1000]

The problem is that you're analyzing things in pairs, and in this example, you want all the 50's to be in the same group. This should be solved though if you remove the pair analysis aspect and just do one entry at a time.

Here's the code that does this

def make_teams(que):
    que.sort()
    que.reverse()
    if len(que)%2: que.insert(0,0)
    t1,t2 = [],[]
    while que:
        if abs(len(t1)-len(t2))>=len(que):
            [t1, t2][len(t1)>len(t2)].append(que.pop(0))
        else:
            [t1, t2][sum(t1)>sum(t2)].append(que.pop(0))
    print min(sum(t1),sum(t2)), max(sum(t1),sum(t2)), "\n"

if __name__=="__main__":
    que = [2,3,10,5,8,9,7,3,5,2]
    make_teams(que)
    que = [1, 1, 50, 50, 50, 1000]
    make_teams(que)

This give 27, 27 and 150, 1002 which are the answers that make sense to me.

Edit: In review, I find this to not actually work, though in the end, I'm not quite sure why. I'll post my test code here though, as it might be useful. The test just generates random sequence that have equal sums, puts these together and compares (with sad results).

Edit #2: Based in the example pointed out by Unknown, [87,100,28,67,68,41,67,1], it's clear why my method doesn't work. Specifically, to solve this example, the two largest numbers need to both be added to the same sequence to get a valid solution.

def make_sequence():
    """return the sums and the sequence that's devided to make this sum"""
    while 1:
        seq_len = randint(5, 200)
        seq_max = [5, 10, 100, 1000, 1000000][randint(0,4)]
        seqs = [[], []]
        for i in range(seq_len):
            for j in (0, 1):
                seqs[j].append(randint(1, seq_max))
        diff = sum(seqs[0])-sum(seqs[1])
        if abs(diff)>=seq_max: 
            continue
        if diff<0:
            seqs[0][-1] += -diff
        else:
            seqs[1][-1] += diff
        return sum(seqs[0]), sum(seqs[1]), seqs[0], seqs[1]

if __name__=="__main__":

    for i in range(10):
        s0, s1, seq0, seq1 = make_sequence()
        t0, t1 = make_teams(seq0+seq1)
        print s0, s1, t0, t1
        if s0 != t0 or s1 != t1:
            print "FAILURE", s0, s1, t0, t1
tom10
You gave a test case to prove this wrong. Upvoted.The reason i did in pairs is because there needed to be equal number of entries in both lists.
Lakshman Prasad
Yes but still, I think that any simple solution will be an heuristic and the best result here should be 1002 150.
odwl
@odwl: I agree with your point. When you do this by pairs you get 101, 1051, and item-by-item give 150, 1002.
tom10
@becomingGuru, I implemented a solution that works correctly, take a look at it.
Unknown
@tom10 actually your solution fails for [87,100,28,67,68,41,67,1]. It outputs 223 236. Nice try.
Unknown
@Unknown: What's are the correct sequences and their sums in this case?
tom10
@tom10 if you looked at my post you would know
Unknown
@tom10, your solution is wrong because its a naive greedy algorithm which has been proven to be suboptimal at times.
Unknown
A: 

Since the lists must me equal the problem isn't NP at all.

I split the sorted list with the pattern t1<-que(1st, last), t2<-que(2nd, last-1) ...

def make_teams2(que):
    que.sort()
    if len(que)%2: que.insert(0,0)
    t1 = []
    t2 = []
    while que:
        if len(que) > 2:
            t1.append(que.pop(0))
            t1.append(que.pop())
            t2.append(que.pop(0))
            t2.append(que.pop())
        else:
            t1.append(que.pop(0))
            t2.append(que.pop())
    print sum(t1), sum(t2), "\n"

Edit: I suppose that this is also a wrong method. Wrong results!

Nick D
Looks too complex to me
odwl
I could refactor it but it doesn't work anyway.The algorithm is simple my code is crappy :)
Nick D
The lists don't have to equal exactly. Also there can be a team of 4 and a team of 5. Take a look at my solution for one that works.
Unknown
+1  A: 

It's actually PARTITION, a special case of KNAPSACK.

It is NP Complete, with pseudo-polynomial dp algorithms. The pseudo in pseudo-polynomial refers to the fact that the run time depends on the range of the weights.

In general you will have to first decide if there is an exact solution before you can admit a heuristic solution.

klochner
A: 

After some thinking, for not too big problem, I think that the best kind of heuristics will be something like:

import random
def f(data, nb_iter=20):
  diff = None
  sums = (None, None)
  for _ in xrange(nb_iter):
    random.shuffle(data)
    mid = len(data)/2
    sum1 = sum(data[:mid])
    sum2 = sum(data[mid:])
    if diff is None or abs(sum1 - sum2) < diff:
      sums = (sum1, sum2)
  print sums

You can adjust nb_iter if the problem is bigger.

It solves all the problem mentioned above mostly all the times.

odwl
Take a look at my answer for a guaranteed deterministic solution
Unknown
+3  A: 

New Solution

This is a breadth-first search with heuristics culling. The tree is limited to a depth of players/2. The player sum limit is totalscores/2. With a player pool of 100, it took approximately 10 seconds to solve.

def team(t):
    iterations = range(2, len(t)/2+1)

    totalscore = sum(t)
    halftotalscore = totalscore/2.0

    oldmoves = {}

    for p in t:
        people_left = t[:]
        people_left.remove(p)
        oldmoves[p] = people_left

    if iterations == []:
        solution = min(map(lambda i: (abs(float(i)-halftotalscore), i), oldmoves.keys()))
        return (solution[1], sum(oldmoves[solution[1]]), oldmoves[solution[1]])

    for n in iterations:
        newmoves = {}
        for total, roster in oldmoves.iteritems():
            for p in roster:
                people_left = roster[:]
                people_left.remove(p)
                newtotal = total+p
                if newtotal > halftotalscore: continue
                newmoves[newtotal] = people_left
        oldmoves = newmoves

    solution = min(map(lambda i: (abs(float(i)-halftotalscore), i), oldmoves.keys()))
    return (solution[1], sum(oldmoves[solution[1]]), oldmoves[solution[1]])

print team([90,200,100])
print team([2,3,10,5,8,9,7,3,5,2])
print team([1,1,1,1,1,1,1,1,1,9])
print team([87,100,28,67,68,41,67,1])
print team([1, 1, 50, 50, 50, 1000])

#output
#(200, 190, [90, 100])
#(27, 27, [3, 9, 7, 3, 5])
#(5, 13, [1, 1, 1, 1, 9])
#(229, 230, [28, 67, 68, 67])
#(150, 1002, [1, 1, 1000])

Also note that I attempted to solve this using GS's description, but it is impossible to get enough information simply by storing the running totals. And if you stored both the number of items and totals, then it would be the same as this solution except you kept needless data. Because you only need to keep the n-1 and n iterations up to numplayers/2.

I had an old exhaustive one based on binomial coefficients (look in history). It solved the example problems of length 10 just fine, but then I saw that the competition had people of up to length 100.

Unknown
@becomingGuru, I implemented a solution that works correctly, take a look at it.
tom10
@tom10 actually your solution fails for [87,100,28,67,68,41,67,1]. It outputs 223 236. Nice try.
Unknown
Then this would have been a reasonable comment to make in my post, not simply a redirect to your answer.
tom10
@tom10, no it is not. When your friend makes a mistake, do you simply tell him he's wrong? Or do you tell him how to solve it?
Unknown
So with your combinations, is this really an implementation of trying all cases in the knapsack problem?
tom10
Uh, this takes exponential time; the number of combinations can be quite large. With 100 items (the sizes in the problem), there are 2^100 combinations; trying all of them will take... forever
ShreevatsaR
@ShreevastsaR, no you are wrong. You didn't even bother to try running the code.
Unknown
Of course I didn't bother running the code. :) Here, just add the following line and run it: "t = range(100); print maketeams(t)". It takes forever. (100 choose 50) is ≈10^29 [http://google.com/search?q=100+choose+50], and 2^100 is ≈10^30, so "trying all combinations of length people/2" is still way too infeasible. It takes ages even for smaller n, like say, n=30.
ShreevatsaR
@ShreevatsaR, and I don't know where you get the 100 from. The lists are around 10 people, and 10 choose 5 is around 252. You didn't even bother to read the question.
Unknown
From the question: "Each test case starts with a blank line, followed by N, the total number of players. [...] There shall be at most 100 players in all(1<=N<=100)." For some background information on how programming contests work: there are usually some (small) example inputs given, but your program you submit is evaluated on inputs of the size mentioned in the problem statements. (The example inputs are deliberately intended to be simple.) *Some* contests, such as the upcoming IPSC http://ipsc.ksp.sk/ , provide the actual inputs beforehand, but this is not how IOI, ACM-ICPC etc. work.
ShreevatsaR
Would it be easy/possible to modify this code to get rid of the "groups of equal size" requirement to get 2 groups of the closest possible totals? What would I change to do this? Thanks
Dan
+1  A: 

Q. Given a multiset S of integers, is there a way to partition S into two subsets S1 and S2 such that the sum of the numbers in S1 equals the sum of the numbers in S2?

A.Set Partition Problem.

Best of luck approximating. : )

kunjaan
A: 

In an earlier comment I hypothesized that the problem as set was tractable because they had carefully chosen the test data to be compatible with various algorithms within the time alloted. This turned out not to be the case - instead it is the problem constraints - numbers no higher than 450 and a final set no larger than 50 numbers is the key. These are compatible with solving the problem using the dynamic programming solution I put up in a later post. None of the other algorithms (heuristics, or exhaustive enumeration by a combinatorial pattern generator) can possibly work because there will be test cases large enough or hard enough to break those algorithms. It's rather annoying to be honest because those other solutions are more challenging and certainly more fun. Note that without a lot of extra work, the dynamic programming solution just says whether a solution is possible with N/2 for any given sum, but it doesn't tell you the contents of either partition.

Graham Toal
+1  A: 

They are obviously looking for a dynamic programming knapsack solution. So after my first effort (a pretty good original heuristic I thought), and my second effort (a really sneaky exact combinatorial solution that worked for shortish data sets, and even for sets up to 100 elements as long as the number of unique values was low), I finally succumbed to peer pressure and wrote the one they wanted (not too hard - handling duplicated entries was the trickiest part - the underlying algorithm I based it on only works if all the inputs are unique - I'm sure glad that long long is big enough to hold 50 bits!).

So for all the test data and awkward edge cases I put together while testing my first two efforts, it gives the same answer. At least for the ones I checked with the combinatorial solver, I know they're correct. But I'm still failing the submission with some wrong answer!

I'm not asking for anyone to fix my code here, but I would be very greatful if anyone can find a case for which the code below generates the wrong answer.

Thanks,

Graham

PS This code does always execute within the time limit but it is far from optimised. i'm keeping it simple until it passes the test, then I have some ideas to speed it up, maybe by a factor of 10 or more.

#include <stdio.h>

#define TRUE (0==0)
#define FALSE (0!=0)

static int debug = TRUE;

//int simple(const void *a, const void *b) {
//  return *(int *)a - *(int *)b;
//}

int main(int argc, char **argv) {
  int p[101];
  char *s, line[128];
  long long mask, c0[45001], c1[45001];
  int skill, players, target, i, j, tests, total = 0;

  debug = (argc == 2 && argv[1][0] == '-' && argv[1][1] == 'd' && argv[1][2] == '\0');

  s = fgets(line, 127, stdin);
  tests = atoi(s);
  while (tests --> 0) {

    for (i = 0; i < 45001; i++) {c0[i] = 0LL;}

    s = fgets(line, 127, stdin); /* blank line */
    s = fgets(line, 127, stdin); /* no of players */
    players = atoi(s);
    for (i = 0; i < players; i++) {s = fgets(line, 127, stdin); p[i] = atoi(s);}

    if (players == 1) {
      printf("0 %d\n", p[0]);
    } else {

    if (players&1) p[players++] = 0; // odd player fixed by adding a single player of 0 strength
    //qsort(p, players, sizeof(int), simple);

    total = 0; for ( i = 0; i < players; i++) total += p[i];
    target = total/2; // ok if total was odd and result rounded down - teams of n, n+1
    mask = 1LL << (((long long)players/2LL)-1LL);

    for (i = 0; i < players; i++) {
      for (j = 0; j <= target; j++) {c1[j] = 0LL;} // memset would be faster
      skill = p[i];
      //add this player to every other player and every partial subset
      for (j = 0; j <= target-skill; j++) {
        if (c0[j]) c1[j+skill] = c0[j]<<1;  // highest = highest j+skill for later optimising
      }
      c0[skill] |= 1; // so we don't add a skill number to itself unless it occurs more than once
      for (j = 0; j <= target; j++) {c0[j] |= c1[j];}
      if (c0[target]&mask) break; // early return for perfect fit!
    }

    for (i = target; i > 0; i--) {
      if (debug || (c0[i] & mask)) {
        fprintf(stdout, "%d %d\n", i, total-i);
        if (debug) {
          if (c0[i] & mask) printf("******** ["); else
          printf("         [");
          for (j = 0; j <= players; j++) if (c0[i] & (1LL<<(long long)j)) printf(" %d", j+1);
          printf(" ]\n");
        } else break;
      }
    }
    }
    if (tests) printf("\n");
  }
  return 0;
}
Graham Toal