views:

302

answers:

4

I have a list of numbers (example: [-1, 1, -4, 5]) and I have to remove numbers from the list without changing the total sum of the list. I want to remove the numbers with biggest absolute value possible, without changing the total, in the example removing [-1, -4, 5] will leave [1] so the sum doesn't change.

I wrote the naive approach, which is finding all possible combinations that don't change the total and see which one removes the biggest absolute value. But that is be really slow since the actual list will be a lot bigger than that.

Here's my combinations code:

from itertools import chain, combinations

def remove(items):
    all_comb = chain.from_iterable(combinations(items, n+1) 
                                   for n in xrange(len(items)))
    biggest = None
    biggest_sum = 0
    for comb in all_comb:
        if sum(comb) != 0:
            continue # this comb would change total, skip
        abs_sum = sum(abs(item) for item in comb)
        if abs_sum > biggest_sum:
            biggest = comb
            biggest_sum = abs_sum
    return biggest

print remove([-1, 1, -4, 5])

It corectly prints (-1, -4, 5). However I am looking for some clever, more efficient solution than looping over all possible item combinations.

Any ideas?

+10  A: 

if you redefine the problem as finding a subset whose sum equals the value of the complete set, you will realize that this is a NP-Hard problem, (subset sum)

so there is no polynomial complexity solution for this problem .

Alon
Thank you for your answer, and the good link. Wikipedia seems to imply that there is a *Pseudo-polynomial time dynamic programming solution*, which means I'd store part of the solution to help with future calculation, but by reading it I couldn't make sense (it is in english form and english is not my natural language). Can you help me understand it so I can write an algorithm using this metod and test it against mine? Seems like it will be faster.
nosklo
I think I got it!! Look at my answer.
nosklo
A: 

I do not program in Python so my apologies for not offering code. But I think I can help with the algorithm:

  1. Find the sum
  2. Add numbers with the lowest value until you get to the same sum
  3. Everything else can be deleted

I hope this helps

Square Rig Master
Thanks. Can you give me an example on how to do that? I mean, if I run it with `[6, 44, 1, -7, -6, 19]`, I would expect it to remove `(6, 1, -7)` leaving `[-6, 19, 44]`, would that happen?
nosklo
A: 

Your requirements don't say if the function is allowed to change the list order or not. Here's a possibility:

def remove(items):
    items.sort()
    running = original = sum(items)
    try:
        items.index(original) # we just want the exception
        return [original]
    except ValueError:
        pass
    if abs(items[0]) > items[-1]:
        running -= items.pop(0)
    else:
        running -= items.pop()
    while running != original:
        try:
            running -= items.pop(items.index(original - running))
        except ValueError:
            if running > original:
                running -= items.pop()
            elif running < original:
                running -= items.pop(0)
    return items

This sorts the list (big items will be at the end, smaller ones will be at the beginning) and calculates the sum, and removes an item from the list. It then continues removing items until the new total equals the original total. An alternative version that preserves order can be written as a wrapper:

from copy import copy

def remove_preserve_order(items):
    a = remove(copy(items))
    return [x for x in items if x in a]

Though you should probably rewrite this with collections.deque if you really want to preserve order. If you can guarantee uniqueness in your list, you can get a big win by using a set instead.

We could probably write a better version that traverses the list to find the two numbers closest to the running total each time and remove the closer of the two, but then we'd probably end up with O(N^2) performance. I believe this code's performance will be O(N*log(N)) as it just has to sort the list (I hope Python's list sorting isn't O(N^2)) and then get the sum.

Chris Lutz
Interesting code. Order doesn't matter to me. But I have duplicate items that count to the sum, so I don't think I can use sets. Your code works with my original numbers ([1] is returned) and is very fast. but when I tried it with `[6, 44, 1, -7, -6, 19]` (I would expect it to remove `(6, 1, -7)` returning `[-6, 19, 44]`, keeping the same sum `57`) it fails with `IndexError: pop from empty list` on the last `running -= items.pop(0)`. Do you know any way to solve this? Thanks for your help.
nosklo
It does that because my version tries one order and one order only. You could make a recursive version, but you'd have to split the function into two functions (the part that does setup work, and the part that loops and recurses). I can whip up something really quickly if you like, but you may lose some efficiency. But let's code and not guess at efficiency before we've started, shall we?
Chris Lutz
+3  A: 
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright © 2009 Clóvis Fabrício Costa
# Licensed under GPL version 3.0 or higher

def posneg_calcsums(subset):
    sums = {}
    for group in chain.from_iterable(combinations(subset, n+1) 
                                     for n in xrange(len(subset))):
        sums[sum(group)] = group
    return sums

def posneg(items):
    positive = posneg_calcsums([item for item in items if item > 0])
    negative = posneg_calcsums([item for item in items if item < 0])
    for n in sorted(positive, reverse=True):
        if -n in negative:
            return positive[n] + negative[-n]
    else:
        return None

print posneg([-1, 1, -4, 5])
print posneg([6, 44, 1, -7, -6, 19])

It works fine, and is a lot faster than my first approach. Thanks to Alon for the wikipedia link and ivazquez|laptop on #python irc channel for a good hint that led me into the solution.

I think it can be further optimized - I want a way to stop calculating the expensive part once the solution was found. I will keep trying.

nosklo
very nice implementation! gland it you've got it worked out ;-)
Alon
@Alon: I think I can get further optimizations - any ideas?
nosklo
Is it correct that your solution assumes that `sum(items) == 0`?
J.F. Sebastian
@J.F. Sebastian : no. If `sum(items) == 0` that would mean that I can remove everything... So it assumes that `sum(items) != 0`
nosklo
@nosklo: I thought so. Why then `sum(positive[n]+negative[-n]) == 0` (where `sum(positive[n]) == n and sum(negative[-n]) == -n` by definition).
J.F. Sebastian
@nosklo: ok, I've realised that your function should be called `biggest_zero_sum_items()` or something.
J.F. Sebastian