views:

270

answers:

1

Hi all,

1st of all: I'm not a programmer, never learnt programming/algorithms. Actually I have to program, mostly in awk, or ruby, some bash.

In today's task, I have a huge data set (float numbers) in a plain text file, one record/line, and a sum of all numbers of the set, but the sum is wrong, because some of the numbers (can be only one) in the set is negative, but we can't see it in the file (there's no sign if an element is negative).

But I have to find it/them: so first I had calculated the correct total sum (with adding all the numbers with awk) didn't care about their signs. Now I now the difference between the original sum (which cared about signs) and my new total sum. But I have to find all the subsets of the dataset, which has the exact same sum like the difference/2.

E.g.:

DATA:
1,2,3,4,5

ORIG SUM: 
5

Now we can calculate the difference between 1+2+3+4+5 - ORIG SUM: 15-5=10. 10/2 = 5, so I need to find all the subsets which can add up to 5, that is [1,4],[2,3],[5].

Is there a proper way to do it? I prefer awk, ruby, shell scripting, but both python and perl is acceptable (without heavy uses of external libraries, as I've got no right to install them).

Thanks in advance.

+2  A: 

You mean the SUBSET SUM problem as known in computer science?

Hint: Look in the related questions, there are MANY questions/answers about that problem.

Johannes Weiß
Looks like I need that.
Zsolt Botykai