tags:

views:

180

answers:

5

Hello,

I have a list of numbers which total 540000. I would like to sort this list into 3 lists which each total 180000. What is the most efficient programming method to do this, assuming that the list of numbers is a flat file with a number per line?

A: 
for i as integer = 1 to 180000
put data in array 1
next i

for i as integer = 180001 to 360000
put data in array 2
next i

for i as integer = 360001 to 540000
put data in array 3
next i
zeocrash
LOL Ask a generic question, get a generic answer
iWerner
what can i say, it was a very generic question. I figured i wasn't going to get much more specific in a post tagged "programming"
zeocrash
The list of numbers would be something like this: 20,000 63715 1571 8429 etc etc. If you add them all up you get 540,000 but I need to add in such a way that I get three totals of 180,000
Andy
I apologise i misread your question.
zeocrash
+1  A: 

Sounds like a variation of the Knapsack problem . It would be useful to know the size of these numbers, and count - are there huge variations in size, or are they all similar in scale - are there lots of them (>1000) or just a few (<100)?

One quick and dirty method would be to sort them into size order - largest to smallest - then loop over them, putting the first in the first list, the second into the second list, the third into the third list, and then go back and put the fourth into the first list... and so on. May work quite well for lots of small-ish numbers... but there are other approaches for different types fo dataset.

Dycey
i'm not sure each list needs to be of equal "value" isn't the guy just looking for a simple 3 way split
zeocrash
no - I think the key word is 'total' - he wants to group the numbers so that they total 18000 each. At least, that's how I read it.
Dycey
What I tried so far was to generate three lists, each which came to nearly 180,000 and then look through the remaining numbers to see if I could then put them into one of the lists and get to 180,000 but this didn't work out.
Andy
The numbers vary greatly in value, and there could be around 100 of them.
Andy
:-) Accountants do this all the time. There are tricks to it - you as zeocrash/ian-witz say, it may not even be possible! But for 100 numbers, try starting with the pick alternates method I'd outlined above, and then see how close you get. You can then look for small numbers in the list that are double the difference between lists, and use some sort of relaxation approach (i.e. swap smaller numbers each time between two lists) until you get close. But it may not be possible. Depending on what you're doing, sometimes faking it is easier - i.e. get close then add padding :-)
Dycey
Ok thanks I'll try this. At the end of the day I suspect printing the list out and just looking at it may identify combinations that would work. I.e. the amazing human brain at it's mysterious best!
Andy
Well, since my algorithm is already implemented and tested, I'm happy enough to run it for you. If it returns a result by tomorrow, I can ship that back to you. Just post your list here!
Carl Smotricz
A: 

this has the smell of np-hardness to me - in which case there is no 'efficient' way to do it - although you could probably come up with any number of heuristics that could tackle it pretty well.

having said that you'll still have problems with lists like [179998, 180001, 180001] :)

ian-witz
yes indeed it certainly does...
Dycey
And depending the numbers in the list there may not be perfect solution. At least all the numbers needs to be less or equal than 180000
Vertigo
A: 

I've written some Java code to do most of the work for you.

The smaller of the methods takes a list of numbers and a total to be achieved, and it returns a set of lists of numbers that add up to that total. You could run it with 18000 and your list of numbers.

For each list of numbers returned, you need to make a new list that is missing the numbers already used, and run the method on 18000 and that again.

If this second invocation returns one or more lists, you'll know the problem is soluble because the numbers remaining will also add up to 18000.

Anyway, here's the code. Yes, it's just recursive brute force. It's very likely there's no proven method to consistently do better by any other method. Don't blame me if it runs for a long time; you may want to try it with smaller examples first.

import java.util.*; 

public class Listen {

   private static Set<List<Integer>> makeFrom(int total, List<Integer> numbers) {
      Set<List<Integer>> results = new HashSet<List<Integer>>();
      List<Integer> soFar = new ArrayList<Integer>();
      makeFrom(results, total, soFar, numbers, 0);
      return results;
   }

   private static void makeFrom(Set<List<Integer>> results, int total, List<Integer> soFar, List<Integer> numbers, int startingAt) {
      if (startingAt >= numbers.size()) return;
      for (int p=startingAt; p<numbers.size(); p++) {
         Integer number = numbers.get(p);
         List<Integer> newSoFar = new ArrayList<Integer>(soFar);
         newSoFar.add(number);
         int newTotal = total - number;
         if (newTotal < 0) continue;
         if (newTotal == 0) {
            Collections.sort(newSoFar);
            results.add(newSoFar);
         } else {
            List<Integer> newNumbers = new ArrayList<Integer>(numbers);
            newNumbers.remove(number);
            makeFrom(results, newTotal, newSoFar, newNumbers, startingAt + 1);
         }
      }
   }

   public static void main(String[] args) {
      List<Integer> numbers = new ArrayList<Integer>();
      for (int j=1; j<11; j++) numbers.add(j);
      for (List<Integer> result : makeFrom(25, numbers)) {
         System.out.println(Arrays.deepToString(result.toArray(new Integer[result.size()])));
      }
   }
}
Carl Smotricz
that looks fantastic! i hope this is not asking too much but would it be a trouble for you to write this in psuedocode? I am hoping to rewrite this in Progress 4GL.
Andy
I will if you insist; but the amount of code there isn't large, and you should be able to puzzle it out on your own. I work with 2 kinds of data structures: A List of Integers, and a Set of such lists. You can treat List and ArrayList as synonymous, likewise Set and HashSet. "new" creates a new one, new with an argument copies the argument into the new; "add" adds something, "remove" removes the first (or only) occurrence of something. Collections.sort() sorts the list so it's in a "standard" order for set comparison.
Carl Smotricz
OK, an explanation of the algorithm is in my second answer. Hope you can make sense of that. And remember that there's probably no really "efficient" solution available.
Carl Smotricz
A: 

As ian-witz already remarked, this is probably a problem of the NP-complete sort: This means there's no really good solution for the general case, short of trying all possibilities. Algorithms that do this tend to become spectacularly slow as the amount of data they deal with increases.

That said, here's my algorithm for forming sub-lists having a specified sum from a given list of integers:

Set up a place to hold your results. The results will all be lists of numbers, each some sub-set of your original list. We don't know how many such lists will result; possibly none.

Put your list of numbers into an array so you can refer to them and access them by index. In the following, I'm assuming array indices starting at 1. Say you have 10 numbers, so you put them into a 10 element array, indexed by positions 1 through 10.

For performance reasons, it may help to sort your array in descending order. It's not necessary though.

Run a first index, call it i, through this array, i.e. through index values 1 through 10. 
For each index value:
  select the number at index position i, call it n1.
  set up a new list of numbers, where we will be assembling a sub-list. call it sublist.
  add n1 to the (so far empty) sublist.
  If i is already at 10, there's nothing more we can do. Otherwise,
  Run a second index, call it j, through the arrray, starting at i+1 and going up to 10.
  For each value of j:
    select the number at index position j, call it n2.
    add n2 to the sublist containing n1
    calculate the sum of our sublist so far: Does it add up to 18000? 
    If the exact total is reached, add the current sublist to our result list.
    If the total is exceeded, there's nothing we can add to make it better, so skip to the next value of j.
    If the total is less than 18000, you need to pick a third number.
    Run a third index, call it k, through the array, starting at j+1 and going up to 10. Skip this if j is already at 10 and there's no place to go.
    For each value of k:
      select the number at k, call it n3
      add n3 to the sublist
      check the sublist total against the expected total
      if the exact total is reached, store the sublist as a result; 
      if it's less than the expected, start a 4th loop, and so on.

      When you're done with checking a value for a loop, e.g. n4, you need to take your latest n4, n3 or whatever back out of the sublist because you'll be trying a different number next.

    Whenever you find a combination of numbers with the correct sum, store it in your results set.

When you've run all your loop counters into the wall (i.e. i is 10 and there's nowhere left to go), your "results" set will contain all sub-lists of the original list that added up to the desired total. It's possible there will be none, in that case there's no (exact) solution to your problem.

If you have 3 or more sub-lists in your results set, you can check if you can find a pair of them that use non-overlapping sets of numbers from the original list. If you have 2, then there should also be a 3rd sub-list containing exactly all the numbers not contained in the first 2 lists, and you have your solution.

My sample code doesn't do a series of loops; instead, it does one loop going from 1 to (say) 10 and looking for 18000. Then, let's say the first number chosen was 2000, the function calls itself again recursively with a hint to start at 2 (= i + 1) and to try to assemble a total of 16000. That call of the function then calls itself again with a starting position of (whatever + 1) and a total of (16000 - whatever), and it keeps calling itself that way with subsets of the original problem until there's no more room for the indexes to go up. If it finds a "good" sub-list on the way, it stores it in the result set.

How to implement this efficiently depends on the language you're doing it in. FORTRAN 77 doesn't have recursion, Lua doesn't implement lists or sets efficiently, Lisp may have trouble efficiently indexing into a list. In Java, I might use a bitset rather than a sublist. I know nothing about P4GL, so: For implementation, you're on your own!

Carl Smotricz