tags:

views:

310

answers:

4

Programming challenge: Given a set of integers [1, 2, 3, 4, 5] I would like to generate all possible k-combinations in ascending size order in Java; e.g.

[1], [2], [3], [4], [5], [1, 2], [1, 3] ... [1, 2, 3, 4, 5]

It is fairly easy to produce a recursive solution that generates all combinations and then sort them afterwards but I imagine there's a more efficient way that removes the need for the additional sort.

A: 

Generating combinations takes MUCH longer than sorting, and it doesn't take that long to sort 100,000 numbers given n*log(n) sorting time. You're pre-optimizing. This is bad.

Stefan Kendall
100,000 was a ballpark rounded number. If numbers could repeat, you'd be bounded by 5**5 ~= 3125, which is a trivial amount of items.
Stefan Kendall
Agreed that premature optimization is a waste of time but this is not so much a real-world problem as a challenge to produce a more efficient algorithm.
Adamski
@Stefan Kendall: Disagree. Sorting n items may take n*log(n) time. My answer above generates them (in the wrong order) in exactly that time, rather than "MUCH longer". Keep in mind that generating combinations of n items might take 2^n time, but you have to decide whether n is the number of items in the original list, or the total number of lists in the result.
John
+4  A: 

Just do iterative deepening search.

void comb(int... items) {
    Arrays.sort(items);
    for (int k = 1; k <= items.length; k++) {
        kcomb(items, 0, k, new int[k]);
    }
}
public void kcomb(int[] items, int n, int k, int[] arr) {
    if (k == 0) {
        System.out.println(Arrays.toString(arr));
    } else {
        for (int i = n; i <= items.length - k; i++) {
            arr[arr.length - k] = items[i];
            kcomb(items, i + 1, k - 1, arr);
        }
    }
}

Then call, say, comb(10,20,30). It will print:

[10]
[20]
[30]
[10, 20]
[10, 30]
[20, 30]
[10, 20, 30]
polygenelubricants
Cool - I'd never heard of that search technique before.
Adamski
It's the bastard child of BFS and DFS.
polygenelubricants
This was a greating starting point and I realised I could improve on it by remembering previous states during the generation (see my answer).
Adamski
+2  A: 

There are two ways of interpreting the "ascending" requirement. The looser interpretation is "in every list, the integers should appear in ascending order". The stricter interpretation is "the lists need to be given in order". I guess that's the one you want, but I came up with a simple iterative way to satisfy the looser requirement.

For n items, count through all n-bit numbers. If the bit corresponding to an item is there, then it's in the result list.

public static void displaySubsets(List<Integer> sortedInts) {
    int n=sortedInts.size();
    long combinations = 1 << n;
    for (int setNumber=0; setNumber<combinations; setNumber++) {
      List<Integer> aResult = new ArrayList<Integer>();
      for (int digit=0; digit<n; digit++) {
        if ((setNumber & (1<<digit)) > 0) {
          aResult.add(sortedInts.get(digit));
        }
      }
      System.out.println(aResult.toString()+", ");
    }
  }

Result for 1,2,3,4,5 is: [], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3], [4], [1, 4], [2, 4], [1, 2, 4], [3, 4], [1, 3, 4], [2, 3, 4], [1, 2, 3, 4], [5], [1, 5], [2, 5], [1, 2, 5], [3, 5], [1, 3, 5], [2, 3, 5], [1, 2, 3, 5], [4, 5], [1, 4, 5], [2, 4, 5], [1, 2, 4, 5], [3, 4, 5], [1, 3, 4, 5], [2, 3, 4, 5], [1, 2, 3, 4, 5]

Yes, I know, I lose points for not using recursion.

John
Nice try but I was actually looking for the output lists in ascending size order.
Adamski
Thought so. Thanks.
John
A: 

I was thinking about this some more and realised it can be efficiently done using a dynamic programming approach. Below is the iterative solution I've produced, which uses a Queue to hold state (although one could use a Stack instead).

I believe this is far more efficient than a recursive iterative deepening search as it will not involve revisiting existing states during the generation; it remembers the previous states using the queue and uses these to generate successive states.

Performance Comparison

Algorithm                    | 5 elems | 10 elems | 20 elems
--------------------------------------------------------------------------
Recursive (#recursions)      | 62      | 2046     | 2097150
Dynamic   (#loop iterations) | 32      | 1024     | 1048576

Code

public class Test {
    private static class Pair {
        private final List<Integer> result;
        private final int index;

        private Pair(List<Integer> result, int index) {
            this.result = result;
            this.index = index;
        }

        public List<Integer> getResult() {
            return result;
        }

        public int getIndex() {
            return index;
        }
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3, 4, 5);
        foo(items);
    }

    private static void foo(List<Integer> items) {
        Queue<Pair> queue = new LinkedList<Pair>();
        queue.add(new Pair(Collections.<Integer>emptyList(), 0));

        while (!queue.isEmpty()) {
            Pair pair = queue.poll();

            System.err.println(pair.getResult());

            if (pair.getResult().size() < items.size()) {
                for (int i=pair.getIndex(); i<items.size(); ++i) {
                    List<Integer> copy = new LinkedList<Integer>(pair.getResult());
                    copy.add(items.get(i));
                    queue.add(new Pair(copy, i + 1));
                }
            }
        }
    }
}
Adamski
This is BFS, therefore uses exponential space (i.e. very inefficient). DFS uses linear space (efficient), but doesn't print in ascending size order. That's why IDDFS is created: it has features of both, in that it uses space efficiently, but still visits in "level-order" (i.e. prints in ascending size order in your scenario). Also, read the wikipedia article about the complexity of IDDFS. It's only twice as long as BFS (which is what your numbers also suggest), for a much more efficient use of space. It's the classic time/space tradeoff between BFS and DFS, with IDDFS in the middle.
polygenelubricants