views:

883

answers:

5

Hi,

I'm looking for a way to generate combinations of objects ordered by a single attribute. I don't think lexicographical order is what I'm looking for... I'll try to give an example. Let's say I have a list of objects A,B,C,D with the attribute values I want to order by being 3,3,2,1. This gives A3, B3, C2, D1 objects. Now I want to generate combinations of 2 objects, but they need to be ordered in a descending way:

  • A3 B3
  • A3 C2
  • B3 C2
  • A3 D1
  • B3 D1
  • C2 D1

Generating all combinations and sorting them is not acceptable because the real world scenario involves large sets and millions of combinations. (set of 40, order of 8), and I need only combinations above the certain threshold.

Actually I need count of combinations above a threshold grouped by a sum of a given attribute, but I think it is far more difficult to do - so I'd settle for developing all combinations above a threshold and counting them. If that's possible at all.

EDIT - My original question wasn't very precise... I don't actually need these combinations ordered, just thought it would help to isolate combinations above a threshold. To be more precise, in the above example, giving a threshold of 5, I'm looking for an information that the given set produces 1 combination with a sum of 6 ( A3 B3 ) and 2 with a sum of 5 ( A3 C2, B3 C2). I don't actually need the combinations themselves.

I was looking into subset-sum problem, but if I understood correctly given dynamic solution it will only give you information is there a given sum or no, not count of the sums.

Thanks

A: 

Check out this question in stackoverflow: Algorithm to return all combinations

I also just used a the java code below to generate all permutations, but it could easily be used to generate unique combination's given an index.

public static <E> E[] permutation(E[] s, int num) {//s is the input elements array and num is the number which represents the permutation

 int factorial = 1;

 for(int i = 2; i < s.length; i++)
  factorial *= i;//calculates the factorial of (s.length - 1)

 if (num/s.length >= factorial)// Optional. if the number is not in the range of [0, s.length! - 1] 
  return null;

 for(int i = 0; i < s.length - 1; i++){//go over the array

  int tempi = (num / factorial) % (s.length - i);//calculates the next cell from the cells left (the cells in the range [i, s.length - 1])
  E temp = s[i + tempi];//Temporarily saves the value of the cell needed to add to the permutation this time 

  for(int j = i + tempi; j > i; j--)//shift all elements to "cover" the "missing" cell
   s[j] = s[j-1];

  s[i] = temp;//put the chosen cell in the correct spot

  factorial /= (s.length - (i + 1));//updates the factorial

 }

 return s;
}
simon
Hi, Thanks for the comment. Unfortunately it is not acceptable to find ALL combinations since it would take a lot of time and this is time critical operation...
Željko Tanović
Sorry, I did not communicate this code very well. This code will generate one unique combination per num (index) the num has to an integer value within the set of all combination's. so if you call this function with x unique num's you will get x unique permutations.
simon
Hi,Yes - that's clear :) However since I need to isolate all combinations above a threshold ( my original question wasn't very precise - see edit ) I would have to call the function multiple times to get all combinations and then isolate those I need. I was looking for a way to avoid that.
Željko Tanović
A: 

I am extremely sorry (after all those clarifications in the comments) to say that I could not find an efficient solution to this problem. I tried for the past hour with no results.

The reason (I think) is that this problem is very similar to problems like the traveling salesman problem. Until unless you try all the combinations, there is no way to know which attributes will add upto the threshold.

There seems to be no clever trick that can solve this class of problems.

Still there are many optimizations that you can do to the actual code.

Try sorting the data according to the attributes. You may be able to avoid processing some values from the list when you find that a higher value cannot satisfy the threshold (so all lower values can be eliminated).

Niyaz
Thanks for trying :) I'm actually doing something similar to your last suggestion but hoped I'd find a better solutions...
Željko Tanović
I'm new here - don't have enough points to vote anything up...
Željko Tanović
Ha ha. Voting is nothing compared to the value we generate collectively.
Niyaz
+2  A: 

Actually, I think you do want lexicographic order, but descending rather than ascending. In addition:

  • It's not clear to me from your description that A, B, ... D play any role in your answer (except possibly as the container for the values).
  • I think your question example is simply "For each integer at least 5, up to the maximum possible total of two values, how many distinct pairs from the set {3, 3, 2, 1} have sums of that integer?"
  • The interesting part is the early bailout, once no possible solution can be reached (remaining achievable sums are too small).

I'll post sample code later.

Here's the sample code I promised, with a few remarks following:

public class Combos {

    /* permanent state for instance */
    private int values[];
    private int length;

    /* transient state during single "count" computation */
    private int n;
    private int limit;
    private Tally<Integer> tally;
    private int best[][];  // used for early-bail-out

    private void initializeForCount(int n, int limit) {
        this.n = n;
        this.limit = limit;
        best = new int[n+1][length+1];
        for (int i = 1; i <= n; ++i) {
            for (int j = 0; j <= length - i; ++j) {
                best[i][j] = values[j] + best[i-1][j+1];
            }
        }
    }

    private void countAt(int left, int start, int sum) {
        if (left == 0) {
            tally.inc(sum);
        } else {
            for (
                int i = start;
                i <= length - left
                && limit <= sum + best[left][i];  // bail-out-check
                ++i
            ) {
                countAt(left - 1, i + 1, sum + values[i]);
            }
        }
    }

    public Tally<Integer> count(int n, int limit) {
        tally = new Tally<Integer>();
        if (n <= length) {
            initializeForCount(n, limit);
            countAt(n, 0, 0);
        }
        return tally;
    }

    public Combos(int[] values) {
        this.values = values;
        this.length = values.length;
    }

}

Preface remarks:

This uses a little helper class called Tally, that just isolates the tabulation (including initialization for never-before-seen keys). I'll put it at the end.

To keep this concise, I've taken some shortcuts that aren't good practice for "real" code:

  • This doesn't check for a null value array, etc.
  • I assume that the value array is already sorted into descending order, required for the early-bail-out technique. (Good production code would include the sorting.)
  • I put transient data into instance variables instead of passing them as arguments among the private methods that support count. That makes this class non-thread-safe.

Explanation:

An instance of Combos is created with the (descending ordered) array of integers to combine. The value array is set up once per instance, but multiple calls to count can be made with varying population sizes and limits.

The count method triggers a (mostly) standard recursive traversal of unique combinations of n integers from values. The limit argument gives the lower bound on sums of interest.

The countAt method examines combinations of integers from values. The left argument is how many integers remain to make up n integers in a sum, start is the position in values from which to search, and sum is the partial sum.

The early-bail-out mechanism is based on computing best, a two-dimensional array that specifies the "best" sum reachable from a given state. The value in best[n][p] is the largest sum of n values beginning in position p of the original values.

The recursion of countAt bottoms out when the correct population has been accumulated; this adds the current sum (of n values) to the tally. If countAt has not bottomed out, it sweeps the values from the start-ing position to increase the current partial sum, as long as:

  • enough positions remain in values to achieve the specified population, and
  • the best (largest) subtotal remaining is big enough to make the limit.

A sample run with your question's data:

    int[] values = {3, 3, 2, 1};
    Combos mine = new Combos(values);
    Tally<Integer> tally = mine.count(2, 5);
    for (int i = 5; i < 9; ++i) {
        int n = tally.get(i);
        if (0 < n) {
            System.out.println("found " + tally.get(i) + " sums of " + i);
        }
    }

produces the results you specified:

found 2 sums of 5
found 1 sums of 6

Here's the Tally code:

public static class Tally<T> {
    private Map<T,Integer> tally = new HashMap<T,Integer>();
    public Tally() {/* nothing */}
    public void inc(T key) {
        Integer value = tally.get(key);
        if (value == null) {
            value = Integer.valueOf(0);
        }
        tally.put(key, (value + 1));
    }
    public int get(T key) {
        Integer result = tally.get(key);
        return result == null ? 0 : result;
    }
    public Collection<T> keys() {
        return tally.keySet();
    }
}
joel.neely
Hi,A,B...D serve only to clarify that I can have more than one object with the same value (another thing why I think that dynamic solution to subset sum won't work in this case).
Željko Tanović
Also - you are almost correct in redefining my problem :) With one small correction - For each integer in a set (5,6), how many distinct pairs from the set {3, 3, 2, 1} have sums of each integer.
Željko Tanović
Wow, thank you ! For both the code and the comprehensive explanation ! It took a significant amount of effort to write this :-) I'll try it out now :-) As I said I'm new here and have only 11 points (15 required for voting) but I believe your answer is probably the best one can come up with.
Željko Tanović
A: 

Here's a recursive approach to count the number of these subsets: We define a function count(minIndex,numElements,minSum) that returns the number of subsets of size numElements whose sum is at least minSum, containing elements with indices minIndex or greater.

As in the problem statement, we sort our elements in descending order, e.g. [3,3,2,1], and call the first index zero, and the total number of elements N. We assume all elements are nonnegative. To find all 2-subsets whose sum is at least 5, we call count(0,2,5).

Sample Code (Java):

int count(int minIndex, int numElements, int minSum)
{
    int total = 0;

    if (numElements == 1)
    {
        // just count number of elements >= minSum
        for (int i = minIndex; i <= N-1; i++)
            if (a[i] >= minSum) total++; else break;
    }
    else
    {
        if (minSum <= 0)
        {
            // any subset will do (n-choose-k of them)
            if (numElements <= (N-minIndex))
                total = nchoosek(N-minIndex, numElements);
        }
        else
        {
            // add element a[i] to the set, and then consider the count
            // for all elements to its right
            for (int i = minIndex; i <= (N-numElements); i++)
                total += count(i+1, numElements-1, minSum-a[i]);
        }
    }

    return total;
}

Btw, I've run the above with an array of 40 elements, and size-8 subsets and consistently got back results in less than a second.

Zach Scrivena
Hi Zach,Yes - but I want to find these subsets without enumerating them all
Željko Tanović
Using recursion for a combinatorial algorithm can be harmful !!!! :)
Niyaz
@zljk: That's right. This approach only counts them without enumerating the actual subsets. This allows us to "shortcircuit" many of the recursive steps.
Zach Scrivena
@Niyaz: I agree in general, but for this problem size, recursion seems to do fine.
Zach Scrivena
A: 

If you're using C# there is a fairly good generics library here. Note though that the generation of some permutations is not in lexicographic order

Jeffrey Cameron