views:

348

answers:

4

I am trying to figure out an efficient algorithm to take a list of items and generate all unique subsets that result from splitting the list into exactly 2 sublists. I'm sure there is a general purpose way to do this, but I'm interested in a specific case. My list will be sorted, and there can be duplicate items.

Some examples:

Input
{1,2,3}

Output
{{1},{2,3}}
{{2},{1,3}}
{{3},{1,2}}

Input
{1,2,3,4}

Output
{{1},{2,3,4}}
{{2},{1,3,4}}
{{3},{1,2,4}}
{{4},{1,2,3}}
{{1,2},{3,4}}
{{1,3},{2,4}}
{{1,4},{2,3}}

Input
{1,2,2,3}

Output
{{1},{2,2,3}}
{{2},{1,2,3}}
{{3},{1,2,2}}
{{1,2},{2,3}}
{{1,3},{2,2}}

I can do this on paper, but I'm struggling to figure out a simple way to do it programmatically. I'm only looking for a quick pseudocode description of how to do this, not any specific code examples.

Any help is appreciated. Thanks.

+2  A: 
John Kugelman
Interesting idea. Basically reducing the complement operation to an XOR. I think this approach would still get messy sorting out unique sets when duplicates existed in the list. Maybe I'm just misinterpreting your answer.
Jason Z
the approach is very inefficient when we have the input list of equal numbers. e.g. {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}. It will generate 2 ^ (number of '1'). Of course duplicates will be filtered in the line <code>subsets.add((l, r))</code> (sorry if I missed something, I am not familiar with python). But in any case time complexity will be much higher than possible.
sergdev
+1  A: 

A bit of Erlang code, the problem is that it generates duplicates when you have duplicate elements, so the result list still needs to be filtered...

do([E,F]) -> [{[E], [F]}];
do([H|T]) -> lists:flatten([{[H], T}] ++
                           [[{[H|L1],L2},{L1, [H|L2]}]  || {L1,L2} <- all(T)]).

filtered(L) ->
  lists:usort([case length(L1) < length(L2) of true -> {L1,L2};
                                               false -> {L2,L1} end
              || {L1,L2} <- do(L)]).

in pseudocode this means that:

  • for a two long list {E,F} the result is {{E},{F}}
  • for longer lists take the first element H and the rest of the list T and return
    • {{H},{T}} (the first element as a single element list, and the remaining list)
    • also run the algorithm recursively for T, and for each {L1,L2} element in the resulting list return {{H,L1},{L2}} and {{L1},{H,L2}}
Zed
+1  A: 

The following C++ function does exactly what you need, but the order differs from the one in examples:

// input contains all input number with duplicates allowed
void generate(std::vector<int> input) {
  typedef std::map<int,int> Map;
  std::map<int,int> mp;
  for (size_t i = 0; i < input.size(); ++i) {
    mp[input[i]]++;
  }

  std::vector<int> numbers;
  std::vector<int> mult;
  for (Map::iterator it = mp.begin(); it != mp.end(); ++it) {
    numbers.push_back(it->first);
    mult.push_back(it->second);
  }

  std::vector<int> cur(mult.size());
  for (;;) {
    size_t i = 0;
    while (i < cur.size() && cur[i] == mult[i]) cur[i++] = 0;
    if (i == cur.size()) break;
    cur[i]++;
    std::vector<int> list1, list2;
    for (size_t i = 0; i < cur.size(); ++i) {
      list1.insert(list1.end(), cur[i], numbers[i]);
      list2.insert(list2.end(), mult[i] - cur[i], numbers[i]);
    }
    if (list1.size() == 0 || list2.size() == 0) continue;
    if (list1 > list2) continue;
    std::cout << "{{";
    for (size_t i = 0; i < list1.size(); ++i) {
      if (i > 0) std::cout << ",";
      std::cout << list1[i];
    }
    std::cout << "},{";
    for (size_t i = 0; i < list2.size(); ++i) {
      if (i > 0) std::cout << ",";
      std::cout << list2[i];
    }
    std::cout << "}\n";
  }
}
sergdev
Good times converting that to C# :) It works as advertised. Thank you.
Jason Z
You are welcome :)
sergdev
A: 

My suggestion is...

First, count how many of each value you have, possibly in a hashtable. Then calculate the total number of combinations to consider - the product of the counts.

Iterate through that number of combinations.

At each combination, copy your loop count (as x), then start an inner loop through your hashtable items.

For each hashtable item, use (x modulo count) as your number of instances of the hashtable key in the first list. Divide x by the count before repeating the inner loop.

If you are worried that the number of combinations might overflow your integer type, the issue is avoidable. Use an array with each item (one for every hashmap key) starting from zero, and 'count' through the combinations treating each array item as a digit (so the whole array represents the combination number), but with each 'digit' having a different base (the corresponding count). That is, to 'increment' the array, first increment item 0. If it overflows (becomes equal to its count), set it to zero and increment the next array item. Repeat the overflow checks until If overflows continue past the end of the array, you have finished.

I think sergdev is using a very similar approach to this second one, but using std::map rather than a hashtable (std::unordered_map should work). A hashtable should be faster for large numbers of items, but won't give you the values in any particular order. The ordering for each loop through the keys in a hashtable should be consistent, though, unless you add/remove keys.

Steve314