views:

946

answers:

7

Let's say I've a string "12345" I should obtain all subsequence combinations of this string such as:

  1. --> 1 2 3 4 5
  2. --> 12 13 14 15 23 24 25 34 35 45
  3. --> 123 124 125 234 235 345
  4. --> 1234 1235 1245 1345 2345
  5. --> 12345

Please note that I grouped them in different number of chars but not changed their order. I need a method/function does that.

A: 

The simplest algorithm for generating subsets of a set of size N is to consider all binary numbers using N bits. Each position in the number represents an element from the set. If a bit in the number is 1, the corresponding set element is in the subset, otherwise the element isn't in the subset. Since the bits in a number are ordered, this preserves the ordering of the original set.

References:

  1. "Efficiently Enumerating the Subsets of a Set"; Loughry, Hemert and Schoofs
  2. "Generating Subsets"; Stony Brook Algorithm Repository
outis
And it's worth pointing out that this same technique can easily be adapted to sequences, rather than sets. Each of the N bits would correspond to one of the N sequence members which you would either take or leave according to whether the bit is 0 or 1.
PeterAllenWebb
@whoever is downvoting: why? (http://meta.stackoverflow.com/questions/135/encouraging-people-to-explain-down-votes). A sequence is basically a set of pairs, the first of which is an integer. As noted in the answer, the method preserves ordering, so if you start with a sequence, you get sequences.
outis
+4  A: 

In C++ given the following routine:

template <typename Iterator>
bool next_combination(const Iterator first, Iterator k, const Iterator last)
{
   /* Credits: Mark Nelson http://marknelson.us */
   if ((first == last) || (first == k) || (last == k))
      return false;
   Iterator i1 = first;
   Iterator i2 = last;
   ++i1;
   if (last == i1)
      return false;
   i1 = last;
   --i1;
   i1 = k;
   --i2;
   while (first != i1)
   {
      if (*--i1 < *i2)
      {
         Iterator j = k;
         while (!(*i1 < *j)) ++j;
         std::iter_swap(i1,j);
         ++i1;
         ++j;
         i2 = k;
         std::rotate(i1,j,last);
         while (last != j)
         {
            ++j;
            ++i2;
         }
         std::rotate(k,i2,last);
         return true;
      }
   }
   std::rotate(first,k,last);
   return false;
}

You can then proceed to do the following:

std::string s = "12345";
for(std::size_t i = 1; i <= s.size(); ++i)
{
   do
   {
      std::cout << std::string(s.begin(),s.begin() + i) << std::endl;
   }
   while(next_combination(s.begin(),s.begin() + i,s.end()));
}
Beh Tou Cheh
Wow that sounds cool but terrifying. Cannot be any recursive or iterated answers? I don't know much about STL or C Iterators
Ahmet Alp Balkan
A: 

using python, the itertools module defines a combinations() method which does just what you need.

from itertools import *
list(combinations( '12345', 2 ))

will give you:

[('1', '2'), ('1', '3'), ('1', '4'), ('1', '5'), ('2', '3'), ('2', '4'), ('2', '5'), ('3', '4'), ('3', '5'), ('4', '5')]
Adrien Plisson
Pretty great but I can't port it to Java since I don't know much about Python. There is no "combinations" function in Java.
Ahmet Alp Balkan
+1 compensation vote, OP never explicitly wrote in the question why it shouldn't be in Python so it seems unfair to downvote a somewhat correct answer
Spoike
A: 

Adrien Plisson's answer shows how one retrieves all subsequences of a specified length in Python (for arbitrary sequence data types). The OP specifies that he works with strings, and that he wants all subsequences. Thus, using itertools.combinations we define:

>>> from itertools import combinations
>>> def subseq_combos(inp):
...     return (''.join(s) for r in range(len(inp) + 1) for s in combinations(inp, r))
... 
>>> list(subseq_combos('12345'))
['', '1', '2', '3', '4', '5', '12', '13', '14', '15', '23', '24', '25', '34', '35', '45', '123', '124', '125', '134', '135', '145', '234', '235', '245', '345', '1234', '1235', '1245', '1345', '2345', '12345']

(If the empty subsequence should be omitted, then use range(1, len(inp) + 1)).)

Stephan202
Pretty great but I can't port it to Java since I don't know much about Python.
Ahmet Alp Balkan
A: 

You want a powerset. Here are all the questions on StackOverflow that mention powersets or power sets.

Here is a basic implementation in python:

def powerset(s):
    n = len(s)
    masks = [1<<j for j in xrange(n)]
    for i in xrange(2**n):
        yield [s[j] for j in range(n) if (masks[j] & i)]


if __name__ == '__main__':
    for elem in powerset([1,2,3,4,5]):
        print elem

And here is its output:

[]
[1]
[2]
[1, 2]
[3]
[1, 3]
[2, 3]
[1, 2, 3]
[4]
[1, 4]
[2, 4]
[1, 2, 4]
[3, 4]
[1, 3, 4]
[2, 3, 4]
[1, 2, 3, 4]
[5]
[1, 5]
[2, 5]
[1, 2, 5]
[3, 5]
[1, 3, 5]
[2, 3, 5]
[1, 2, 3, 5]
[4, 5]
[1, 4, 5]
[2, 4, 5]
[1, 2, 4, 5]
[3, 4, 5]
[1, 3, 4, 5]
[2, 3, 4, 5]
[1, 2, 3, 4, 5]

Notice that its first result is the empty set. Change the iteration from this for i in xrange(2**n): to this for i in xrange(1, 2**n): if you want to skip an empty set.

Here is the code adapted to produce string output:

def powerset(s):
    n = len(s)
    masks = [1<<j for j in xrange(n)]
    for i in xrange(2**n):
        yield "".join([str(s[j]) for j in range(n) if (masks[j] & i)])


Edit 2009-10-24

Okay, I see you are partial to an implementation in Java. I don't know Java, so I'll meet you halfway and give you code in C#:

    static public IEnumerable<IList<T>> powerset<T>(IList<T> s)
    {
        int n = s.Count;
        int[] masks = new int[n];
        for (int i = 0; i < n; i++)
            masks[i] = (1 << i);
        for (int i = 0; i < (1 << n); i++)
        {
            List<T> newList = new List<T>(n);
            for (int j = 0; j < n; j++)
                if ((masks[j] & i) != 0)
                    newList.Add(s[j]);
            yield return newList;
        }
    }
hughdbrown
Someone downvoted this -- why? Because I returned to this and added code in C# (and not Java!) to do what the OP asked? Sheesh. Glad I went the extra mile.
hughdbrown
Order could be important in the poster's sequence, and it might be that elements occur more than once. Thus, the power set may not really help that much.
PeterAllenWebb
Solved! Thanks.
Ahmet Alp Balkan
@hughdbrown: The same thing happened with my answer. I can not realize the reason of downvote.
sergdev
@PeterAllenWebb: Interesting speculation, but not actually requirements of the OP. Not sure where you got the idea that having elements occur more than once was something the OP wanted.
hughdbrown
A: 

You can use the following class for this (in Java):

class Combinations {

  String input;
  StringBuilder cur;

  private void next(int pos, int reminder) {
    cur.append(input.charAt(pos));

    if (reminder == 1) {
      System.out.println(cur);
    } else {
      for (int i = pos + 1; i + reminder - 1 <= input.length(); i++)
        next(i, reminder - 1);
    }
    cur.deleteCharAt(cur.length() - 1);
  }

  public void generate(String input) {
    cur = new StringBuilder();
    this.input = input;
    for (int length = 1; length <= input.length(); length++)
      for (int pos = 0; pos + length <= input.length(); pos++)
        next(pos, length);
  }
}

To run your example use the following code:

new Combinations().generate("12345");

The order of the output is the same as in example. It does not require to store all subsets and then sort them to obtain the order you described.

sergdev
A: 
pillmuncher
But I don't know python :)
Ahmet Alp Balkan