tags:

views:

338

answers:

4

If I have a sequence as follows (let's say it's an IEnumerable<T>):

[A, B, C, D, E]

Then what's the cleanest way to compute all possible (continuous and non-continuous) sub-sequences of a given length? Ordering of the output isn't important, but it shouldn't include duplicates.

e.g. If I want to compute all possible sub-sequences of length 3 the result set would be:

[A, B, C]
[A, B, D]
[A, B, E]
[A, C, D]
[A, C, E]
[A, D, E]
[B, C, D]
[B, C, E]
[B, D, E]
[C, D, E]


For the record, the accepted answer below gave me a good starting point, and here's the code I've gone with that is updated to use some of the new .NET 3.5 extension methods:

public static IEnumerable<IEnumerable<T>> Subsequences<T>(
    this IEnumerable<T> source, 
    int count)
{
    if (count == 0)
    {
        yield return Enumerable.Empty<T>();
    }
    else
    {
        var skip = 1;
        foreach (var first in source)
        {
            foreach (var rest in source.Skip(skip).Subsequences(count - 1))
            {
                yield return Enumerable.Repeat(first, 1).Concat(rest);
            }

            skip++;
        }
    }
}
+5  A: 

I've had success with IanG's PermuteUtils class:

char[] items = new char[] { 'A', 'B', 'C', 'D', 'E' };

foreach (IEnumerable<char> permutation in PermuteUtils.Permute(items, 3)) {
    Console.Write("[");
    foreach (char c in permutation) {
        Console.Write(" " + c);
    }
    Console.WriteLine(" ]");
}

Results in:

[ A B C ]
[ A B D ]
[ A B E ]
[ A C B ]
[ A C D ]
[ A C E ]
[ A D B ]
[ A D C ]
[ A D E ]
[ A E B ]
[ A E C ]
[ A E D ]
[ B A C ]
[ B A D ]
[ B A E ]
[ B C A ]
[ B C D ]
[ B C E ]
[ B D A ]
[ B D C ]
...
Sean Bright
That isn't quite what I'm after; that is all possible permutations but, for example [A, D, B] is not a subsequence because it isn't in the same order.
Greg Beech
Then again, it's easy to modify to give the right result so I'll give you the accepted answer. Ta.
Greg Beech
Ah, right. Sorry about that.
Sean Bright
+1  A: 

Something like:

static void Main()
{
    string[] data = { "A", "B", "C", "D", "E" };
    WalkSubSequences(data, 3);
}

public static void WalkSubSequences<T>(IEnumerable<T> data, int sequenceLength)
{
    T[] selected = new T[sequenceLength];
    WalkSubSequences(data.ToArray(), selected, 0, sequenceLength);
}
private static void WalkSubSequences<T>(T[] data, T[] selected,
    int startIndex, int sequenceLength)
{
    for (int i = startIndex; i + sequenceLength <= data.Length; i++)
    {
        selected[selected.Length - sequenceLength] = data[i];
        if (sequenceLength == 1)
        {
            ShowResult(selected);
        }
        else
        {
            WalkSubSequences(data, selected, i + 1, sequenceLength - 1);
        }
    }
}

private static void ShowResult<T>(T[] selected)
{
    StringBuilder sb = new StringBuilder();
    sb.Append(selected[0]);
    for (int j = 1; j < selected.Length; j++)
    {
        sb.Append(';').Append(selected[j]);
    }
    Console.WriteLine(sb.ToString());
}
Marc Gravell
It returns the same result as the OP expected...
Marc Gravell
A: 

I would suggest a recursive algorithm for this. I'm sorry, but it has been a while since I did anything in C#, so I'll just give pseudo-code here.

function allPossible(iterator, length, currSubSeq, allResults) {
    // Add the current sub sequence to the results if it is the correct length.
    if (currSubSeq.length == length) {
        copy = currSubSeq.copy();
        allResults.add(copy);
    }
    // If it is too long, return early.
    else if (currSubSeq.length > length) {
        return allResults;
    }

    // Get the next item from the iterator and handle both cases:
    // I.E. when it is, and when it isn't in a sub sequence.
    item = iterator.getNext();
    allPossible(iterator, currSubSeq, allResults);
    currSubSeq.add(item);
    allPossible(iterator, currSubSeq, allResults);

    return allResults;
}

Then you find all possible sub sequences by calling allPossible with an iterator that produces all elements in your original sequence, the length that you want your sub-sequences, an empty sequence of items for currSubSeq, and an empty sequence of item sequences for allResults. I'm assuming pass-by-reference semantics for all the parameters. Sorry that I couldn't give you the proper C# implementation, but I'm sure you know more than enough to take my algorithm sketch and turn it into code.

One last thing. Because this algorithm is recursive, you may have a stack overflow if you run it on a very long sequence with a large length parameter since stack usage is O(2^N) where N = length. I don't think this is a big problem because the algorithm has O(2^N) run-time because of the nature of the problem, so you shouldn't try to run it with a large enough length to overflow the stack anyway!

CAVEAT I haven't actually tested this pseudo-code, so there may be something subtle I haven't thought of.

A. Levy
@A. Levy: "A "non-continuous sub-sequence" isn't a sub-sequence." He's using the proper mathematical definition of subsequence, which is exactly as he described.
Iceman
The question uses 'subsequence' correctly. And subsets are unordered, so that's an entirely wrong word to use too.
ShreevatsaR
I stand corrected.
A. Levy
@Kevin and @ShreevatsaR: Thanks you both for correcting my misunderstanding of subsequences. I was thinking "substring" apparently. I've removed that bit of ignorant pedantry from my answer. @Greg Beech, if you read this comment, perhaps you can include a link to a definition of subsequences so ignoramuses like me can be educated. Such as: http://en.wikipedia.org/wiki/Subsequence
A. Levy
A: 

Here is a solution storing the state in a array of bools. It works by creating the following states on each Next() call (n = 5, k = 3).

1 1 1 . .  Move last 1 right once.
1 1 . 1 .  Move last 1 right once.
1 1 . . 1  Move last 1 right once.
1 . 1 1 .  Move the second last 1 right once and all 1s from the right back.
1 . 1 . 1  Move last 1 right once.
1 . . 1 1  Move the second last 1 right once (and all 1s from the right back.)
. 1 1 1 .  Move the third last 1 right once and all 1s from the right back.
. 1 1 . 1  Move last 1 right once.
. 1 . 1 1  Move the second last 1 right once (and all 1s from the right back.)
. . 1 1 1  Move the third last 1 right once (and all 1s from the right back.)

This state can then be used to select the coresponding items from the supplied sequence for every state.

At first the initialization.

public static Boolean[] Initialize(Int32 n, Int32 k)
{
    return Enumerable.Concat(Enumerable.Repeat(true, k),
                             Enumerable.Repeat(false, n - k)).ToArray();
}

The code to move to the next combination (subsequence).

public static Boolean Next(this Boolean[] list)
{
    Int32 lastOneIndex = Array.LastIndexOf(list, true);

    if (lastOneIndex == -1)
    {
        return false; // All zeros. 0000000
    }
    else if (lastOneIndex < list.Length - 1)
    {
        // Move the last one right once. 1100X00 => 11000X0
        list.MoveBlock(lastOneIndex, lastOneIndex, lastOneIndex + 1);
    }
    else
    {
        Int32 lastZeroIndex = Array.LastIndexOf(list, false, lastOneIndex);

        if (lastZeroIndex == -1)
        {
            return false; // All ones. 1111111
        }
        else
        {
            Int32 blockEndIndex = Array.LastIndexOf(list, true, lastZeroIndex);

            if (blockEndIndex == -1)
            {
                // Move all ones back to the very left. 0000XXX => XXX0000
                list.MoveBlock(lastZeroIndex + 1, lastOneIndex, 0);

                return false; // Back at initial position.
            }
            else
            {
                // Move the block end right once. 11X0011 => 110X011
                list.MoveBlock(blockEndIndex, blockEndIndex, blockEndIndex + 1);
                // Move the block of ones from the very right back left. 11010XX => 1101XX0
                list.MoveBlock(lastZeroIndex + 1, lastOneIndex, blockEndIndex + 2);
            }
        }
    }

    return true;
}

Finally some helper methods.

public static void MoveBlock(this Boolean[] list, Int32 oldStart, Int32 oldEnd, Int32 newStart)
{
    list.ClearBlock(oldStart, oldEnd);
    list.SetBlock(newStart, newStart + oldEnd - oldStart);
}

public static void SetBlock(this Boolean[] list, Int32 start, Int32 end)
{
    list.SetBlockToValue(start, end, true);
}

public static void ClearBlock(this Boolean[] list, Int32 start, Int32 end)
{
    list.SetBlockToValue(start, end, false);
}

public static void SetBlockToValue(this Boolean[] list, Int32 start, Int32 end, Boolean value)
{
    for (int i = start; i <= end; i++)
    {
        list[i] = value;
    }
}

And a usage example using a string instead of a list.

var sequence = "ABCDE";

var state = Initialize(sequence.Count(), 5);

do
{
    Console.WriteLine(new String(sequence.Where((_, idx) => state[idx]).ToArray()));
}
while (state.Next());
Daniel Brückner