ansaurus

Question

Answer 1

+2 A:

Very tough question, but I'll give it a shot. This is more of a stream of consciousness than an answer, apologies in advance.

If I understand this correctly, you're given 2 equal sized sequences of strings, A and B, indexed from 1..n, say. You then have to find a sequence of indices such that the concatenation of strings A(1)..A(m) equals the concatenation of strings B(1)..B(m) where m is the length of the sequence of indices.

The first thing I would observe is that there could be an infinite number of solutions. For example, given:

A { "x", "xx" }
B { "xx", "x" }

Possible solutions are:

{ 1, 2 }
{ 2, 1 }
{ 1, 2, 1, 2 }
{ 1, 2, 2, 1 }
{ 2, 1, 1, 2 }
{ 2, 1, 2, 1 }
{ 1, 2, 1, 2, 1, 2}
...

So how would you know when to stop? As soon as you had one solution? As soon as one of the solutions is a superset of another solution?

One place you could start would be by taking all the strings of minimum common length from both sets (in my example above, you would take the "x" from both, and searching for 2 equal strings that share a common index. You can then repeat this for strings of the next size up. For example, if the first set has 3 strings of length 1, 2 and 3 respectively, and the second set has strings of length 1, 3 and 3 respectively, you would take the strings of length 3. You would do this until you have no more strings. If you find any, then you have a solution to the problem.

It then gets harder when you have to start combining several strings as in my example above. The naive, brute force approach would be to start permuting all strings from both sets that, when concatenated, result in strings of the same length, then compare them. So in the below example:

A { "ga", "bag", "ac", "a" }
B { "ba", "g", "ag", "gac" }

You would start with sequences of length 2:

A { "ga", "ac" }, B { "ba", "ag" } (indices 1, 3)
A { "bag", "a" }, B { "g", "gac" } (indices 2, 4)

Comparing these gives "gaac" vs "baag" and "baga" vs "ggac", neither of which are equal, so there are no solutions there. Next, we would go for sequences of length 3:

A { "ga", "bag", "a" }, B { "ba", "g", "gac" } (indices 1, 2, 4)
A { "bag", "ac", "a" }, B { "g", "ag", "gac" } (indices 2, 3, 4)

Again, no solutions, so then we end up with sequences of size 4, of which we have no solutions.

Now it gets even trickier, as we have to start thinking about perhaps repeating some indices, and now my brain is melting.

I'm thinking looking for common subsequences in the strings might be helpful, and then using the remaining parts in the strings that were not matched. But I don't quite know how.

IRBMe 2009-07-11 11:52:56

The problem was to find the shortest sequence, so you could just return the first sequence you found that matched and claim it was the shortest, and challenge anyone to find a shorter one.

Matthew Scharley 2009-07-11 11:55:57

Answer 2

+2 A:

A very simple way is to just use something like a breadth-first search. This also has the advantage that the first solution found will have minimal size.

mweerden 2009-07-11 12:00:23

Answer 3

A:

Here's a suggestion for a brute force search. First generate number sequences bounded to the length of your list:

[0,0,..] [1,0,..] [2,0,..] [3,0,..] [0,1,..] ...

The number sequence length determines how many strings are going to be in any solution found. Then generate A and B strings by using the numbers as indexes into your string lists:

public class FitSequence 
{
    private readonly string[] a;
    private readonly string[] b;

    public FitSequence(string[] a, string[] b)
    {
        this.a = a;
        this.b = b;
    }

    private static string BuildString(string[] source, int[] indexes)
    {
        var s = new StringBuilder();
        for (int i = 0; i < indexes.Length; ++i)
        {
            s.Append(source[indexes[i]]);
        }
        return s.ToString();
    }

    public IEnumerable<int[]> GetSequences(int length)
    {
        foreach (var numberSequence in new NumberSequence(length).GetNumbers(a.Length - 1))
        {
            string a1 = BuildString(a, numberSequence);
            string b1 = BuildString(b, numberSequence);
            if (a1 == b1)
                yield return numberSequence;
        }
    }
}

This algorithm assumes equal lengths for A and B. I tested your example with

    static void Main(string[] args)
    {
        var a = new[] {"kk", "ka", "kkk", "a"};
        var b = new[] {"ka", "kakk", "ak", "k"};
        for (int i = 0; i < 100; ++i)
            foreach (var sequence in new FitSequence(a, b).GetSequences(i))
            {
                foreach (int x in sequence)
                    Console.Write("{0} ", x);
                Console.WriteLine();
            }
    }

but could not find any solutions, though it seemed to work for simple tests.

Hans Malherbe 2009-07-11 14:19:26

Answer 4

A:

It is not clear what the 'solution' you are looking for is, the longest solution? the shortest? all solutions?
Since you allow repetition there will an infinite number of solutions for some inputs so I will work on:

Find all sequences under a fixed length.

Written as a pseudo code but in a manner very similar to f# sequence expressions

// assumed true/false functions

let Eq aList bList =  
// eg Eq "ab"::"c" "a" :: "bc" -> true
// Eq {} {} is _false_

let EitherStartsWith aList bList =  
// eg "ab"::"c" "a" :: "b" -> true
// eg "a" "ab" -> true
// {} {} is _true_    

let rec FindMatches A B aList bList level
    = seq {
        if level > 0
            if Eq aList bList
                yield aList
            else if EitherStartsWith aList bList
                Seq.zip3 A B seq {1..} 
                |> Seq.iter (func (a,b,i) -> 
                    yield! FindMatches A B aList::(a,i) bList::(b,i) level - 1) }

let solution (A:seq<string>) (B:seq<string>) length =
    FindMatches A B {} {} length

Some trivial constraints to reduce the problem:

The first selection pair must have a common start section.
the final selection pair must have a common end section.

Based on this we can quickly eliminate many inputs with no solution

let solution (A:seq<string>) (B:seq<string>) length =
    let starts = {}
    let ends = {}
    Seq.zip3 A B seq {1..} 
    |> Seq.iter(fun (a,b,i) -> 
        if (a.StartsWith(b) or b.StartsWith(a))
            start = starts :: (a,b,i)
        if (a.EndsWith(b) or b.EndsWith(a))
            ends = ends :: (a,b,i))

    if List.is_empty starts || List.is_empty ends
        Seq.empty // no solution
    else
        Seq.map (fun (a,b,i) -> 
            FindMatches A B {} :: (a,i) {} :: (b,i) length - 1)
        starts 
        |> Seq.concat

ShuggyCoUk 2009-07-11 16:37:48

This is pseudo code so will need work to get right in f# properly

ShuggyCoUk 2009-07-11 16:38:48

ansaurus

tags:

views:

answers:

Interesting strings algorithm

Find all sequences under a fixed length.

related questions