views:

75

answers:

2

In C#, Say you have an array of strings, which contain only characters '0' and '1':

string[] input = { "0101", "101", "11", "010101011" };

And you'd like to build a function:

public void IdentifySubstrings(string[] input) { ... }

That will produce the following:

"0101 is a substring of 010101011"
"101 is a substring of 0101"
"101 is a substring of 010101011"
"11 is a substring of 010101011"

And you are NOT able to use built-in string functionality (such as String.Substring).

How would one efficiently solve this problem? Of course you could plow through it via brute force, but it just feels like there ought to be a way to accomplish it with a tree (since the only values are 0's and 1's, it feels like a binary tree ought to fit somehow). I've read a little bit about things like suffix trees, but I'm uncertain if that's the right path to be going down.

Any efficient solutions you can think of?

+2  A: 

First of all, You have no choice but each byte (or bit ;-) in the searched string at least once. Probably best to leave them as bytes. Then implement a Trie (or variant). Load all substrings into the trie. The node objects should contain members identifiying which to of the loaded array elements they belong. Then search it with each substring and make your matches.

FastAl
Would the performance gain over the brute force method here be simply that once you reach a leaf node you can be assured that your test string is not a substring of any of the others?
byte
More thought about this answer: I think this will work and be quite efficient, but I don't think it will identify substrings that begin at positions other than 0. For example, I don't think it will identify "101" as being a substring of "0101".
byte
That is right, you would have to vary the use of trie. A quick way - for example, add to it every substring starting form 2nd byte on out, too. of course this drops you right away to o(n^2) so you'd have to have a savvier variant than that. Tough problem, good luck.
FastAl
Heh, well, I'll vote you up and accept the answer anyway, since it's likeliest to be as good a starting spot as I'll get.
byte
A: 

Haven't tested this, but's it's close

var string2FindLen = string2Find.Length;
var ndx = 0;
var x = string2Find[ndx];
foreach(var c in string2LookIn)
{
    if (ndx == string2FindLen) return true;
    if (c==x) x = string2Find[++ndx];
    else ndx = 0;
}
return false;
Charles Bretana
You may have misunderstood the question; your solution only operates to see if one string is a substring of another, not N strings in N strings.
byte