views:

413

answers:

4

I am having 4 strings:

"h:/a/b/c"
"h:/a/b/d"
"h:/a/b/e"
"h:/a/c"

I want to find the common prefix for those strings, i.e. "h:/a". How to find that?

Usually I'd split the string with delimiter '/' and put it in another list, and so on.
Is there any better way to do it?

+4  A: 

This is the longest common substring problem (although it's a slightly specialized case since you appear to only care about the prefix). There's no library implementation of the algorithm in the .NET platform that you can call directly, but the article linked here is chock-full of steps on how you'd do it yourself.

John Feminella
i dont think that logic will work out here...i just want to trace from first and stop when the difference comes
I gave this a +1 because the original question did not ask for a prefix, it just asked for the common substring (presumably we were supposed to infer if was a prefix from the data, but it was not what was asked).
Greg Beech
Given that only a common prefix is searched, the general longest common substring algorithm would be a terrible overkill (O(n^2) vs. O(n) for only two strings ...). The problem is NOT a "slightly specialized case" but a much easier to solve one :-). Usually, I would give -1 (in order to bring the correct answer up), but given that the question was rephrased, I'll just leave it :-) ...
MartinStettner
@MartinStettner I don't think you should vote down answers to bring the answer you consider correct up. You should down vote answers you think are wrong, and upvote answers you think are correct. If enough people agree with you the answers will rise to the top naturally. The correct answer will be at the top, assuming it is marked as accepted.
Sam Holder
+5  A: 

Just loop round the characters of the shortest string and compare each character to the character in the same position in the other strings. Whilst they all match keep going. As soon as one doesn't match then the string up to the current position -1 is the answer.

Something like (pseudo code)

int count=0;
foreach(char c in shortestString)
{
    foreach(string s in otherStrings)
    {
        if (s[count]!=c)
        {
             return shortestString.SubString(0,count-1); //need to check count is not 0 
        }
    }
    count+=1;
 }
 return shortestString;
Sam Holder
but if i have some 20 strings.. do u think the comparison of the shortest with others char by char is effecient?
If you only have 20 strings I don't think you need to worry about efficiency, but I'm not sure what more you can do as you need to check every string to make sure they are the same. you might be able to do it better if you know if the strings likely to be common for some amount.
Sam Holder
@deepasundari - If you need to find the first different character in the strings, then the minimum number of characters you can compare is the ones that are the same at the start in each string, plus one that is different in one string. So this is provably the most efficient algorithm to find a common prefix.
Greg Beech
Hey thanks yaar..it was very useful
no problem. I would be interested in a profiled comparison with dtb's answer, that might be more efficient on a multiple processor machine. You should also accept an answer.
Sam Holder
+5  A: 
string[] xs = new[] { "h:/a/b/c", "h:/a/b/d", "h:/a/b/e", "h:/a/c" };

string x = string.Join("/", xs.Select(s => s.Split('/').AsEnumerable())
                              .Transpose()
                              .TakeWhile(s => s.All(d => d == s.First()))
                              .Select(s => s.First()));

with

public static IEnumerable<IEnumerable<T>> Transpose<T>(
    this IEnumerable<IEnumerable<T>> source)
{
    var enumerators = source.Select(e => e.GetEnumerator()).ToArray();
    try
    {
        while (enumerators.All(e => e.MoveNext()))
        {
            yield return enumerators.Select(e => e.Current).ToArray();
        }
    }
    finally
    {
        Array.ForEach(enumerators, e => e.Dispose());
    }
}
dtb
will this work only if they are separated by "/"?
Sam Holder
If you remove the `Split` it works with individual characters as well.
dtb
thanks, this is a more interesting answer than mine.
Sam Holder
A clever functional approach! Thanks for the contribution, dtb -- it's always fun reading your take.
John Feminella
hey it is showing error in the transpose method...tell me the namespaces to include..
There is no `Transpose` method in the .NET framework. I used the following implementation; had to change one the `First` calls to `FirstOrDefault` to make it work though: http://www.extensionmethod.net/Details.aspx?ID=152
dtb
I've added my own implementation of a `Transpose` extension method.
dtb
You need a .ToArray() after the final Select otherwise the Join method doesn't work.
Simon D
@Simon D: You don't. http://msdn.microsoft.com/en-us/library/dd783876.aspx
dtb
OK, in .Net 4 you don't, .Net 3.5 you do:http://msdn.microsoft.com/en-us/library/system.string.join(VS.90).aspxhttp://msdn.microsoft.com/en-us/library/system.string.join.aspx
Simon D
A: 

Working code based on Sam Holder's solution (note it gives h:/a/ not h:/a as the longest common initial substring in the question):

using System;

namespace CommonPrefix
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(CommonPrefix(new[] { "h:/a/b/c", "h:/a/b/d", "h:/a/b/e", "h:/a/c" })); // "h:/a/"
            Console.WriteLine(CommonPrefix(new[] { "abc", "abc" })); // "abc"
            Console.WriteLine(CommonPrefix(new[] { "abc" })); // "abc"
            Console.WriteLine(CommonPrefix(new string[] { })); // ""
            Console.WriteLine(CommonPrefix(new[] { "a", "abc" })); // "a"
            Console.WriteLine(CommonPrefix(new[] { "abc", "a" })); // "a"

            Console.ReadKey();
        }

        private static string CommonPrefix(string[] ss)
        {
            if (ss.Length == 0)
            {
                return "";
            }

            if (ss.Length == 1)
            {
                return ss[0];
            }

            int prefixLength = 0;

            foreach (char c in ss[0])
            {
                foreach (string s in ss)
                {
                    if (s.Length <= prefixLength || s[prefixLength] != c)
                    {
                        return ss[0].Substring(0, prefixLength);
                    }
                }
                prefixLength++;
            }

            return ss[0]; // all strings identical
        }
    }
}
Simon D