A consultant came by yesterday and somehow the topic of strings came up. He mentioned that he had noticed that for strings less than a certain length, Contains is actually faster than StartsWith. I had to see it with my own two eyes, so I wrote a little app and sure enough, Contains is faster!

How is this possible?

DateTime start = DateTime.MinValue;
DateTime end = DateTime.MinValue;
string str = "Hello there";

start = DateTime.Now;
for (int i = 0; i < 10000000; i++)
{
    str.Contains("H");
}
end = DateTime.Now;
Console.WriteLine("{0}ms using Contains", end.Subtract(start).Milliseconds);

start = DateTime.Now;
for (int i = 0; i < 10000000; i++)
{
    str.StartsWith("H");
}
end = DateTime.Now;
Console.WriteLine("{0}ms using StartsWith", end.Subtract(start).Milliseconds);

Outputs:

726ms using Contains 
865ms using StartsWith

I've tried it with longer strings too!

+1  A: 

Keep in mind you're saving 139 milliseconds over 10 million iterations. Sure, if you're trying to set a world record, go for it. Otherwise, use what is going to best convey the meaning of what you're trying to do.

Also keep in mind, the rough equivalent for StartsWith would not be Contains, but rather IndexOf with a result of 0.
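To make that equivalence concrete, here is a minimal sketch, assuming an ordinal comparison is wanted. `PrefixCheck` and `StartsWithViaIndexOf` are hypothetical names used only for illustration. The comparison type is passed explicitly because `IndexOf(string)` without a `StringComparison` argument is culture-sensitive, unlike `Contains`.

```csharp
using System;

public static class PrefixCheck
{
    // Hypothetical helper: true when value occurs at position 0 of text.
    public static bool StartsWithViaIndexOf(string text, string value)
    {
        return text.IndexOf(value, StringComparison.Ordinal) == 0;
    }

    public static void Main()
    {
        string str = "Hello there";
        Console.WriteLine(StartsWithViaIndexOf(str, "H"));      // True
        Console.WriteLine(StartsWithViaIndexOf(str, "there"));  // False (found at index 6)
        Console.WriteLine(str.Contains("there"));               // True
    }
}
```

When the prefix does not match, IndexOf keeps scanning the rest of the string, so this is only a rough equivalent and not a replacement for StartsWith.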

Anthony Pegram
I think you misunderstood. I'm not trying to break any records or anything, I'm wondering how it's possible that it's slower to check if something STARTS with something else than it is to check if it CONTAINS it.
statichippo
@statichippo, my answer was from the standpoint of someone saying "I'm going to use this because it's *faster*!" That may not be *your* intention, but someone who finds this on Google might very well be thinking exactly that.
Anthony Pegram
+6  A: 

Try using Stopwatch to measure the speed instead of DateTime checking.

http://stackoverflow.com/questions/2923283/stopwatch-vs-using-system-datetime-now-for-timing-events

I think the key is in the documentation for the two methods (note the kind of comparison each one performs):

Contains:

This method performs an ordinal (case-sensitive and culture-insensitive) comparison.

StartsWith:

This method performs a word (case-sensitive and culture-sensitive) comparison using the current culture.

I think the key is the ordinal comparison which amounts to:

An ordinal sort compares strings based on the numeric value of each Char object in the string. An ordinal comparison is automatically case-sensitive because the lowercase and uppercase versions of a character have different code points. However, if case is not important in your application, you can specify an ordinal comparison that ignores case. This is equivalent to converting the string to uppercase using the invariant culture and then performing an ordinal comparison on the result.
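A small sketch of what that description means in practice. `OrdinalDemo` is a hypothetical name, and the strings are arbitrary examples chosen for illustration.

```csharp
using System;

public static class OrdinalDemo
{
    public static void Main()
    {
        // 'a' is U+0061 (97) and 'B' is U+0042 (66), so "a" sorts after "B" ordinally.
        Console.WriteLine(string.CompareOrdinal("a", "B") > 0);  // True

        // Ordinal comparison is case-sensitive...
        Console.WriteLine(string.Equals("hello", "HELLO", StringComparison.Ordinal));  // False

        // ...unless the ignore-case variant is requested explicitly.
        Console.WriteLine(string.Equals("hello", "HELLO", StringComparison.OrdinalIgnoreCase));  // True
    }
}
```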

References:

http://msdn.microsoft.com/en-us/library/system.string.aspx

http://msdn.microsoft.com/en-us/library/dy85x1sa.aspx

http://msdn.microsoft.com/en-us/library/baketfxw.aspx

Using Reflector you can see the code for the two:

public bool Contains(string value)
{
    return (this.IndexOf(value, StringComparison.Ordinal) >= 0);
}

public bool StartsWith(string value, bool ignoreCase, CultureInfo culture)
{
    if (value == null)
    {
        throw new ArgumentNullException("value");
    }
    if (this == value)
    {
        return true;
    }
    CultureInfo info = (culture == null) ? CultureInfo.CurrentCulture : culture;
    return info.CompareInfo.IsPrefix(this, value,
        ignoreCase ? CompareOptions.IgnoreCase : CompareOptions.None);
}

Kelsey
Yes! This is correct. As Daniel pointed out in another comment, passing StringComparison.Ordinal to StartsWith will make StartsWith much faster than Contains. I just tried it and got "748.3209ms using Contains" and "154.548ms using StartsWith".
StriplingWarrior
+13  A: 

I figured it out. It's because StartsWith is culture-sensitive, while Contains is not. That inherently means StartsWith has to do more work.

FWIW, here are my results on Mono with the below (corrected) benchmark:

1988.7906ms using Contains
10174.1019ms using StartsWith

I'd be glad to see people's results on MS, but my main point is that correctly done (and assuming similar optimizations), I think StartsWith has to be slower:

using System;
using System.Diagnostics;

public class ContainsStartsWith
{
    public static void Main()
    {
        string str = "Hello there";

        Stopwatch s = new Stopwatch();
        s.Start();
        for (int i = 0; i < 10000000; i++)
        {
            str.Contains("H");
        }
        s.Stop();
        Console.WriteLine("{0}ms using Contains", s.Elapsed.TotalMilliseconds);

        s.Reset();
        s.Start();
        for (int i = 0; i < 10000000; i++)
        {
            str.StartsWith("H");
        }
        s.Stop();
        Console.WriteLine("{0}ms using StartsWith", s.Elapsed.TotalMilliseconds);

    }
}
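To compare like with like, the same benchmark can be extended with an ordinal StartsWith call, as suggested in the comments on this thread. This is only a sketch: `OrdinalStartsWith` and `TimeIt` are hypothetical names, the delegate call adds a constant overhead to every loop, and the timings will vary by runtime and machine, so no expected output is shown.

```csharp
using System;
using System.Diagnostics;

public class OrdinalStartsWith
{
    // Hypothetical helper: runs the action the given number of times
    // and returns the elapsed time in milliseconds.
    public static double TimeIt(int iterations, Func<bool> action)
    {
        Stopwatch s = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        s.Stop();
        return s.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        string str = "Hello there";
        const int n = 10000000;

        // Contains is always ordinal.
        Console.WriteLine("{0}ms using Contains",
            TimeIt(n, () => str.Contains("H")));

        // The single-argument overload uses the current culture.
        Console.WriteLine("{0}ms using StartsWith (current culture)",
            TimeIt(n, () => str.StartsWith("H")));

        // Requesting an ordinal comparison skips the culture-sensitive path.
        Console.WriteLine("{0}ms using StartsWith (ordinal)",
            TimeIt(n, () => str.StartsWith("H", StringComparison.Ordinal)));
    }
}
```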
Matthew Flaschen
Really good guess, but likely not. He's not passing in the culture, and this line is in the implementation of StartsWith: `CultureInfo info = (culture == null) ? CultureInfo.CurrentCulture : culture;`
Marc Bollinger
@Marc Bollinger - All you've shown there is that StartsWith is culture-sensitive, which is the claim.
Lee
@Marc, right. It's using the current culture. That's culture-sensitive, and some cultures rely on quite complex normalization rules.
Matthew Flaschen
StartsWith uses CurrentCulture by default, which means the comparison has to check for equalities like "æ"=="ae". Contains doesn't do those expensive checks. Pass StringComparison.Ordinal to StartsWith to make it as fast as Contains.
Daniel
Why does Microsoft pick different rules for different string methods? It's maddening!
Qwertie
A: 

I twiddled around in Reflector and found a potential answer:

Contains:

return (this.IndexOf(value, StringComparison.Ordinal) >= 0);

StartsWith:

...
    switch (comparisonType)
    {
        case StringComparison.CurrentCulture:
            return CultureInfo.CurrentCulture.CompareInfo.IsPrefix(this, value, CompareOptions.None);

        case StringComparison.CurrentCultureIgnoreCase:
            return CultureInfo.CurrentCulture.CompareInfo.IsPrefix(this, value, CompareOptions.IgnoreCase);

        case StringComparison.InvariantCulture:
            return CultureInfo.InvariantCulture.CompareInfo.IsPrefix(this, value, CompareOptions.None);

        case StringComparison.InvariantCultureIgnoreCase:
            return CultureInfo.InvariantCulture.CompareInfo.IsPrefix(this, value, CompareOptions.IgnoreCase);

        case StringComparison.Ordinal:
            return ((this.Length >= value.Length) && (nativeCompareOrdinalEx(this, 0, value, 0, value.Length) == 0));

        case StringComparison.OrdinalIgnoreCase:
            return ((this.Length >= value.Length) && (TextInfo.CompareOrdinalIgnoreCaseEx(this, 0, value, 0, value.Length, value.Length) == 0));
    }
    throw new ArgumentException(Environment.GetResourceString("NotSupported_StringComparison"), "comparisonType");

And there are some overloads so that the default culture is CurrentCulture.

So first of all, Ordinal will be faster anyway (at least if the match is close to the beginning of the string), right? And secondly, there's more logic here, which could slow things down (although trivially so).

statichippo
I don't agree that `CultureInfo.CurrentCulture.CompareInfo.IsPrefix` is trivial.
Matthew Flaschen
+1 -- I didn't really read it to be honest, I was just referring to the sheer amount of code ;)
statichippo
A: 

Comparing the time between two events using DateTime comparisons is not really a good idea. There are so many other factors that can affect your measurements. Have you tried switching the order of execution? Is the result exactly the same? Moreover, logically thinking, startswith should be faster, shouldn't it?

Chinmoy
This really isn't helpful. "Moreover, logically thinking, startswith should be faster" That's the whole reason he's asking the question.
Matthew Flaschen