Is there any difference in speed/memory usage for these two equivalent expressions:

Regex.IsMatch(Message, "1000")

vs.

Message.Contains("1000")

Are there any situations where one is better than the other?

The context of this question is as follows: I was making some changes to legacy code that used a regular expression to check whether one string is contained in another. Because it was legacy code I left that part unchanged, and in the code review somebody suggested replacing Regex.IsMatch with string.Contains. So I was wondering whether the change was worth making.

A: 

Yes, for this task, string.Contains will almost certainly be faster and use less memory. And of course, there's no reason to use a regex here.

Matthew Flaschen
+9  A: 

For simple cases String.Contains will give you better performance, but it will not let you do complex pattern matching. Use String.Contains for non-pattern-matching scenarios (like the one in your example) and use regular expressions when you need more complex pattern matching.

A regular expression carries a certain amount of overhead (expression parsing, compilation, execution, etc.) that a simple method like String.Contains does not have, which is why String.Contains will outperform a regular expression in examples like yours.
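To illustrate the split described above, here is a minimal sketch. The class and helper names (ContainsVersusRegex, HasLiteral, HasLiteralViaRegex, HasStandalone1000) are made up for this example and are not part of the original code. It assumes the search text is a plain literal; note that the two forms in the question are only equivalent when the literal contains no regex metacharacters, which is why the regex variant goes through Regex.Escape.

```csharp
using System;
using System.Text.RegularExpressions;

static class ContainsVersusRegex
{
    // Hypothetical helper: literal substring check (ordinal, case-sensitive).
    public static bool HasLiteral(string message, string literal)
    {
        return message.Contains(literal);
    }

    // Hypothetical helper: the same check through the regex engine.
    // Regex.Escape stops characters such as '.' or '+' from acting as metacharacters.
    public static bool HasLiteralViaRegex(string message, string literal)
    {
        return Regex.IsMatch(message, Regex.Escape(literal));
    }

    // Something Contains cannot express: "1000" as a standalone number.
    // The Regex is built once and reused, so the pattern is not re-parsed per call.
    private static readonly Regex Standalone1000 = new Regex(@"\b1000\b");

    public static bool HasStandalone1000(string message)
    {
        return Standalone1000.IsMatch(message);
    }

    static void Main()
    {
        Console.WriteLine(HasLiteral("Error 1000 occurred", "1000"));   // True
        Console.WriteLine(HasLiteralViaRegex("Price: 1.5", "1.5"));     // True
        Console.WriteLine(Regex.IsMatch("Price: 125", "1.5"));          // True: unescaped '.' matches '2'
        Console.WriteLine(HasStandalone1000("Error 10000 occurred"));   // False
    }
}
```

The third line shows why an unescaped pattern is not a drop-in replacement for Contains: "1.5" as a regex also matches "125".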

Andrew Hare
A: 

To determine which is faster you will have to benchmark on your own system. However, regular expressions are complex, and chances are that String.Contains() will be the faster and, in your case, also the simpler solution.

The implementation of String.Contains() will eventually call the native method IndexOfString(), and the implementation of that is known only to Microsoft. However, a good algorithm for implementing this method is the Knuth–Morris–Pratt algorithm. Its complexity is O(m + n), where m is the length of the string you are searching for and n is the length of the string you are searching in, which makes it a very efficient algorithm.

Actually, the complexity of a search using a regular expression can be as low as O(n), depending on the implementation, so it may still be competitive in some situations. Only a benchmark will be able to determine this.

If you are really concerned about search speed, Christian Charras and Thierry Lecroq have a lot of material about exact string matching algorithms at Université de Rouen.
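For readers who have not seen it, here is a minimal sketch of Knuth–Morris–Pratt search. It only illustrates the algorithm named above; it is not the .NET implementation, and the names KmpSearch, BuildFailureTable and IndexOf are invented for this example.

```csharp
using System;

static class KmpSearch
{
    // failure[i] is the length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i]. Built in O(m).
    static int[] BuildFailureTable(string pattern)
    {
        var failure = new int[pattern.Length];
        var k = 0;
        for (var i = 1; i < pattern.Length; i++)
        {
            while (k > 0 && pattern[i] != pattern[k]) k = failure[k - 1];
            if (pattern[i] == pattern[k]) k++;
            failure[i] = k;
        }
        return failure;
    }

    // Returns the index of the first occurrence of pattern in text, or -1.
    // The text index never moves backwards, so the scan is O(n).
    public static int IndexOf(string text, string pattern)
    {
        if (pattern.Length == 0) return 0;
        var failure = BuildFailureTable(pattern);
        var k = 0; // number of pattern characters matched so far
        for (var i = 0; i < text.Length; i++)
        {
            while (k > 0 && text[i] != pattern[k]) k = failure[k - 1];
            if (text[i] == pattern[k]) k++;
            if (k == pattern.Length) return i - k + 1;
        }
        return -1;
    }

    static void Main()
    {
        Console.WriteLine(IndexOf("abababca", "abca"));            // 4
        Console.WriteLine(IndexOf("Error 1000 occurred", "1000")); // 6
        Console.WriteLine(IndexOf("Error 1000 occurred", "2000")); // -1
    }
}
```

Building the table costs O(m) and the scan costs O(n), which gives the O(m + n) total quoted above.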

Martin Liversage
A: 

Wrong. String.Contains is slower when you compare it to a compiled regular expression. Considerably slower, even!

You can test it running this benchmark:

using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
  public static int FoundString;
  public static int FoundRegex;

  static void DoLoop(bool show)
  {
    const string path = "C:\\file.txt";
    const int iterations = 1000000;
    var content = File.ReadAllText(path);

    const string searchString = "this exists in file";
    var searchRegex = new Regex("this exists in file");

    var containsTimer = Stopwatch.StartNew();
    for (var i = 0; i < iterations; i++)
    {
      if (content.Contains(searchString))
      {
        FoundString++;
      }
    }
    containsTimer.Stop();

    var regexTimer = Stopwatch.StartNew();
    for (var i = 0; i < iterations; i++)
    {
      if (searchRegex.IsMatch(content))
      {
        FoundRegex++;
      }
    }
    regexTimer.Stop();

    if (!show) return;

    Console.WriteLine("FoundString: {0}", FoundString);
    Console.WriteLine("FoundRegex: {0}", FoundRegex);
    Console.WriteLine("containsTimer: {0}", containsTimer.ElapsedMilliseconds);
    Console.WriteLine("regexTimer: {0}", regexTimer.ElapsedMilliseconds);

    Console.ReadLine();
  }

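  static void Main(string[] args)
  {
    // The first pass prints nothing and acts as a warm-up; only the second pass reports timings.
    // The Found* counters are static, so the printed counts cover both passes.
    DoLoop(false);
    DoLoop(true);
  }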
}
Running it on a random EDIFACT INVRP file of 60 KB with "this exists in file" stuffed in halfway through (times in milliseconds):

containsTimer: 84925
regexTimer: 10633