Hey all,

I'm currently working on my first project in .NET 4.0, and it requires several thousand string comparisons (I'm searching directories and sometimes entire drives for certain files). For the most part the strings are quite short, because I'm only looking at file paths, so I have just used String.Contains() to see whether the file path string contains my needle string.

I was wondering though, would Regex be a better idea? At what point will the Regex be faster than a standard string comparison? Is it based on the length of the strings being compared or the number of strings being compared?

Thanks, Sonny

+1  A: 

If your search expression is simple then I don't think it's worth moving to a Regex. No matter how good you are at coding and reading them, it will take you more time to understand the code when you (or, more importantly, someone else) look at it again in six months' time.

If the speed improvements are only marginal, stay with the more readable, maintainable code.

ChrisF
A: 

I'm just guessing, but I suspect that for simple substring searches there will be little difference in performance between String.Contains(), String.IndexOf() and regex (if anything, I'd guess that regex would never be faster, but might be slower by a minuscule amount).

You shouldn't give any thought to moving to regex unless your requirements are (or become) such that you need to match on something more complex than a substring.
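For reference, here is a minimal sketch of the three approaches mentioned above applied to the same substring test. The class name, method names and sample path are made up for illustration; they are not from the original question.

```csharp
using System;
using System.Text.RegularExpressions;

static class SubstringCheckDemo
{
    // String.Contains(string) does an ordinal, case-sensitive test.
    public static bool ViaContains(string haystack, string needle)
    {
        return haystack.Contains(needle);
    }

    // IndexOf with an explicit ordinal comparison gives the same answer.
    public static bool ViaIndexOf(string haystack, string needle)
    {
        return haystack.IndexOf(needle, StringComparison.Ordinal) >= 0;
    }

    // Regex equivalent; Regex.Escape keeps the needle a literal string.
    public static bool ViaRegex(string haystack, string needle)
    {
        return Regex.IsMatch(haystack, Regex.Escape(needle));
    }

    static void Main()
    {
        // Hypothetical sample data.
        string path = @"C:\Users\sonny\Documents\report_final.docx";
        string needle = "report";

        Console.WriteLine(ViaContains(path, needle)); // True
        Console.WriteLine(ViaIndexOf(path, needle));  // True
        Console.WriteLine(ViaRegex(path, needle));    // True
    }
}
```

All three give the same answer for a plain substring, which is why the regex version only starts to pay off once the pattern is more than a literal.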

Michael Burr
+1  A: 

It's variable. Comparison performance is a complex function of the input data, the culture being used for comparing, case sensitivity and CompareOptions. A Regex object is more expensive to instantiate (unless it's in the Regex cache), so if you're doing a lot of one-off comparisons, it's not that great to use. I've found it's typically slower than IndexOf(), but YMMV.
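If you do go with a Regex, one way to avoid paying the instantiation cost repeatedly is to build it once and reuse it for every path. This is only a sketch: the class, the field name and the "report" needle are hypothetical, and whether RegexOptions.Compiled helps depends on how many matches you run, since it adds its own startup cost.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexReuseDemo
{
    // Constructed once, then shared by every call to CountMatches.
    // "report" is a placeholder needle for illustration only.
    static readonly Regex NeedlePattern =
        new Regex(Regex.Escape("report"), RegexOptions.Compiled);

    public static int CountMatches(string[] paths)
    {
        int count = 0;
        foreach (string path in paths)
        {
            if (NeedlePattern.IsMatch(path))
            {
                count++;
            }
        }
        return count;
    }

    static void Main()
    {
        string[] paths =
        {
            @"C:\data\report.txt",
            @"C:\data\notes.txt",
            @"C:\data\old\report_2009.txt"
        };
        Console.WriteLine(CountMatches(paths)); // 2
    }
}
```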

Keep in mind that when using IndexOf(string), the culture under which the user/thread is running decides how the comparison is done. That can have a significant impact on performance; not all cultures are equally fast. (String.Contains(), by contrast, always does an ordinal comparison.)

The Invariant culture is a very fast culture. If you use a CompareInfo directly, rather than doing String.IndexOf(), it will be somewhat faster still.

CultureInfo.InvariantCulture.CompareInfo.IndexOf(..)
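Filled out a little, that call might look like the sketch below. The helper name and the sample path are made up for illustration; CompareInfo.IndexOf(string, string, CompareOptions) is the overload being used.

```csharp
using System;
using System.Globalization;

static class CompareInfoDemo
{
    // Hypothetical helper: substring test using the invariant culture's
    // CompareInfo directly rather than String.IndexOf().
    public static bool ContainsInvariant(string haystack, string needle, bool ignoreCase)
    {
        CompareInfo compareInfo = CultureInfo.InvariantCulture.CompareInfo;
        CompareOptions options = ignoreCase ? CompareOptions.IgnoreCase : CompareOptions.None;
        return compareInfo.IndexOf(haystack, needle, options) >= 0;
    }

    static void Main()
    {
        string path = @"C:\Data\Report.TXT"; // sample path, not from the question

        Console.WriteLine(ContainsInvariant(path, "report", true));  // True
        Console.WriteLine(ContainsInvariant(path, "report", false)); // False
    }
}
```

If you don't need linguistic rules at all, passing CompareOptions.Ordinal (or CompareOptions.OrdinalIgnoreCase) is another option worth including in your measurements.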

The only way to have some confidence in making the right choice is to benchmark. That said, unless you're sifting through many megabytes of strings, it won't make a difference that matters to anyone. As ChrisF said earlier, focus on readable/maintainable code in that case.
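A rough way to run such a benchmark is a Stopwatch loop like the one below. Treat it as a sketch under assumptions: the generated paths, the "report" needle and the round count are all invented, and a serious measurement would also need warm-up runs and realistic data from your own drives.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class ContainsVsRegexBenchmark
{
    // Runs the given test over all inputs `rounds` times and returns the
    // elapsed milliseconds; `hits` counts how often the test returned true.
    public static long TimeMs(Func<string, bool> test, string[] inputs, int rounds, out int hits)
    {
        hits = 0;
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int round = 0; round < rounds; round++)
        {
            foreach (string input in inputs)
            {
                if (test(input))
                {
                    hits++;
                }
            }
        }
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }

    static void Main()
    {
        // Synthetic paths; every tenth one contains the needle.
        string[] paths = new string[1000];
        for (int i = 0; i < paths.Length; i++)
        {
            paths[i] = @"C:\data\folder" + i + (i % 10 == 0 ? @"\report.txt" : @"\notes.txt");
        }

        Regex regex = new Regex("report");
        int hits;

        long containsMs = TimeMs(s => s.Contains("report"), paths, 100, out hits);
        Console.WriteLine("Contains: {0} ms, {1} hits", containsMs, hits);

        long indexOfMs = TimeMs(s => s.IndexOf("report", StringComparison.Ordinal) >= 0, paths, 100, out hits);
        Console.WriteLine("IndexOf:  {0} ms, {1} hits", indexOfMs, hits);

        long regexMs = TimeMs(s => regex.IsMatch(s), paths, 100, out hits);
        Console.WriteLine("Regex:    {0} ms, {1} hits", regexMs, hits);
    }
}
```

The hit counts should agree across all three lines; if they don't, the tests aren't equivalent and the timings aren't comparable.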

Here's a good article on getting the most out of regex: Optimizing Regular Expression Performance

FrederikB