I've noticed a lot of little debates about when to use regex and when to use a built-in string function like String.Replace() (.NET).

It seems a lot of people recommend always, always, always using regex whenever you deal with strings at all (besides just displaying them). Is this really best practice or just a wrong impression on my part? It seems like overkill to use regex when the problem is just "Remove any occurrence of any of these words from this text".

I'd like input so I can improve my own code and better answer other people's questions about string manipulation (there are a lot of them).

+8  A: 

I think it's a mistake to treat Regex as a catch-all solution when a plain string-based search/replace is possible.

Regex is intrinsically a process of pattern matching and should be used when the strings you want to match are variable or only conform to a particular pattern. For cases where a simple string search would suffice, I would always recommend using the built-in methods of the String class.
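A minimal sketch of that distinction, written in Python rather than .NET purely so it can be run as-is (the sample sentence is made up for illustration). A fixed substring is handled by the built-in `str.replace`, while strings that only share a shape need a pattern:

```python
import re

text = "Order 66 shipped; order 1138 is pending."

# Fixed text: the built-in string method is enough.
fixed = text.replace("shipped", "sent")

# Variable text (any run of digits): this is where a pattern earns its keep.
masked = re.sub(r"\d+", "#", text)

print(fixed)
print(masked)
```

The .NET equivalents would be String.Replace() for the first case and Regex.Replace() for the second.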

I have never seen any performance statistics suggesting that a Regex based lookup is faster or more performant than string indexing. Additionally, Regex engines vary in their execution capabilities.

As if that were not enough, it is quite easy to construct a Regex that performs quite badly (uses a lot of backtracking, for instance) so deep knowledge of Regex is required if you really want to optimize performance using Regex matching. On the other hand, it is quite simple even for a n00b to perform string based searches or replacements.
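To make the backtracking point concrete, here is a small sketch in Python (assuming a backtracking engine such as CPython's `re`; the patterns and the helper name `time_match` are illustrative, not taken from any answer). Both patterns accept the same strings, but the nested quantifier gives the engine many more ways to fail:

```python
import re
import time

# Nested quantifiers: "(a+)+" can split a run of a's in very many ways.
risky = re.compile(r"(a+)+$")
# Same language (one or more a's, then end of string) without the nesting.
safe = re.compile(r"a+$")

def time_match(pattern, subject):
    """Return (matched, seconds) for one match attempt at the start of subject."""
    start = time.perf_counter()
    matched = pattern.match(subject) is not None
    return matched, time.perf_counter() - start

# A run of a's followed by a character that forces the match to fail.
subject = "a" * 18 + "b"

risky_result, risky_seconds = time_match(risky, subject)
safe_result, safe_seconds = time_match(safe, subject)

print(f"risky: matched={risky_result} in {risky_seconds:.4f}s")
print(f"safe:  matched={safe_result} in {safe_seconds:.6f}s")
```

On a backtracking engine, lengthening the run of a's typically makes the risky pattern dramatically slower, while the safe one barely changes. The input is kept short here on purpose.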

Cerebrus
That's what I thought as well. It just always seems that for every string based answer there are 5 regex answers.
colithium
You mean, on StackOverflow ? That may be because people tend to ask only non-obvious string related questions and resolve the rest themselves. The non-obvious type problems often require a Regex solution. Still, this should not be considered a generic representation of the pros and cons of both methods. :-)
Cerebrus
A: 

I only tend to use Regex when I need to match specific patterns in large strings. For basic string manipulation I tend to use the built-in .NET components.

James
A: 

I would tend to think that if the string class has a dedicated function to manipulate a string the way you want, it should be pretty close to 'good', whereas regex is general purpose.

But as with anything subjective, if you are concerned about performance, time the different methods.

Then again, do what is easiest to understand, and do performance monitoring to find the real bottlenecks as you go.
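Timing the alternatives takes only a few lines. The sketch below uses Python's standard `timeit` module. The sample text, the word being replaced, and the run count are arbitrary assumptions, and the numbers will differ per machine and runtime, so none are quoted here:

```python
import re
import timeit

text = "the quick brown fox jumps over the lazy dog " * 1000
pattern = re.compile(r"fox")  # literal-only pattern, compiled once up front

def with_str_replace():
    return text.replace("fox", "cat")

def with_regex():
    return pattern.sub("cat", text)

# Both approaches must agree before their timings are worth comparing.
assert with_str_replace() == with_regex()

for fn in (with_str_replace, with_regex):
    seconds = timeit.timeit(fn, number=200)
    print(f"{fn.__name__}: {seconds:.4f}s for 200 runs")
```

In .NET the same idea applies with a Stopwatch around String.Replace() and Regex.Replace().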

Simeon Pilgrim
+2  A: 

Regex.Replace() is much more expensive than the String.Replace() method. Use String.Replace() when possible, and use Regex when it's a necessity.

Take a look at this benchmark to see the time differences.

Sev
+1  A: 

I just love regexes but if there is a simple xxx->replace("foo","bar") type function available it seems silly to use a power tool like regex when a simple screwdriver would do.

If performance is an issue, regex can be very CPU-intensive for simple substitutions. (On the other hand, a single regex usually works out more efficient for a complex search/transform than a series of "simpler" calls.)
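The "remove any of these words" problem from the question shows both styles side by side. This is a sketch in Python with a made-up word list. The chained version makes one pass over the text per word, while the regex version makes a single pass with one compiled alternation:

```python
import re

words = ["foo", "bar", "baz"]  # hypothetical word list
text = "foo one bar two baz three foobar"

def remove_chained(s, words):
    # One full pass over the string per word.
    for w in words:
        s = s.replace(w, "")
    return s

# One compiled alternation; re.escape keeps any metacharacters literal.
pattern = re.compile("|".join(re.escape(w) for w in words))

def remove_regex(s):
    return pattern.sub("", s)

print(repr(remove_chained(text, words)))
print(repr(remove_regex(text)))
```

The two are not always interchangeable: chained replacement can remove a word that only appears after an earlier removal (with the words "ab" and "cd", the text "cabd" becomes empty when chained but "cd" with the single-pass regex). Which one is faster depends on the word count and input size, so it is worth measuring.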

Also, I keep getting caught out by the "minor" implementation differences -- like the implied "^" on Python's re.match(), which only matches at the start of the string (it is re.fullmatch() that behaves like "^...$"). I was on the road with no internet access at the time and ended up buying another copy of Lutz's book to find out what was going on!
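The anchoring behaviour of Python's `re` functions is easy to check directly. In this short sketch (the digit pattern is just an example), `match` is anchored at the start only, `search` scans the whole string, and `fullmatch` must consume all of it:

```python
import re

pattern = re.compile(r"\d+")

starts_with_digits = bool(pattern.match("123abc"))      # anchored at the start only
digits_later = bool(pattern.match("abc123"))            # fails: no digits at position 0
found_anywhere = bool(pattern.search("abc123"))         # scans the whole string
whole_string = bool(pattern.fullmatch("123abc"))        # must consume the entire string
whole_string_digits = bool(pattern.fullmatch("123"))

print(starts_with_digits, digits_later, found_anywhere, whole_string, whole_string_digits)
# → True False True False True
```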

James Anderson
Power tools are fun, till you nail your hand to the wall with the nail gun
Matthew Scharley
A: 

Obviously, for complex search/match/replace operations, regexes are the way to go. For simple stuff like replacing a single word by another word, normal string methods are preferred.

But in many cases, it's not that simple. Sometimes you come across a situation where you could use standard string operations, while the regex solution is more elegant. Even if the vanilla string algorithm is 10 times faster, it's always a good idea to ask yourself if it matters in that particular piece of code (for example if the code isn't executed in a loop).

I would prefer the readability of a simple regex operation over a more complex, but faster algorithm using pure string operations.
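As an illustration of that trade-off, here is a sketch in Python that validates a made-up code format (three uppercase letters, a dash, four digits) both ways. The format and the function names are assumptions for the example only:

```python
import re

CODE = re.compile(r"[A-Z]{3}-[0-9]{4}")  # hypothetical format, e.g. "ABC-1234"

def is_code_regex(s):
    # The pattern reads almost like the format description itself.
    return CODE.fullmatch(s) is not None

def is_code_manual(s):
    # Same check with plain string operations: more to read and to get wrong.
    return (
        len(s) == 8
        and s[3] == "-"
        and all("A" <= c <= "Z" for c in s[:3])
        and all("0" <= c <= "9" for c in s[4:])
    )

for sample in ["ABC-1234", "abc-1234", "ABC-12345", "ABC1234"]:
    print(sample, is_code_regex(sample), is_code_manual(sample))
```

Both functions accept and reject the same inputs. Whether the manual version is actually faster is something to measure rather than assume.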

Just my 2 cents...

Philippe Leybaert