Are there particular cases where native text manipulation is preferable to regex, particularly in .NET?

Note: Regex appears to be a highly emotive subject, so I am wary of asking such a question. This question is not inviting personal or professional opinions on regex, only specific situations where a solution that uses it is not as good as language-native commands (including those whose underlying code uses regex), and why.

Also, note that "desirable" can mean performance or code readability; it does not mean panacea, as each solution to a problem has its benefits and limitations.

Apologies if this is a duplicate; I have searched SO for a similar question.

+5  A: 

I prefer text manipulation over regular expressions to parse delimited string input. It's far simpler (for me at least) to issue a string split than to manage a regular expression.

Given some text:

value1, value2, value3

You can parse the line easily:

var values = myString.Split(',');

I'm sure there's a better way, but with regular expressions you'd have to do something like:

var match = Regex.Match(myString, "^([^,]*),([^,]*),([^,]*)$");
var value1 = match.Groups[1].Value;
...
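
To make the comparison concrete, here is a minimal sketch of both approaches applied to the sample line above. The class and method names (DelimitedParsing, SplitNative, SplitRegex) are hypothetical, and trimming the padding around each value is an assumption about what the caller wants.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimitedParsing
{
    // Native approach: split on the comma, then trim the padding around each field.
    public static string[] SplitNative(string line)
    {
        return line.Split(',').Select(field => field.Trim()).ToArray();
    }

    // Regex approach: the pattern consumes the comma plus any whitespace around it.
    public static string[] SplitRegex(string line)
    {
        return Regex.Split(line.Trim(), @"\s*,\s*");
    }
}
```

Both methods return the same three values for "value1, value2, value3"; the native version simply spells out each step instead of encoding it in a pattern.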
Ray Vernagus
+1. Also, fixed field length records
John Pirie
It should be noted that "split" functions often split on a regex.
Svante
A: 

I'll usually just use text manipulation for simple string replacements (e.g. replacing tokens in a template with actual values). You could certainly do this with Regex, but plain string replacements are much easier.
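
A minimal sketch of that kind of token replacement, assuming tokens are written as {name} (the token syntax and the TemplateFiller name are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

public static class TemplateFiller
{
    // Replaces each "{key}" token in the template with its value using string.Replace.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        string result = template;
        foreach (KeyValuePair<string, string> pair in values)
        {
            result = result.Replace("{" + pair.Key + "}", pair.Value);
        }
        return result;
    }
}
```

One caveat with this sketch: if a replacement value itself contains something that looks like a token, a later pass of the loop may replace it again, so it suits templates where the values are plain text.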

Eric Petroelje
A: 

Yes. Example:

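#include <string.h>

/* Returns the part of path after the last '/', or path itself if there is none. */
const char* basename (const char* path)
{
  const char* p = strrchr(path, '/');
  return (p != NULL) ? (p+1) : path;
}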

+1  A: 

Regexes are very flexible and powerful because, in many ways, they are similar to an eval() statement: the pattern is a small program that is interpreted at run time. That said, depending on the implementation, they can be a bit slow. Normally this is not an issue, but avoiding them in a particularly costly loop can boost performance.

Even so, I tend to use them, and only worry about performance when the app is "done" and I have real benchmarks to prove I need to tweak it. In other words, avoid premature optimization.
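
As a rough illustration of the "costly loop" case, the sketch below counts lines that start with a fixed prefix in two ways. The LogScan class and the "ERROR" prefix are invented for the example; which version is faster in a given app is something to measure, not assume.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LogScan
{
    // Built once and reused, so the pattern is not re-parsed on every iteration.
    private static readonly Regex ErrorPrefix = new Regex("^ERROR", RegexOptions.Compiled);

    // Regex version: asks the regex engine about every line.
    public static int CountWithRegex(IEnumerable<string> lines)
    {
        int count = 0;
        foreach (string line in lines)
        {
            if (ErrorPrefix.IsMatch(line)) count++;
        }
        return count;
    }

    // Native version: a plain ordinal prefix comparison.
    public static int CountWithStartsWith(IEnumerable<string> lines)
    {
        int count = 0;
        foreach (string line in lines)
        {
            if (line.StartsWith("ERROR", StringComparison.Ordinal)) count++;
        }
        return count;
    }
}
```

If a benchmark shows the regex version is a bottleneck, swapping in the native check is a small, local change, which fits the "optimize once you have numbers" approach.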

Tim Hoolihan
+1  A: 

Parsing and executing a regex requires the host language to hand processing off to its regex "engine". This adds overhead, so wherever native string manipulation can do the job, it is preferable for speed (and readability!).

Andy
+2  A: 

When you can do it simply with native text manipulation, it is usually preferable (simpler to read & better performance) not to use regex.

Personal rule of thumb: if doing it "manually" is tricky or takes noticeably more code, and the performance gain would be negligible, use the regex. Otherwise, do it manually.

Examples where I would not use a regex:

  • split
  • simple find & replace
  • long text
  • loop
  • existing native functions (like, in PHP, strrchr, ucwords...)
streetpc
+1  A: 

Using a regex basically means embedding a tiny program, written in a different programming language, in the middle of your program. I'll ignore the inefficiency of using a regex over native string manipulation, because it probably isn't relevant in most cases.

I prefer native text manipulation over regex any time it will be easier for other people to follow. That is true quite frequently, since plenty of the people around me are not strongly familiar with regex. Unless they are working with something that is very much about parsing (via regex), they should not need to be!

Regular expressions are usually slower, less readable, and harder to debug than native string manipulation.

The main case where I'll prefer regex over string manipulation is when I want to be able to parse strings in different ways depending on the source, and the number of source types will increase over time. Native string manipulation is not really practical in this case. I've had cases where I've stuck a regex column in a database...
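
A minimal sketch of that per-source idea, with the patterns held in an in-memory dictionary standing in for the database column. The source names, patterns, and the SourceParsers class are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SourceParsers
{
    // Hypothetical per-source patterns; in practice these might be loaded from a database.
    private static readonly Dictionary<string, Regex> Patterns = new Dictionary<string, Regex>
    {
        { "vendorA", new Regex(@"^ID=(?<id>\d+)$") },
        { "vendorB", new Regex(@"^(?<id>\d+)\|.*$") },
    };

    // Looks up the pattern for the source and extracts the named "id" group.
    // Returns false if the source is unknown or the input does not match.
    public static bool TryExtractId(string source, string input, out string id)
    {
        id = "";
        Regex pattern;
        if (!Patterns.TryGetValue(source, out pattern)) return false;

        Match match = pattern.Match(input);
        if (!match.Success) return false;

        id = match.Groups["id"].Value;
        return true;
    }
}
```

Supporting a new source then means adding a pattern as data rather than writing and deploying new parsing code.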

Brian
+1  A: 

Whenever the same result can be achieved with a reasonable amount of code.

Regular expressions are very powerful, but they tend to get hard to read. If you can do the same with simple string operations that usually means that the code gets easier to manage and maintain.

There is some overhead in setting up the object and parsing the expression. For simpler string manipulation you can get better performance with simple string methods.

Example:
Getting the file name from a file path (yes, I know that the Path class should be used for that, it's just an example...)

string name = Regex.Match(path, @"([^\\]+)$").Groups[0].Value;

vs.

string name = path.Substring(path.LastIndexOf('\\') + 1);

The second solution is straightforward and does the minimal work needed to get the result. The regular expression solution produces the same result, but it does more work to parse the string, and it produces a bunch of objects that are not needed for the result.
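
For anyone who wants to check that the two approaches agree, here is a small sketch that wraps each one in a method (the FileNames class and method names are made up). It also covers the case with no backslash at all: LastIndexOf returns -1, so the Substring call starts at index 0 and returns the whole string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNames
{
    // Regex version: match the run of non-backslash characters at the end of the path.
    public static string WithRegex(string path)
    {
        return Regex.Match(path, @"[^\\]+$").Value;
    }

    // Native version: take everything after the last backslash.
    // If there is no backslash, LastIndexOf returns -1 and Substring(0) returns the whole path.
    public static string WithSubstring(string path)
    {
        return path.Substring(path.LastIndexOf('\\') + 1);
    }
}
```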

Guffa