ansaurus

Question

Are there particular cases where native text manipulation is more desirable than regex?

Answer 1

+5 A:

I prefer text manipulation over regular expressions to parse delimited string input. It's far simpler (for me at least) to issue a string split than to manage a regular expression.

Given some text:

value1, value2, value3

You can parse the line easily:

var values = myString.Split(',');

I'm sure there's a better way but with regular expressions you'd have to do something like:

var match = Regex.Match(myString, "^([^,]*),([^,]*),([^,]*)$");
var value1 = match.Groups[1].Value;
...

Ray Vernagus 2009-06-24 12:57:59

+1. Also, fixed field length records
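
A minimal sketch of the fixed-length case, using only Substring. The record layout (10 characters of name, 10 of city, 5 of zip code) and the names FixedWidthExample and ParseRecord are hypothetical, chosen purely for illustration:

```csharp
using System;

static class FixedWidthExample
{
    // Hypothetical layout: name in columns 0-9, city in columns 10-19, zip in columns 20-24.
    public static string[] ParseRecord(string line)
    {
        if (line.Length < 25)
            throw new ArgumentException("Record is shorter than the 25-character layout.", nameof(line));

        return new[]
        {
            line.Substring(0, 10).TrimEnd(),   // name, padding removed
            line.Substring(10, 10).TrimEnd(),  // city, padding removed
            line.Substring(20, 5),             // zip code
        };
    }
}
```

Because every field sits at a known offset, there is nothing for a pattern to discover, so a regex would add little here.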

John Pirie 2009-06-24 13:01:45

It should be noted that "split" functions often split on a regex.
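
In .NET the two are separate: String.Split takes literal separators, while Regex.Split takes a pattern (Java's String.split, by contrast, always treats its argument as a regex). A rough side-by-side sketch on the "value1, value2, value3" line from the answer; the class and method names are made up for this example:

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitExample
{
    // Plain split: simple, but keeps the space that follows each comma.
    public static string[] PlainSplit(string line)
    {
        return line.Split(',');
    }

    // Plain split plus Trim: still no regex needed.
    public static string[] SplitAndTrim(string line)
    {
        string[] parts = line.Split(',');
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Trim();
        }
        return parts;
    }

    // Regex split: the delimiter pattern absorbs whitespace around each comma.
    public static string[] RegexSplit(string line)
    {
        return Regex.Split(line, @"\s*,\s*");
    }
}
```

On that input, PlainSplit leaves a leading space on the second and third items, while the other two methods return the bare values.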

Svante 2009-06-24 16:12:21

Answer 2

A:

I'll usually just use text manipulation for simple string replacements (e.g. replacing tokens in a template with actual values). You could certainly do this with Regex, but plain string replacements are much easier.
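
Something along these lines, as a sketch only. The "{key}" token syntax and the names TemplateExample and Fill are assumptions for illustration, not anything from a particular templating library:

```csharp
using System;
using System.Collections.Generic;

static class TemplateExample
{
    // Replaces each "{key}" token with its value using plain string replacement.
    // The "{key}" token syntax is an assumption made for this sketch.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        string result = template;
        foreach (KeyValuePair<string, string> pair in values)
        {
            result = result.Replace("{" + pair.Key + "}", pair.Value);
        }
        return result;
    }
}
```

One caveat of this simple loop: if a replacement value itself contains something that looks like a token, a later iteration may replace it too.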

Eric Petroelje 2009-06-24 13:03:08

Answer 3

A:

Yes. Example:

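#include <string.h>

/* Returns the part of path after the last '/', or path itself if there is none. */
const char* basename(const char* path)
{
    const char* p = strrchr(path, '/');
    return (p != NULL) ? (p + 1) : path;
}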

2009-06-24 13:05:13

Answer 4

+1 A:

Regexes are very flexible and powerful because, in many ways, they are similar to an eval() statement. That said, depending on the implementation, they can be a bit slow. Normally this is not an issue, but avoiding them in a particularly costly loop can boost performance.

Even so, I tend to use them, and only worry about performance when the app is "done" and I have real benchmarks to prove I need to tune it, i.e., avoid premature optimization.

Tim Hoolihan 2009-06-24 13:06:09

Answer 5

+1 A:

Parsing and executing a regex requires the host language to hand processing off to its regex "engine". This adds overhead, so wherever native string manipulation could be used instead, it is preferable for speed (and readability!).

Andy 2009-06-24 13:07:34

Answer 6

+2 A:

When you can do it simply with native text manipulation, it is usually preferable (simpler to read & better performance) not to use regex.

Personal rule of thumb: if doing it "manually" is tricky or noticeably longer, and the performance gain would be negligible, use a regex. Otherwise, do it manually.

Cases where I would not use a regex:

split
simple find & replace
long text
loop
existing native functions (like, in PHP, strrchr, ucwords...)

streetpc 2009-06-24 13:09:21

Answer 7

+1 A:

Using a regex basically means embedding a tiny program, written in a different programming language, in the middle of your program. I'll ignore the inefficiency of using a regex over native string manipulation, because it probably isn't relevant in most cases.

I prefer native text manipulation over regex any time it will be easier for other people to follow, which is quite frequently the case, since plenty of the people around me are not strongly familiar with regex. Unless they are working on something that is very much about parsing (via regex), they should not need to be!

Regular expressions are usually slower, less readable, and harder to debug than native string manipulation.

The main case where I'll prefer regex over string manipulation is when I want to be able to parse strings in different ways depending on the source, and the number of source types will increase over time. Native string manipulation is not really practical in this case. I've had cases where I've stuck a regex column in a database...
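
A rough sketch of that idea, with an in-memory dictionary standing in for the database column. The source names, patterns, and the names SourceParserExample and TryParse are all hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SourceParserExample
{
    // Stand-in for a table with a regex column; sources and patterns are made up.
    static readonly Dictionary<string, string> PatternsBySource = new Dictionary<string, string>
    {
        { "sourceA", @"^id=(?<id>\d+);name=(?<name>.+)$" },
        { "sourceB", @"^(?<name>[^|]+)\|(?<id>\d+)$" },
    };

    // Looks up the pattern for the source and extracts the named groups "id" and "name".
    // Returns false when the source is unknown or the line does not match.
    public static bool TryParse(string source, string line, out string id, out string name)
    {
        id = "";
        name = "";

        string pattern;
        if (!PatternsBySource.TryGetValue(source, out pattern))
        {
            return false;
        }

        Match match = Regex.Match(line, pattern);
        if (!match.Success)
        {
            return false;
        }

        id = match.Groups["id"].Value;
        name = match.Groups["name"].Value;
        return true;
    }
}
```

Supporting a new source then means adding a row of data rather than writing new parsing code, which is the part that is hard to do with native string methods alone.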

Brian 2009-06-24 13:09:45

Answer 8

+1 A:

Whenever the same result can be achieved with a reasonable amount of code.

Regular expressions are very powerful, but they tend to get hard to read. If you can do the same with simple string operations that usually means that the code gets easier to manage and maintain.

There is some overhead in setting up the object and parsing the expression. For simpler string manipulation you can get better performance with simple string methods.

Example:
Getting the file name from a file path (yes, I know that the Path class should be used for that, it's just an example...)

string name = Regex.Match(path, @"([^\\]+)$").Groups[0].Value;

vs.

string name = path.Substring(path.LastIndexOf('\\') + 1);

The second solution is straightforward and does the minimal work needed to get the result. The regular expression solution produces the same result, but it does more work to parse the string, and it produces a bunch of objects that are not needed for the result.
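
If a regex is still wanted, part of the setup cost mentioned above can be paid once by keeping a single Regex instance around. A minimal sketch; the class and member names are made up for illustration, and whether RegexOptions.Compiled helps depends on how often the pattern runs, since it trades a slower start for faster matching:

```csharp
using System.Text.RegularExpressions;

static class FileNameExample
{
    // Constructed once and reused, so the pattern is parsed a single time.
    static readonly Regex LastSegment = new Regex(@"[^\\]+$", RegexOptions.Compiled);

    // Regex version: everything after the last backslash.
    public static string WithRegex(string path)
    {
        return LastSegment.Match(path).Value;
    }

    // Native version from the answer: no pattern, no Match object.
    public static string WithSubstring(string path)
    {
        return path.Substring(path.LastIndexOf('\\') + 1);
    }
}
```

Both methods return the same string for the same path; the native one simply gets there with less machinery.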

Guffa 2009-06-24 13:11:33
