ansaurus

Question

Regex vs. Manual comparison. Which is faster?

Answer 1

+7 A:

Of course, four comparisons of small chunks of memory are much faster (and use almost no memory) than building, running, and destroying a state machine.

wRAR 2010-04-05 18:46:04

+1. Note that for more complicated examples, the regex might well be faster.

Billy ONeal 2010-04-05 18:47:22

Well, when does Regex beat out (naive) manual comparisons?

Earlz 2010-04-05 18:47:31

My guess is never, but to hand-code a matcher for a regular language could become very complicated compared to writing a regex.

danben 2010-04-05 18:49:32

Of course, this assumes you build your "manual comparison" engine such that it never makes more comparisons than it needs to. Eventually this would reduce to a state machine anyway.

danben 2010-04-05 18:51:09

wRAR 2010-04-05 18:53:03

@wRAR, so Regex is usually not the right choice?

Earlz 2010-04-05 18:56:07

@Earlz - for this specific test, a regex would be overkill. But regex is incredibly useful ... for example, if your is_whitespace function is part of a class that does much the same things a regex does, you may be able to replace the entire class contents with a few regex matches. But if all you're interested in is bare execution speed, a regex for a trivial exact match is overkill.

overslacked 2010-04-05 19:02:35

@Earlz: I've heard that Perl addicts like to do everything with regexes (other languages, at least, don't have native constructs like Perl's =~), but there are usually cleaner AND faster ways to do it. And while Python programs are usually not judged on performance, the implementation details of .NET regexps may cause nightmares.

wRAR 2010-04-05 19:05:20

Answer 2

+1 A:

In most cases, a regex to find something like a whitespace character is very fast. Many eyeballs are looking at performance in the leading regex implementations, and there is probably other 'low-hanging fruit' for optimization elsewhere in your code.

Bad regex performance usually comes from a poorly written regex. Tips: avoid unnecessary backtracking, grouping, and alternation as much as possible. Use something like RegexBuddy, or Perl with "use re 'debug'", to see how many branches your regex takes.
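To illustrate the alternation tip, here is a minimal Python sketch (the pattern names and the same_matches helper are made up for this example). It assumes the whitespace set is space, tab, carriage return, and newline, and shows that an alternation can be rewritten as a single character class without changing what matches. Whether the character class is actually faster depends on the regex engine, so time it before relying on it.

```python
import re

# Hypothetical patterns for illustration: both match one of
# space, tab, carriage return, or newline.
ALTERNATION_RE = re.compile(r"(?: |\t|\r|\n)")  # one branch per character
CHAR_CLASS_RE = re.compile(r"[ \t\r\n]")        # a single character class


def same_matches(text):
    """Return True if both patterns match at exactly the same positions."""
    alternation_spans = [m.span() for m in ALTERNATION_RE.finditer(text)]
    char_class_spans = [m.span() for m in CHAR_CLASS_RE.finditer(text)]
    return alternation_spans == char_class_spans


if __name__ == "__main__":
    print(same_matches("a b\tc\r\nd"))
```

The two patterns are interchangeable for this input, so the simpler character class costs nothing in correctness.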

The links below cover some regex performance issues.

If in doubt, do comparative timings...

Coding Horror - Regex

Java Performance - Regex
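As a rough sketch of "do comparative timings", something like the following Python could be used. The question's original code is not shown here, so the four-character whitespace set, the function names, and the sample input are all assumptions. Absolute numbers will vary by machine and interpreter, so only the relative difference is meaningful.

```python
import re
import timeit

# Assumed whitespace set: space, tab, carriage return, newline.
WHITESPACE_RE = re.compile(r"[ \t\r\n]")  # compiled once, outside the timed code


def is_whitespace_manual(ch):
    """Manual check: at most four equality comparisons."""
    return ch == " " or ch == "\t" or ch == "\r" or ch == "\n"


def is_whitespace_regex(ch):
    """Regex check using the precompiled pattern."""
    return WHITESPACE_RE.fullmatch(ch) is not None


def compare_timings(samples=(" ", "\t", "x", "\n"), number=100_000):
    """Time both checks over the same samples; returns seconds per variant."""
    results = {}
    for name, func in (("manual", is_whitespace_manual),
                       ("regex", is_whitespace_regex)):
        results[name] = timeit.timeit(
            lambda: [func(c) for c in samples], number=number
        )
    return results


if __name__ == "__main__":
    for name, seconds in compare_timings().items():
        print(f"{name}: {seconds:.4f} s")
```

Compiling the pattern once keeps regex construction out of the measurement, so the timing reflects matching cost only.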

drewk 2010-04-05 18:55:41

Answer 3

+1 A:

The manual comparison is faster to execute; the regex comparison is faster to type.

Note that your two implementations are not equivalent if your system uses Unicode. The regex \s matches all Unicode whitespace while your manual comparison only handles basic ASCII and does not even include the vertical tab and form feed characters which are usually also considered whitespace.

If you're writing this in a high-level language I'd suggest using the is_whitespace() function already provided by your programming language's libraries. A basic function like that is almost always included.
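To make the difference concrete, here is a small Python 3 sketch. It assumes the manual version checks only space, tab, carriage return, and newline (the question's code is not shown here, so that set and the function names are guesses). In Python 3, \s on str patterns matches Unicode whitespace, and str.isspace() is the kind of built-in library function meant above.

```python
import re

WHITESPACE_RE = re.compile(r"\s")  # Unicode-aware for str patterns in Python 3


def is_whitespace_manual(ch):
    """Assumed manual version: only four ASCII characters."""
    return ch in (" ", "\t", "\r", "\n")


def is_whitespace_regex(ch):
    """Regex version using \\s."""
    return WHITESPACE_RE.fullmatch(ch) is not None


# Characters on which the three approaches may disagree.
SAMPLES = {
    "space": " ",
    "vertical tab": "\v",
    "form feed": "\f",
    "no-break space": "\u00a0",
    "letter x": "x",
}

if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        print(label, is_whitespace_manual(ch), is_whitespace_regex(ch), ch.isspace())
```

The manual check rejects the vertical tab, form feed, and no-break space, while both \s and str.isspace() accept them.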

So in the end the answer is "it depends". In some situations the extra programming effort of using procedural code is warranted. In many cases the regex is fast enough and easier to maintain.

Jan Goyvaerts 2010-04-06 09:17:07

Doesn't that depend on whether your regex implementation is Unicode-aware, though? (And on your language. For instance, I think Ruby is still not Unicode-aware.)

Earlz 2010-04-06 15:10:54

I said: "if your system uses Unicode". With that I meant both the programming language and regex flavor. The regex flavor in Ruby 1.8 does not support Unicode, the one in Ruby 1.9 does.

Jan Goyvaerts 2010-04-08 02:32:15
