Do regex comparisons consume lots of resources?

tags:

regex

views:

225

answers:

+2 Q:

Do regex comparisons consume lots of resources?

I don't know, but will your machine slow down noticeably if you use a very complex regex? For example, the well-known email validation regex proposed fairly recently, which can be found here: RFC822.

Update: sorry, I had to ask this question in a hurry. Anyway, I've posted the link to the email regex I was talking about.

+3 A:

It highly depends on the individual regex: features like look-behind or look-ahead can get very expensive, while simple regular expressions are fine for most situations.

Tutorials on http://www.regular-expressions.info/ offer performance advice, so that can be a good start.

Alexey Rusakov 2008-10-22 08:09:27

I once made a program that analyzed a lot of text (a big code base, >300k lines). First I used regex but when I switched to using regular string functions it got a lot faster, like taking 40% of the time of the regex version. So while of course it depends, my thing got a lot faster.

Niklas Winde 2008-10-22 08:14:42
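To make the comparison in the comment above concrete, here is a minimal Python sketch that counts lines containing a fixed substring, once with a plain string test and once with a compiled regex. The sample lines, the `needle` value, and both function names are made up for illustration; the actual ratio depends on the language, the engine, and the data, so treat any timing it prints as indicative only.

```python
import re
import timeit

# Hypothetical sample data: every third line carries a TODO marker.
lines = [
    f"int value_{i} = compute({i});" + ("  // TODO: check" if i % 3 == 0 else "")
    for i in range(20_000)
]

needle = "TODO"
pattern = re.compile(re.escape(needle))  # same literal search, via the regex engine


def count_with_str(source_lines):
    """Count matching lines using the built-in substring test."""
    return sum(1 for line in source_lines if needle in line)


def count_with_regex(source_lines):
    """Count matching lines using a compiled regular expression."""
    return sum(1 for line in source_lines if pattern.search(line))


if __name__ == "__main__":
    for func in (count_with_str, count_with_regex):
        seconds = timeit.timeit(lambda: func(lines), number=10)
        print(f"{func.__name__:18s} {seconds:.3f}s")
```

This only covers the case where the search is a fixed string; once the pattern needs alternation or character classes, the string-function version stops being a like-for-like replacement.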

I once wrote a greedy (accidentally, of course :-)) multi-line regex and ran a search/replace with it on 10 * 200 GB of text files. It was damn slow... So it depends on what you write and what you check.

Zsolt Botykai 2008-10-22 08:33:30

Depends on the complexity of the expression and the language the expression is used with.

In JavaScript, you have to optimize everything. In C#, not so much.

roosteronacid 2008-10-22 08:39:42

JS RegExp performs much better than this lets on.

eyelidlessness 2008-10-22 09:14:13

+2 A:

It also depends on how well you optimise your pattern and how well you know the inner workings of the regex engine.

Using a negated character class, for example, saves the engine the cost of backtracking (e.g. /<[^>]+>/ instead of /<.+?>/) (*). This is trivial in small matches, but saves a lot of cycles when you have to match inside a big chunk of text.

And there are many other ways to save resources in regex operations, so performance can vary wildly.

(*) Example taken from http://www.regular-expressions.info/repeat.html

Berzemus 2008-10-22 08:53:51
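As a rough way to check the negated-character-class advice on your own engine, the sketch below times both patterns from the answer above with Python's `re` module. The input text is invented for the test, and the size of the gap (or whether there is one at all) depends on the engine and the data, so no particular result is claimed here.

```python
import re
import timeit

# The two patterns contrasted in the answer above.
lazy = re.compile(r"<.+?>")       # lazy dot: expands one character at a time
negated = re.compile(r"<[^>]+>")  # negated class: consumes up to the next '>'

# Hypothetical input: many short tags separated by filler text.
text = "<b>bold</b> some filler text " * 10_000

# On well-formed input like this, both patterns find exactly the same tags.
assert lazy.findall(text) == negated.findall(text)

if __name__ == "__main__":
    for name, pattern in (("lazy", lazy), ("negated", negated)):
        seconds = timeit.timeit(lambda: pattern.findall(text), number=10)
        print(f"{name:8s} {seconds:.3f}s")
```

Note that the two patterns are not strictly equivalent: `.` does not match a newline by default, while `[^>]` does, so they can differ on tags that span lines.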

+3 A:

Regexes are usually implemented as one of two algorithms (NFA or DFA) that correspond to two different FSMs. Different languages and even different versions of the same language may have a different type of regex. Naturally, some regexes work faster in one and some work faster in the other. If it's really critical, you might want to find what type of regex FSM is implemented.

I'm no expert here. I got all this from reading Mastering Regular Expressions by Jeffrey E. F. Friedl. You might want to look that up.

Nathan Fellman 2008-10-22 09:41:56

+1 A:

It depends on your regexp engine. As explained here (Regular Expression Matching Can Be Simple And Fast), there can be important differences in performance depending on the implementation.

Pierre 2008-10-22 10:37:43

+2 A:

You might be interested in articles like: Regular Expression Matching Can Be Simple And Fast or Understanding Regular Expressions.

It is, alas, easy to write inefficient REs, which can match quite quickly on success but can search for hours if no match is found, because the engine stupidly tries a long match at every position of a long string!

There are a few recipes for this, like anchoring whenever it is possible, avoiding greediness if possible, etc.
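The "fast on success, very slow on failure" behaviour can be reproduced with a small Python sketch. The pattern `^(a+)+$` is a textbook illustration chosen for this example, not something taken from the linked articles, and the helper name `time_match` is made up. It assumes a backtracking engine such as Python's `re`; an automaton-based engine would not show the blow-up.

```python
import re
import time

nested = re.compile(r"^(a+)+$")  # nested quantifiers: many ways to split a run of 'a'
simple = re.compile(r"^a+$")     # accepts exactly the same strings


def time_match(pattern, subject):
    """Return (match result, elapsed seconds) for one match attempt."""
    start = time.perf_counter()
    result = pattern.match(subject)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # On success both patterns return almost immediately.
    for pattern in (nested, simple):
        _, elapsed = time_match(pattern, "a" * 1000)
        print(f"success {pattern.pattern:10s} {elapsed:.6f}s")

    # On failure the nested pattern tries every way of splitting the run
    # of 'a' before giving up, so each extra character costs far more.
    for n in (14, 16, 18, 20):
        subject = "a" * n + "b"
        for pattern in (nested, simple):
            _, elapsed = time_match(pattern, subject)
            print(f"failure n={n:2d} {pattern.pattern:10s} {elapsed:.6f}s")
```

Keep `n` small when experimenting: with this pattern the failure time grows very quickly as the input gets longer.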

Note that the giant e-mail expression isn't recent, and not necessarily slow: a short, simple expression can be slower than a more convoluted one!

Note also that in some situations (like e-mail, precisely), it can be more efficient (and maintainable!) to use a mix of regexes and code to handle cases, like splitting at @, handling different cases (first part starts with " or not, second part is IP address or domain, etc.).
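A minimal Python sketch of that mixed approach might look like the following. The three patterns and the function name `looks_like_email` are assumptions made for this example; they are deliberately simplified and accept far less than RFC 822 allows, so this is an illustration of the structure, not a compliant validator.

```python
import ipaddress
import re

# Simplified, assumed patterns; much stricter than RFC 822 itself.
_ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
_ATOM_LOCAL = re.compile(rf"{_ATOM}(\.{_ATOM})*")
_QUOTED_LOCAL = re.compile(r'"([^"\\\r\n]|\\.)*"')
_DOMAIN_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?")


def looks_like_email(address: str) -> bool:
    """Rough plausibility check: split at @, then handle each part separately."""
    # Split at the last @ so a quoted local part may itself contain one.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False

    # First part: either a quoted string or dot-separated atoms.
    local_pattern = _QUOTED_LOCAL if local.startswith('"') else _ATOM_LOCAL
    if not local_pattern.fullmatch(local):
        return False

    # Second part: a bracketed IP address literal ...
    if domain.startswith("[") and domain.endswith("]"):
        try:
            ipaddress.ip_address(domain[1:-1])
        except ValueError:
            return False
        return True

    # ... or a domain name made of dot-separated labels.
    return all(_DOMAIN_LABEL.fullmatch(label) for label in domain.split("."))
```

Each small pattern handles one case and can be read and tested on its own, which is the maintainability point made above.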

Regexes are not the ultimate tool that can do everything, but they are a very useful tool, well worth mastering!

PhiLho 2008-10-22 10:40:31

+1 A:

You can't talk about regexes in general any more than you can talk about code in general.

Regular expressions are little programs on their own. Just as any given program may be fast or slow, any given regex may be fast or slow.

One thing to remember, however, is that the regular expression engine is very well optimized to do its job and run the regex quickly.

Andy Lester 2008-10-22 21:28:01
