ansaurus

Question

Answer 1

+12 A:

Overhyped? No. They're extremely powerful and flexible.

Overused? Absolutely. Particularly when it comes to parsing HTML (which frequently comes up here).

This is another of those "right tool for the job" scenarios. Some go too far and try to use it for everything.

You are right though in that you can do many things with substring and/or split. You will often reach a point with those where what you're doing will get so complicated that you have to change method or you just end up writing too much fragile code. Regexes are (relatively) easy to expand.

But hand-written code will nearly always be faster. A good example of this is the question "Putting char into a java string for each N characters". The regex solution is terser, but it has some issues that a hand-written loop doesn't, and it is much slower.

cletus 2009-05-08 23:50:05

Or really in any kind of activity that can be called "parsing".

Greg Hewgill 2009-05-08 23:53:17

A compiled (well-written) regex actually tends to be extremely fast. It's just a state-machine. A lot of the speed issues I think can be chalked up to people not understanding that there can be a fairly sizeable penalty for transforming the string representation of a regex to a compiled regex.
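A minimal sketch of the compile-once point, assuming Python's standard `re` module. The date pattern, the sample lines, and the function names are made up for illustration. Python's `re` also keeps an internal cache of recently compiled patterns, so the gap is smaller there than in engines that recompile on every call.

```python
import re

# Hypothetical pattern for illustration: an ISO-style date such as 2009-05-08.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")  # compiled once, at import time


def count_dates(lines):
    """Count lines containing a date, reusing the precompiled pattern."""
    return sum(1 for line in lines if DATE_RE.search(line))


def count_dates_from_string(lines):
    """Same result, but hands the pattern string to the engine on every call."""
    return sum(1 for line in lines if re.search(r"\d{4}-\d{2}-\d{2}", line))


sample = ["cletus 2009-05-08 23:50:05", "no date here", "2009-06-18"]
print(count_dates(sample), count_dates_from_string(sample))
```

Both functions return the same count; only the place where the string-to-state-machine translation happens differs.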

2009-05-08 23:58:45

Actually the Perl Regex engine is faster than if you wrote the routine yourself, for all but the simplest of cases. This of course assumes that the Regex was well designed to begin with.

Brad Gilbert 2009-05-09 04:13:58

@cletus: I recovered this answer by merging from a deleted question. You may want to adjust your wording slightly to fit this question.

Bill the Lizard 2009-06-18 18:14:40

Many thanks, Bill.

cletus 2009-06-18 22:08:14

Answer 2

+3 A:

If more people knew how to use a decent parser generator, there would be fewer people using regular expressions.

2009-05-08 23:52:33

Answer 3

+4 A:

Overhyped? No

Under-Utilized Properly? Yes

cpjolicoeur 2009-05-08 23:53:09

I recovered this answer from a deleted question that was the same, but worded slightly differently. You may want to adjust the wording of your answer to match.

Bill the Lizard 2009-06-18 18:15:55

Answer 4

+5 A:

I think that if you learn programming in a language that speaks regular expressions natively, you'll gravitate toward them because they just solve so many problems. That is, you may never learn to use split because regexec() can solve a wider set of problems, and once you get used to it, why look anywhere else?

On the other hand, I bet C and C++ programmers will for the most part look at other options first, since it's not built into the language.

dicroce 2009-05-08 23:54:03

Answer 5

+6 A:

"When you have a hammer, everything looks like a nail."

Regular expressions are a very useful tool; but I agree that they're not necessary for every single place they're used. One positive factor to them is that because they tend to be complex and very heavily used where they are, the algorithms to apply regular expressions tend to be rather well optimized. That said, the overhead involved in learning the regular expressions can be... high. Very high.

Are regular expressions the best tool to use for every applicable situation? Probably not, but on the other hand, if you work with string validation and search all the time, you probably use regular expressions a lot; and once you do, you already have the knowledge necessary to use the tool probably more efficiently and quickly than any other tool. But if you don't have that experience, learning it is effectively a drag on your productivity for that implementation. So I think it depends on the amount of time you're willing to put into learning a new paradigm, and the level of rush involved in your project. Overall, I think regular expressions are very worth learning, but at the same time, that learning process can, frankly, suck.

McWafflestix 2009-05-08 23:55:08

Answer 6

+3 A:

In my belief, they are overused by people quite a bit (I've had this discussion a number of times on SO).

But they are a very useful construct because they deliver a lot of expressive power in a very small piece of code.

You only have to look at an example such as a Western Australian car registration number. The RE would be

re.match("[1-9] [A-Z]{3} [0-9]{3}", plate)

whilst the code to check this would be substantially longer, in either a simple 9-if-statement or slightly better looping version.
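To make the size comparison concrete, here is a sketch in Python of both versions, assuming the format is exactly the one the pattern above describes (nine characters: digit, space, three uppercase letters, space, three digits). The function names are hypothetical, and `fullmatch()` is used so the whole string has to fit the pattern.

```python
import re

# Pattern taken from the answer; fullmatch() anchors it to the whole string.
PLATE_RE = re.compile(r"[1-9] [A-Z]{3} [0-9]{3}")


def is_plate_regex(text):
    """Regex version: one line of logic."""
    return PLATE_RE.fullmatch(text) is not None


def is_plate_by_hand(text):
    """Hand-written equivalent: check the length, then each position in turn."""
    if len(text) != 9:
        return False
    if text[0] not in "123456789":
        return False
    if text[1] != " " or text[5] != " ":
        return False
    uppercase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    digits = "0123456789"
    return (all(c in uppercase for c in text[2:5])
            and all(c in digits for c in text[6:9]))


for candidate in ("1 ABC 123", "0 ABC 123", "1 abc 123"):
    print(candidate, is_plate_regex(candidate), is_plate_by_hand(candidate))
```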

I hardly ever use complex REs in my code because:

1) I know how the RE engines work and I can use domain knowledge to code up faster solutions (that 9-if variant would almost certainly be faster than a one-shot RE compile/execute cycle); and
2) I find code more readable if it's broken up logically and commented. This isn't easy with most REs (although I have seen one that allows inline comments).

I have seen people suggest the use of REs for extracting a fixed-size substring at a fixed location. Why these people don't just use substring() is beyond me. My personal thought is that they're just trying to show how clever they are (but it rarely works).
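A small Python sketch of the fixed-position case, assuming a made-up fixed-width record (an eight-digit date followed by a name). Slicing plays the role of substring() here.

```python
import re

record = "20090509PAXDIABLO"  # hypothetical fixed-width record: date, then name

# Plain slicing: characters at index 4 and 5 hold the month.
month_by_slice = record[4:6]

# The regex route to the same two characters.
month_by_regex = re.match(r".{4}(.{2})", record).group(1)

print(month_by_slice, month_by_regex)
```

Both give the same two characters; the slice says where they are, while the regex makes the reader count.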

paxdiablo 2009-05-09 00:12:04

The substring() example is quite true. I also don't understand why some people insist on using regexes all the time.

Alix Axel 2009-05-09 00:13:41

Answer 7

+1 A:

Regular expressions are one of the most useful things programmers can learn. They let you speed up and shrink your code if you know how to handle them.

Alix Axel 2009-05-09 00:12:11

Answer 8

+2 A:

There is a very good reason to use regular expressions in scripting languages (such as Ruby, Python, Perl, JavaScript and Lua): parsing a string with a carefully optimized regular expression executes faster than the equivalent custom while loop that scans the string character by character. For compiled languages (such as C and C++, and also C# and Java most of the time) the opposite is usually true: the custom while loop executes faster.

One more reason why regular expressions are so popular: they express the programmer's intention in an extremely compact way: a single-line regexp can do as much as a 10- or 20-line while loop.

pts 2009-05-09 00:17:41

Answer 9

+1 A:

Regular expressions are often easier to understand than the non-regex equivalent, especially in a language with native regular expressions, especially in a code section where other things that need to be done with regexes are present.

That doesn't mean they're not overused. The only time string.match(/\?/) is better than string.contains('?') is if it's significantly more readable with the surrounding code, or if you know that .contains is implemented with regexes anyway.
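The same comparison sketched in Python rather than the JavaScript-style calls above, assuming the `in` operator as the stand-in for contains and `re.search` as the stand-in for match. The function names are hypothetical.

```python
import re


def has_question_plain(text):
    """Plain substring test; nothing to escape."""
    return "?" in text


def has_question_regex(text):
    """Regex test; the question mark has to be escaped as a metacharacter."""
    return re.search(r"\?", text) is not None


title = "Should I avoid regular expressions?"
print(has_question_plain(title), has_question_regex(title))
```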

singpolyma 2009-05-09 00:17:50

Answer 10

+1 A:

I often use regex in my IDE to quick fix code. Try to do the following without regex.

glVector3f( -1.0f, 1.0f, 1.0f ); -> glVector3f( center.x - 1.0f, center.y + 1.0f, center.z + 1.0f );

Without regex, it's a pain, but WITH regex...

s/glVector3f\((.*?),(.*?),(.*?)\)/glVector3f(center.x+$1,center.y+$2,center.z+$3)/g

Awesome.
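For anyone who wants to try this outside an IDE, here is a rough Python equivalent using `re.sub`. It is a sketch, not the exact substitution above: the `\s*` pieces are added to trim whitespace around each argument, and the `center` name is taken from the target line. Note that it produces `center.x + -1.0f` rather than `center.x - 1.0f`; folding the sign would need more than a single substitution.

```python
import re

# Capture the three arguments, ignoring whitespace around each one.
CALL_RE = re.compile(r"glVector3f\(\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*?)\s*\)")

# \1, \2, \3 refer back to the captured arguments.
REPLACEMENT = r"glVector3f( center.x + \1, center.y + \2, center.z + \3 )"


def offset_by_center(source):
    """Rewrite every glVector3f(...) call so each argument is offset by center."""
    return CALL_RE.sub(REPLACEMENT, source)


print(offset_by_center("glVector3f( -1.0f, 1.0f, 1.0f );"))
```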

Stefan Kendall 2009-05-09 00:41:00

Answer 11

+2 A:

Overhyped? No, if you have ever taken a Parsing or Compiler course, you would understand that this is like saying addition and multiplication is overhyped for math problems.

It is a system for solving parsing problems.

Some problems are simpler and don't require regular expressions; some are harder and require better tools.

Unknown 2009-05-09 00:53:23

@Unknown: I recovered this answer by merging from a deleted question. You may want to adjust your wording slightly to fit this question.

Bill the Lizard 2009-06-18 18:17:09

Answer 12

+1 A:

I'd agree that regular expressions are sometimes used inappropriately. Certainly for very simple cases like what you're describing, but also for cases where a more powerful parser is needed.

One consideration is that sometimes you have a condition that needs to do something simple like test for presence of a question mark character. But it's often true that the condition becomes more complex. For example, to find a question mark character that isn't preceded by a space or beginning-of-line, and isn't followed by an alphanumeric character. Or the character may be either a question mark or the Spanish "¿" (which may appear at the start of a word). You get the idea.

If conditions are expected to evolve into something that's less simple to do with a plain call to String.contains("?"), then it could be easier to code it using a very simple regular expression from the start.
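A sketch of how that evolution might look in Python, under stated assumptions: "alphanumeric" is read as ASCII letters and digits, "space" as any whitespace character, and all names are hypothetical. The lookbehind `(?<=\S)` requires a non-whitespace character before the question mark, which also rules out the beginning of a line.

```python
import re


def has_question_mark(text):
    """Stage 1: the simple requirement needs no regex at all."""
    return "?" in text


# Stage 2: not preceded by whitespace or start-of-line,
# and not followed by an ASCII letter or digit.
STRICT_RE = re.compile(r"(?<=\S)\?(?![0-9A-Za-z])")

# Alternative stage 2: either "?" or the Spanish opening mark.
EITHER_RE = re.compile(r"[?¿]")


def has_strict_question_mark(text):
    return STRICT_RE.search(text) is not None


def has_any_question_mark(text):
    return EITHER_RE.search(text) is not None


for sample in ("Really?", "?what", "a ?b", "why?not", "¿Qué"):
    print(sample, has_strict_question_mark(sample), has_any_question_mark(sample))
```

Each new requirement changes only the pattern, while the calling code stays the same.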

Bill Karwin 2009-05-09 01:07:09

Answer 13

+1 A:

It comes down to the right tool for the job.

I usually hear two arguments against regular expressions: 1) They're computationally inefficient, and 2) They're hard to understand.

Honestly, I can't understand how either is a legitimate claim.

1) This may be true in an academic sense. A complex expression can double back on itself many times over. Does it really matter, though? How many millions of computations a second can a server processor do these days? I've dealt with some crazy expressions, and I've never seen a regexp be the bottleneck. By far the biggest bottleneck is DB interaction, followed by bandwidth.

2) Hard for about a week. The most complicated regexp is no more complex than HTML - it's just a familiarity problem. If you needed HTML once every 3 months, would you get it 100% each time? Work with them on a daily basis and they're just as clear as any other language syntax.

I write validation software. Regexps are second nature. Every fifth line of code has a regexp, and for the life of me I can't understand why people make a big deal about them. I've never seen a regexp slow down processing, and I've seen even the dullest 'programmers' pick up the syntax.

Regexps are powerful, efficient, and useful. Why avoid them?

rooskie 2009-05-09 01:59:06

Answer 14

+25 A:

Don't avoid them. They're an excellent tool, and when used appropriately can save you a lot of time and effort. Moreover, a good implementation used carefully should not be particularly CPU-intensive.

Shog9 2009-06-15 23:44:08

Personally, I like RegEx; it saves you a lot of code (and time) when validating text inputs. It might be wiser to sacrifice CPU time for regex than to shell out hand-written code (which is bug-prone)...

jerbersoft 2009-06-15 23:52:35

Right. If you've spent the last twenty years writing parsers, to where you can now write a flawless "long-hand" equivalent to any regex in minutes (with one arm tied behind your back, while blindfolded...), then by all means, don't bother with them. But for most of us, writing a regular expression is faster than writing the equivalent parsing code, even if we have to look up the syntax while doing so! And even a moderately complicated expression is easier to understand than two pages of nested switch statements...

Shog9 2009-06-15 23:57:43

@Shog9: Thanks for the heads up on the duplicate that was deleted. I think the wording of that question was its downfall. The answers are definitely worth salvaging, so I merged them in.

Bill the Lizard 2009-06-18 18:19:56

Thanks much, Bill!

Shog9 2009-06-18 18:21:14

Answer 15

+3 A:

Don't avoid them, but ask yourself if they're the best tool for the task you have to solve. Regexes can sometimes be difficult to use or debug, but they're really useful in some situations. The point is to use the appropriate tool for each task, and which tool that is usually isn't obvious.

Jonathan 2009-06-15 23:54:56

Answer 16

+15 A:

If you can easily do the same thing with common string operations, then you should avoid using a regular expression.

In most situations, regular expressions are used where the same operation would require a substantial number of common string operations; in that case there is of course no point in avoiding them.

Guffa 2009-06-16 00:00:40

Sounds like common sense but people seem to forget this.

xenon 2009-06-16 00:03:52

What is the rationale? Why would a precompiled re on a good compiler be much slower than a string operation?

ilya n. 2009-06-16 00:07:36

"Common sense is not so common" - Voltaire ;)

Guffa 2009-06-16 00:07:51

The cache for compiled regexes is limited, so the more of them you have, the more often they need recompiling. Even when the regex is already compiled, there is still some overhead.

Guffa 2009-06-16 00:15:42

Answer 17

+8 A:

As a basic parser or validator, use a regular expression unless the parsing or validation code you would otherwise write would be easier to read.

For complex parsers (i.e. recursive descent parsers) use regex only to validate lexical elements, not to find them.

The bottom line is, the best regex engines are well tuned for validation work, and in some cases may be more efficient than the code you yourself could write, and in others your code would perform better. Write your code using handwritten state machines or regex as you see fit, but change from regex to handwritten code if performance tests show you that regex is significantly inefficient.

2009-06-16 00:16:51

+1 for pointing out that regex is often not the right solution for complex parsers

John M Gant 2009-06-16 19:21:08

Answer 18

+5 A:

You know, given the fact that I'm what many people call "young", I've heard too much criticism about RegEx. You know, "he had a problem and tried to use regex, now he has two problems".

Seriously, I don't get it. It is a tool like any other. If you need a simple website with some text, you don't need PHP/ASP.NET/STG44. Still, no discussion on whether any of those should be avoided. How odd.

In my experience, RegEx is probably the most useful tool I've ever encountered as a developer. It's the most useful tool when it comes to the #1 security issue: parsing user input. It has saved me hours, if not days, of writing potentially buggy (read: crappy) code.

With modern CPUs, I don't see what's the performance issue here. I'm quite willing to sacrifice some cycles for some quality and security. (It's not always the case, though, but I think those cases are rare.)

Still, RegEx is very powerful. With great power comes great responsibility. That doesn't mean you should use it whenever you can, only where its power is worth using.

As someone mentioned above, HTML parsing with RegEx is like a Russian roulette with a fully loaded gun. Don't overdo anything, RegEx included.

dr Hannibal Lecter 2009-06-16 14:53:43

+1 on this post. informative.

jerbersoft 2009-06-16 22:45:27

+1 and amen to that. Of course you don't use regex where a simple string substitution will do, but any programmer who can't get their head around regex isn't in the right profession. They're not easy, but they are simply *not* *that* hard.

Cruachan 2009-06-17 23:29:45

Answer 19

+9 A:

You can substitute "regex" in your question with pretty much any technology and you'll find people who poorly understand the technology, or are too lazy to learn it, making such claims.

There is nothing heavy-weight about regular expressions. The most common way that programmers get themselves into trouble using regular expressions is that they try to do too much with a single regular expression. If you use regular expressions for what they're intended (simple pattern matching), you'll be hard-pressed to write procedural code that's more efficient than the equivalent regular expression. Given decent proficiency with regular expressions, the regular expression takes much less time to write, is easier to read, and can be pasted into tools such as RegexBuddy for visualization.

Jan Goyvaerts 2009-06-16 14:59:14

The people at the other end of the spectrum--equally ignorant, but enthusiastic anyway--don't help matters either. The ones who bug me most are those who respond to string-manipulation questions with the pithy advice, "use regex". Excuse me? If the OP knew anything about regexes, don't you think he would have thought of them on his own? Often as not, regexes are the wrong tool for the job anyway. (I'm not talking about this site, by the way; I mostly see it in Sun's Java forums.)

Alan Moore 2009-06-18 00:03:33

@Alan: Right. Although the regex pushers exist, this site is more of a "have you tried jQuery?" place. Of course, jQuery is a fantastic little library and no one in their right mind should avoid it... but it's not the tool for every job. (Specifically: sometimes you should use regex instead of jQuery)

Shog9 2009-06-18 00:06:49

Answer 20

+4 A:

You should also avoid floating-point numbers at all costs. That is, when you're programming in an embedded environment.

Seriously: if you're in normal software development, you should actually use regex if you need to do something that can't be achieved with simpler string operations. I'd say that any normal programmer won't be able to implement something that's best done using regexps in a way that is faster than the corresponding regular expression. Once compiled, a regular expression works as a state machine that is optimized to near perfection.

marvesmith 2009-06-16 15:08:09

Answer 21

+2 A:

I've seen so many people argue about whether a given regex is correct or not that I'm starting to think the best way to write one is to ask how to do it on StackOverflow and then let the regex gurus fight it out.

I think they're especially useful in JavaScript. JavaScript is transmitted (so should be small) and interpreted from text (although this is changing in the new browsers with V8 and JIT compilation), so a good internal regex engine has a chance to be faster than an algorithm.

I'd say if there is a clear and easy way to do it with string operations, use the string operations. But if you can do a nice regex instead of writing your own state machine interpreter, use the regex.

Nosredna 2009-06-17 23:27:48

++for the JS case, although some of the same logic applies to other interpreted languages as well.

Shog9 2009-06-17 23:48:21

Answer 22

+1 A:

I wouldn't say avoid them entirely, as they are QUITE handy at times. However, it is important to realize the fundamental mechanisms underneath. Depending on your implementation, you could have up to exponential run-time for a search, but since searches are usually bounded by some constant number of backtraces, you can end up with the slowest linear run-time you ever saw.
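A small experiment along these lines, assuming Python's standard `re` module (a backtracking engine). The nested-quantifier pattern is a well-known pathological case chosen for illustration, not something from the answer, and the helper name is hypothetical. Timings depend on the machine and Python version, so none are quoted here; the thing to watch is how quickly they grow as `n` increases.

```python
import re
import time

# Nested quantifiers give a backtracking engine many ways to split a run
# of "a"s, and it tries them all before it can report failure.
NESTED_RE = re.compile(r"(a+)+$")


def time_failed_match(n):
    """Time one failing match against n 'a's followed by a single 'b'."""
    subject = "a" * n + "b"
    start = time.perf_counter()
    result = NESTED_RE.match(subject)
    elapsed = time.perf_counter() - start
    return result, elapsed


for n in (10, 14, 18):
    result, elapsed = time_failed_match(n)
    print(f"n={n:2d} matched={result is not None} seconds={elapsed:.4f}")
```

Keep `n` small when trying this; each extra character roughly doubles the work for this particular pattern.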

If you want the best answer, you'll have to examine your particular implementation as well as the data you intend to search on.

From memory, Wikipedia has a decent article on regular expressions and the underlying algorithms.

San Jacinto 2009-06-17 23:34:36


Should I avoid regular expressions?
