I'm under the impression that the dot '.' (wildcard) character in regular expressions is dangerous to use. Is my fear unfounded? Thanks

+8  A: 

Sodium is dangerous but it is required for life. Dot is like any other tool, only as dangerous as you make it. I would hate to try to write 99% of my regexes without it.

EBGreen
+12  A: 

It isn't dangerous, as long as you understand what it means. It generally will match any character of the input text. Depending on the flavor of regular expressions, it may or may not match end-of-line characters.

Avi
+8  A: 

The only tricky part I see for '.' is matching multi-line strings: with the wrong options, it can match much more than needed, and it can introduce backtracking issues (due to a non-greedy match).

From regex tutorial

The dot matches a single character, without caring what that character is. The only exceptions are newline characters.

In most regex flavors, the dot will not match a newline character by default. So by default, the dot is short for the negated character class [^\n] (UNIX regex flavors) or [^\r\n] (Windows regex flavors).

This exception exists mostly because of historic reasons. The first tools that used regular expressions were line-based. They would read a file line by line, and apply the regular expression separately to each line. The effect is that with these tools, the string could never contain newlines, so the dot could never match them.
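As a rough illustration of the newline exception described above, here is a minimal sketch using Python's `re` module (one flavor among many; other engines name the equivalent option differently). The sample text is made up for the example.

```python
import re

text = "first line\nsecond line"

# By default the dot does not match "\n", so the match stops at the end of the first line.
default_match = re.search(r"first.*", text).group()

# With re.DOTALL the dot also matches "\n", so the same pattern runs to the end of the text.
dotall_match = re.search(r"first.*", text, re.DOTALL).group()

print(repr(default_match))
print(repr(dotall_match))
```

This is the "wrong options" case: the same pattern can match far more than intended once the dot is allowed to cross line boundaries.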

VonC
+5  A: 

I wouldn't say "dangerous", at least not in general. However:

  • .* should be avoided where possible. It can kill your regex's performance with lots of backtracking as it tries to find the longest possible match, and if the token that comes after it appears more than once in the input, you probably won't get the match you wanted. .*? helps with the backtracking issue and eliminates the "too long match" problem, but not using . at all tends to be more effective.

  • Because . can match anything (except, usually, an end-of-line), it may match something you didn't intend/expect. In a security-conscious context, this can be dangerous.
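Both points can be sketched in Python's `re` module. This is only an illustration under assumed inputs: the HTML snippet and the host name are invented for the example, and other regex flavors may differ in details.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the last "</b>", so both elements end up in one match.
greedy = re.search(r"<b>.*</b>", html).group()

# Lazy: .*? stops at the first "</b>" that lets the match succeed.
lazy = re.search(r"<b>.*?</b>", html).group()

# No dot at all: [^<]* cannot run past the next "<".
negated = re.search(r"<b>[^<]*</b>", html).group()

# An unescaped dot matches any character, not just a literal ".".
loose = re.compile(r"www.example.com")
strict = re.compile(r"www\.example\.com")
host = "wwwXexample.com"  # hypothetical input that should be rejected

loose_hit = loose.fullmatch(host) is not None
strict_hit = strict.fullmatch(host) is not None

print(greedy, lazy, negated, loose_hit, strict_hit, sep="\n")
```

The loose pattern accepts the bogus host while the escaped one rejects it, which is the kind of unintended match that matters in a security-conscious context.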

Dave Sherohman
A: 

It depends on the usage. .* is great when searching for files, for example. It can be bad if you have a regex like this:

.*<one>.*<two>.*<three>.*</three>.*</two>.*</one>.*

For reasons that other people stated, depending on what's between those brackets, this can cause a lot of backtracking and be really slow.
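One small, hedged observation, sketched in Python: when the engine already searches for a match anywhere in the input (as `re.search` does), the leading and trailing .* add nothing except extra text for the engine to consume and give back. The sample string below is invented, and this does not remove the backtracking caused by the inner .* groups.

```python
import re

text = "x<one>a<two>b<three>c</three>d</two>e</one>y"

# The pattern from the answer, with .* on both ends.
full = re.compile(r".*<one>.*<two>.*<three>.*</three>.*</two>.*</one>.*")

# Same pattern without the outer .*; re.search scans for the start position itself.
trimmed = re.compile(r"<one>.*<two>.*<three>.*</three>.*</two>.*</one>")

full_match = full.search(text).group()
trimmed_match = trimmed.search(text).group()

print(full_match)
print(trimmed_match)
```

Both patterns find the nested tags; the trimmed one simply reports the span from `<one>` to `</one>` instead of the whole line.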

Claudiu
+3  A: 

Don't forget that you can often use [^x]*x instead of .*?x. The latter can consume an x if necessary to complete a match, but the former cannot. The . is more likely to be dangerous if your regex is allowed to match multi-line strings, with . able to represent a newline. Anyhow, you should really only be concerned when you use .* or .*?, though there are plenty of cases where you'll want that. .{0,10} and the like are less prone to making your regex run absurdly slowly.
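To make the "can consume x" point concrete, here is a minimal Python sketch in which the delimiter x is a double quote. The input string and the key="..." format are assumptions made up for the example.

```python
import re

text = 'key="a" other="b";'

# Lazy dot: the first closing quote is not followed by ";", so .*? keeps
# expanding, swallows that quote, and ends at the second one.
lazy = re.search(r'key=".*?";', text)

# Negated class: [^"]* can never cross a quote, so no match is possible here.
negated = re.search(r'key="[^"]*";', text)

lazy_text = lazy.group() if lazy else None
negated_text = negated.group() if negated else None

print(lazy_text)
print(negated_text)
```

The lazy version quietly returns a match spanning both attributes, while the negated class fails outright, which is usually the safer outcome.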

Brian
+2  A: 

The dot isn't inherently dangerous, but people do tend to rely on it too heavily. In fact, it occurred to me a while back that a good way to improve your regex skills would be to stop using the dot, or at least to use it as little as possible. This will force you to think about how regex matching works, and to explore those other, more advanced features that you've never gotten around to learning.

As with many other tools, it's easy to get stuck at a middling level of regex-fu and never really master them. This strikes me as a good way to drag yourself over that hump. Note that I'm not saying you should never use the dot again; just give it a rest for a few months while you find out what else regexes have to offer.

Alan Moore
+3  A: 

VonC beat me to pointing out my article. The section "use the dot sparingly" answers your question.

The problem isn't with the dot. The problem is with people using it in situations where it isn't appropriate.

Jan Goyvaerts
Whoa... Mr. "Regex Master" ;) 9.99 out of ten, I always rush to your site for any regexp question. Thank you for helping us gain a better understanding of those tricky expressions.
VonC
Indeed. I appreciate all the work you've done in helping to explain the complexities of RegEx. Keep up the good work.