tags:

views:

154

answers:

4

What does the plus symbol in regex mean?

+2  A: 

One or more occurrences of the preceding symbol.

E.g. a+ means the letter a one or more times. Thus, a+ matches a, aa, aaaaaa but not an empty string.

If you know what the asterisk (*) means, then you can express (exp)+ as (exp)(exp)*, where (exp) is any regular expression.
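As a quick check of that equivalence, here is a minimal sketch using Python's `re` module. The choice of Python, the helper name `fully_matches`, and the sample strings are assumptions made for illustration, not part of the original answer.

```python
import re

def fully_matches(pattern, text):
    """Return True if the whole of `text` matches `pattern` (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# Illustrative inputs: the empty string, runs of "a", and an unrelated letter.
samples = ["", "a", "aa", "aaaaaa", "b"]

# a+ requires at least one "a"; aa* spells the same thing using the star.
plus_results = [fully_matches(r"a+", s) for s in samples]
star_results = [fully_matches(r"aa*", s) for s in samples]
```

Both lists should agree entry for entry, with the empty string and "b" rejected.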

phimuemue
+1  A: 

In most implementations + means "one or more".

In some theoretical writings + is used to mean "or" (most implementations use the | symbol for that).

sepp2k
+1  A: 

One or more of the previous expression.

[0-9]+

Would match:

1234567890

In:

I have 1234567890 dollars
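A small sketch of the same example in Python's `re` module (Python is an assumption here; the answer does not name a language):

```python
import re

text = "I have 1234567890 dollars"

# search() scans the text for the first run of one or more digits.
match = re.search(r"[0-9]+", text)
digits = match.group(0) if match else None

# With no digits at all, + cannot be satisfied and there is no match.
no_match = re.search(r"[0-9]+", "I have no dollars")
```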

Chris
+13  A: 

+ can actually have two meanings, depending on context.

As the other answers mention, + is usually a repetition operator that causes the preceding token to repeat one or more times. a+ would be expressed as aa* in formal language theory, and could also be expressed as a{1,} (match at least once, with no upper limit).


However, + can also make other quantifiers possessive if it follows a repetition operator (i.e. ?+, *+, ++ or {m,n}+). A possessive quantifier is an advanced feature of some regex flavours (PCRE, Java and the JGsoft engine) which tells the engine not to backtrack once a match has been made.

To understand how this works, we need to understand two concepts of regex engines: greediness and backtracking. Greediness means that, in general, a quantifier will try to consume as many characters as it can. Let's say our pattern is .* (the dot is a special construct in regexes which means any character [1]; the star means match zero or more times), and our target is aaaaaaaab. The entire string will be consumed, because the entire string is the longest match that satisfies the pattern.

However, let's say we change the pattern to .*b. Now, when the regex engine tries to match against aaaaaaaab, the .* will again consume the entire string. However, since the engine will have reached the end of the string and the pattern is not yet satisfied (the .* consumed everything but the pattern still has to match b afterwards), it will backtrack, one character at a time, and try to match b. The first backtrack will make the .* consume aaaaaaaa, and then b can consume b, and the pattern succeeds.
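To see where the greedy .* ends up after backtracking, here is a minimal sketch in Python's `re` module. The capture groups are added purely for illustration so the final split is visible; they are not part of the pattern discussed above.

```python
import re

target = "aaaaaaaab"

# On its own, the greedy .* consumes the entire string.
whole = re.match(r".*", target).group(0)

# With a trailing b, .* first grabs everything, then gives back one
# character so that b can match. The groups expose the final split.
m = re.match(r"(.*)(b)", target)
star_part, b_part = m.group(1), m.group(2)
```

Here `star_part` holds the eight a's and `b_part` holds the final b.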

Possessive quantifiers are also greedy, but as mentioned, once they return a match, the engine can no longer backtrack into them. So if we change our pattern to .*+b (match any character zero or more times, possessively, followed by a b), and try to match aaaaaaaab, again the .*+ will consume the whole string, but since it is possessive, the backtracking information is discarded, so the b cannot be matched and the pattern fails.
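The difference can be sketched in Python, with two caveats that are assumptions of this example rather than part of the answer: Python's `re` module only accepts possessive quantifiers from version 3.11 onwards, and on older versions the sketch swaps in a lookahead-plus-backreference pattern, which behaves the same way here because the engine does not backtrack into a lookahead.

```python
import re
import sys

target = "aaaaaaaab"

# Greedy: .* backtracks one character, so the trailing b can match.
greedy = re.match(r".*b", target)

if sys.version_info >= (3, 11):
    # Possessive: .*+ keeps everything it consumed, so b has nothing left.
    possessive = re.match(r".*+b", target)
else:
    # Swapped-in emulation for older Pythons: the lookahead captures what
    # .* would grab, and \1 then consumes it with no way to give any back.
    possessive = re.match(r"(?=(.*))\1b", target)

greedy_text = greedy.group(0) if greedy else None
```

The greedy pattern matches the whole string, while the possessive one yields no match at all.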


[1] In most engines, the dot will not match a newline character, unless the /s ("singleline" or "dotall") modifier is specified.

Daniel Vandersluis
+1; possessive quantifiers only work in Java, PCRE, or the JGSoft regex engine, though. Ruby, Perl, and .NET use atomic groups `(?>.*)`.
Tim Pietzcker
@Tim I alluded to that, but I've now made it more explicit in my answer.
Daniel Vandersluis
@Tim: Perl does support possessive quantifiers, probably since 5.10
ninjalj
@ninjalj: Thanks for the info. It appears that [this comparison](http://www.regular-expressions.info/refflavors.html) is not up to date anymore, then.
Tim Pietzcker