I have a Perl regex, but I'm not sure what `?` means in this context:

`m#(?:\w+)#`

What does `?` mean here?
In this case, the `?` is actually being used in connection with the `:`. Put together, `?:` at the beginning of a grouping means to group but not capture the text/pattern within the parentheses (as in, it will not be stored in any backreferences like `\1` or `$1`, so you will not be able to access the grouped text directly).
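A minimal sketch of the difference, written in Perl to match the question; the sample string and variable names are illustrative only. The same two words are matched twice, once with ordinary parentheses and once with `(?:...)` around the first word:

```perl
use strict;
use warnings;

my $text = "hello world";

# Capturing version: both groups are numbered, so $1 and $2 are set.
my ($first, $second) = $text =~ /(\w+) (\w+)/;
# $first is "hello", $second is "world"

# Non-capturing version: (?:\w+) still groups, but is not numbered,
# so the only capture (and therefore $1) is the second word.
my ($only) = $text =~ /(?:\w+) (\w+)/;
# $only is "world"

print "capturing:     $first, $second\n";
print "non-capturing: $only\n";
```

The match is still the same text in both cases; only what gets stored in the numbered captures changes.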
More specifically, a `?` has three distinct meanings in regex:
The `?` quantifier signifies "zero or one repetitions" of the preceding expression. One of the canonical examples I've seen is `s?he`, which will match both `she` and `he`, since the `?` makes the `s` "optional".
When a quantifier (`+`, `*`, `?`, or the general `{n,m}`) is followed by a `?`, the match is non-greedy (i.e. the quantifier matches as few repetitions as possible while still allowing the rest of the match to succeed).
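The two quantifier meanings above can be tried side by side. This is a minimal Perl sketch with made-up inputs; note that the `^` and `$` anchors are an addition for the demo, so that `s?he` has to match the whole word rather than the "he" inside "the":

```perl
use strict;
use warnings;

# ? as a quantifier: "zero or one" of the preceding item (here, "s").
# Anchors are added so the whole word must match.
my @words   = qw(she he the hen);
my @matched = grep { /^s?he$/ } @words;

# ? after a quantifier: the quantifier becomes non-greedy.
my $html = "<b>bold</b>";
my ($greedy) = $html =~ /<(.+)>/;     # .+ grabs as much as it can, then backs off to the last ">"
my ($lazy)   = $html =~ /<(.+?)>/;    # .+? takes as little as it can, so the first ">" ends it

print "s?he matched: @matched\n";    # prints "s?he matched: she he"
print "greedy: $greedy\n";           # prints "greedy: b>bold</b"
print "lazy: $lazy\n";               # prints "lazy: b"
```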
A `?` immediately after the opening parenthesis of a group signifies that you want to perform a special action. As in this case, `:` means to group but not capture. The exact list of actions available varies somewhat from one regex engine to another, but here's a list (not necessarily all-inclusive) of some of them:
A. Non-capturing group: `(?:text)`
B. Lookaround: `(?=a)` for a lookahead, `(?!a)` for a negative lookahead, or `(?<=a)` and `(?<!a)` for lookbehinds (positive and negative, respectively).
C. Conditional matches: `(?(condition)then|else)`.
D. Atomic grouping: `a(?>bc|b)c` matches `abcc` but not `abc`, because once the atomic group has matched `bc` it never backtracks to try the shorter `b` alternative.
E. Inline enabling/disabling of regex matching modifiers: `(?i)` to enable a mode, `(?-i)` to disable it. You can also enable/disable more than one modifier at a time by concatenating them, such as `(?im)` (`i` is case-insensitive and `m` is multiline).
F. Named capture groups: `(?P<name>pattern)`, which can later be referenced using `(?P=name)`. The .NET regex engine uses the syntax `(?<name>pattern)` instead. Perl 5.10 and later accept both forms and make the captured text available as `$+{name}`.
G. Comments: `(?#comment text)`. I personally think this just adds clutter, but I guess it could serve some use; free-spacing mode (the `(?x)` modifier) might be a better option.
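Each construct in the list above can be tried directly in Perl. This is a minimal sketch assuming Perl 5.10 or later (needed for the named-capture forms); the sample strings and variable names are illustrative only, and, as noted above, other engines may spell some of these differently or not support them at all:

```perl
use strict;
use warnings;

# B. Lookaround: assert what comes after/before without consuming it.
my ($percent) = 'width: 100px, ratio: 50%' =~ /(\d+)(?=%)/;    # digits followed by "%"
my ($dollars) = 'cost: $42, tax: 5%'       =~ /(?<=\$)(\d+)/;  # digits preceded by "$"

# C. Conditional: if group 1 (an opening paren) matched, require a
#    closing paren; otherwise require nothing.
my @balanced = grep { /^(\()?\d+(?(1)\))$/ } ('(123)', '123', '(123', '123)');

# D. Atomic grouping: once (?>bc|b) has matched "bc" it will not backtrack
#    into the "b" alternative, so "abc" fails; an ordinary group succeeds.
my $atomic_abcc = 'abcc' =~ /a(?>bc|b)c/ ? 1 : 0;
my $atomic_abc  = 'abc'  =~ /a(?>bc|b)c/ ? 1 : 0;
my $plain_abc   = 'abc'  =~ /a(?:bc|b)c/ ? 1 : 0;

# E. Inline modifiers: (?i) turns case-insensitivity on, (?-i) turns it off
#    for the rest of the pattern.
my $mixed_ok  = 'HELLO World' =~ /(?i)hello(?-i) World/ ? 1 : 0;
my $mixed_bad = 'HELLO WORLD' =~ /(?i)hello(?-i) World/ ? 1 : 0;

# F. Named captures: (?<name>...) fills the %+ hash; the Python-style
#    (?P<name>...) and (?P=name) forms are accepted as well.
my $year   = '2024-01-15' =~ /(?<year>\d{4})-(?<month>\d\d)/ ? $+{year} : undef;
my $repeat = 'abab' =~ /^(?P<pair>ab)(?P=pair)$/ ? 1 : 0;

# G. Comments: (?#...) is skipped; under the x modifier (the /x flag here,
#    equivalent to a leading (?x)) whitespace is ignored and # starts a comment.
my $commented = 'abc' =~ /a(?#this is ignored)bc/ ? 1 : 0;
my $free      = 'abc' =~ m{
    a b c    # spaces and this comment are not part of the pattern
}x ? 1 : 0;

print "lookahead: $percent, lookbehind: $dollars\n";
print "conditional: @balanced\n";
print "atomic: abcc=$atomic_abcc abc=$atomic_abc (plain group: $plain_abc)\n";
print "inline: $mixed_ok $mixed_bad\n";
print "named: $year $repeat\n";
print "comments: $commented $free\n";
```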
So essentially, the meaning of the `?` is entirely contextual. If you wanted zero or one occurrence of a literal `(` character, you'd have to use `\(?` to escape the paren.
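A short Perl sketch of the escaped form, using made-up inputs: `\(` is a literal parenthesis and the following `?` makes that single character optional, so at most one `(` is allowed:

```perl
use strict;
use warnings;

# \( is a literal "(" and the ? after it makes that one character optional.
# Without the backslash, "(" would open a group instead.
my @inputs  = ('(555', '555', '((555');
my @matched = grep { /^\(?555$/ } @inputs;

print "@matched\n";    # prints "(555 555"
```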
Those are non-capturing parentheses. They're used for grouping (just like normal parentheses), but the group won't be added to the capture array (i.e. it can't be referenced with a backreference such as `\1`, `\2`, etc.).
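One practical consequence is that non-capturing groups don't shift the numbering of the groups that do capture. A minimal Perl sketch with an illustrative input string: a doubled word is found after a repeated "very ", first with `(?:...)` and then with ordinary parentheses:

```perl
use strict;
use warnings;

my $text = 'it was very very good good';

# (?:very )+ is not numbered, so the doubled word is group 1 and the
# backreference is \1.
my ($word) = $text =~ /(?:very )+(\w+) \1/;

# With ordinary parentheses the first group is numbered too, so the same
# backreference has to be written \2 (and $1 holds the last "very ").
my ($last_very, $word2) = $text =~ /(very )+(\w+) \2/;

print "$word / $word2\n";    # prints "good / good"
```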
See the regex tutorial that is installed with every version of Perl (`perldoc perlretut`).
    (?:...)   Groups subexpressions without capturing (cluster)
You can also use YAPE::Regex::Explain:
    C:\Temp> perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new(qr#(?:\w+)#)->explain"
    The regular expression:

    (?-imsx:(?:\w+))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (?:                      group, but do not capture:
    ----------------------------------------------------------------------
        \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                                 more times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of grouping
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
In short, the sequence `(?` starts a special regular expression feature. What follows the `(?` specifies which feature; in this case, a non-capturing group. We cover this in both Intermediate Perl and Effective Perl Programming, and the perlre documentation describes Perl regular expressions in full.