I have a Perl regex. But I'm not sure what "?" means in this context.

m#(?:\w+)#

What does ? mean here?

+15  A: 

In this case, the ? is actually being used in connection with the :. Put together, ?: at the beginning of a grouping means to group but not capture the text/pattern within the parentheses (as in, it will not be stored in any backreferences like \1 or $1, so you will not be able to access the grouped text directly).
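
As a minimal sketch of that difference (the sample string and variable names are my own, not from the question), compare what a capturing and a non-capturing group hand back:

```perl
use strict;
use warnings;

my $text = "hello world";   # hypothetical sample input

# Capturing group: the matched word is stored in $1 and returned
# by the match in list context.
my ($captured) = $text =~ m#(\w+)#;

# Non-capturing group: the pattern still matches, but nothing is stored.
# With no capture groups, a successful match in list context returns (1).
my @non_captured = $text =~ m#(?:\w+)#;

print "captured: $captured\n";                    # captured: hello
print "non-capturing returned: @non_captured\n";  # non-capturing returned: 1
```

The grouping itself still works in both cases, so a quantifier applied after the closing parenthesis covers the whole group either way.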

More specifically, a ? has three distinct meanings in regex:

  1. The ? quantifier signifies "zero or one repetitions" of an expression. One of the canonical examples I've seen is s?he, which will match both she and he, since the ? makes the s "optional".

  2. When a quantifier (+, *, ?, or the general {n,m}) is followed by a ?, the match is non-greedy (i.e. it will match the shortest string starting from that position that allows the match to proceed).

  3. A ? at the beginning of a parenthesized group signifies that you want to perform a special action. As in this case, : means to group but not capture. The exact list of actions available will vary somewhat from one regex engine to another, but here's a list (not necessarily all-inclusive) of some of them:

    A. Non-capturing group: (?:text)
    B. Lookaround: (?=a) for a lookahead, (?!a) for a negative lookahead, and (?<=a) and (?<!a) for lookbehinds (positive and negative, respectively).
    C. Conditional Matches: (?(condition)then|else).
    D. Atomic Grouping: a(?>bc|b)c (matches abcc but not abc, because once the group has matched bc it will not backtrack to try b)
    E. Inline enabling/disabling of regex matching modifiers: (?i) to enable a mode, (?-i) to disable it. You can also enable/disable more than one modifier at a time by simply concatenating them, such as (?im) (i is case insensitive and m is multiline).
    F. Named capture groups: (?P<name>pattern), which can later be referenced using (?P=name). The .NET regex engine uses the syntax (?<name>pattern) instead.
    G. Comments: (?#Comment text). I personally think this just adds clutter, but I guess it could serve some use. Free-spacing mode (the (?x) modifier) might be a better option.
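
A few of the group constructs listed above can be tried out with a short Perl sketch. The sample strings and variable names here are my own illustrations, and the named-capture part assumes Perl 5.10 or later:

```perl
use strict;
use warnings;

# Lookahead: match digits only when "USD" follows, without consuming "USD".
my ($amount) = "price: 100USD" =~ /(\d+)(?=USD)/;

# Atomic group: once (?>bc|b) has matched "bc" it will not backtrack to "b".
my $atomic_abcc = "abcc" =~ /a(?>bc|b)c/ ? 1 : 0;
my $atomic_abc  = "abc"  =~ /a(?>bc|b)c/ ? 1 : 0;

# Inline modifier: (?i) turns on case-insensitive matching inside the pattern.
my $inline_i = "HELLO" =~ /(?i)hello/ ? 1 : 0;

# Named capture (assumes Perl 5.10+): results are available in the %+ hash.
my $year;
if ("2010-09" =~ /(?P<year>\d{4})-(?P<month>\d{2})/) {
    $year = $+{year};
}

print "amount=$amount\n";                                    # amount=100
print "atomic: abcc=$atomic_abcc abc=$atomic_abc\n";         # atomic: abcc=1 abc=0
print "inline_i=$inline_i year=$year\n";                     # inline_i=1 year=2010
```

Other engines may spell some of these differently or not support them at all, so treat this as a Perl-only illustration.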

So essentially, the meaning of the ? depends on its context. If you wanted zero or one occurrences of a literal ( character, you'd have to use \(? and escape the paren, so that (? is not read as the start of a special group.
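
To see the context dependence side by side, here is a small sketch (the inputs are hypothetical, chosen only for illustration) showing ? as a quantifier, as a non-greedy marker, and after an escaped literal parenthesis:

```perl
use strict;
use warnings;

# "?" as a quantifier: the "s" is optional, so both "she" and "he" match.
my @optional = grep { /^s?he$/ } qw(she he the);

# "?" after a quantifier: ".+?" takes the shortest match, ".+" the longest.
my $quoted = q{"a" and "b"};
my ($greedy) = $quoted =~ /"(.+)"/;
my ($lazy)   = $quoted =~ /"(.+?)"/;

# "\(?" is an escaped, optional literal parenthesis, not a group opener.
my @paren = grep { /^\(?\d+\)?$/ } ("(555)", "555", "((555)");

print "optional: @optional\n";   # optional: she he
print "greedy:   $greedy\n";     # greedy:   a" and "b
print "lazy:     $lazy\n";       # lazy:     a
print "paren:    @paren\n";      # paren:    (555) 555
```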

eldarerathis
@Chas: Don't know why I omitted that. Good catch/addition.
eldarerathis
For point #3, there's also `(?>...)`, which is an [atomic group](http://www.regular-expressions.info/atomic.html) in flavours that support it, and `(?i)` and `(?-i)` for inline enabling/disabling of [modifiers](http://www.regular-expressions.info/modifiers.html).
Daniel Vandersluis
@Daniel: Thanks. I think I'm going to clean up #3 and add a list with some links, so that then other people can continue to add to it as well.
eldarerathis
@eldarerathis Just for clarity, `(?im)` enables two modes (case insensitive and multiline) ;)
Daniel Vandersluis
@Daniel Vandersluis: Right, edited to make that clearer in the answer. I could see how that was not evident in my original phrasing. I think this is a bit better :)
eldarerathis
+2  A: 

Those are non-capturing parentheses. They're used for grouping (just like normal parentheses), but the group won't be added to the capture array (i.e. you can't refer to it with a numbered backreference such as \1).

See here: http://www.regular-expressions.info/refadv.html

Alin Purcaru
+1  A: 

See the regex tutorial that is installed with every version of Perl (in particular, this section).

davorg
+7  A: 

$ perldoc perlreref:

(?:...) Groups subexpressions without capturing (cluster)

You can also use YAPE::Regex::Explain:

C:\Temp> perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new(qr#(?:\w+)#)->explain"

The regular expression:

(?-imsx:(?:\w+))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Sinan Ünür
A: 

In short, the sequence (? starts a special regular expression feature. The characters that follow the (? specify which feature, in this case a non-capturing group. We cover this in both Intermediate Perl and Effective Perl Programming. The perlre documentation describes Perl regular expressions.

brian d foy