views: 159
answers: 6

What is the difference between encasing part of a regular expression in () (parentheses) and doing it in [] (square brackets)?

How does this:

[a-z0-9]

differ from this:

(a-z0-9)

?

+8  A: 

(…) is a group that groups its contents, as parentheses do in math; (a-z0-9) is the grouped literal sequence a-z0-9. Groups are particularly useful with quantifiers, which let the preceding expression be repeated as a whole: a*b* matches any number of a’s followed by any number of b’s, e.g. a, aaab, bbbbb, etc.; in contrast, (ab)* matches any number of ab’s, e.g. ab, abababab, etc.

[…] is a character class that describes the options for one single character; [a-z0-9] describes one single character that can be in the range a-z or 0-9.
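A quick way to see both points is a sketch using Python's `re` module (an assumption; the question names no language). It checks the quantifier examples above with `fullmatch`, so the whole string must match. The names `star_each`, `star_group` and `one_char` are just illustrative.

```python
import re

# a*b*: any number of a's followed by any number of b's (each may be zero)
star_each = re.compile(r"a*b*")
# (ab)*: the * applies to the whole group, so "ab" repeats as a unit
star_group = re.compile(r"(ab)*")
# [a-z0-9]: exactly one character, from a-z or from 0-9
one_char = re.compile(r"[a-z0-9]")

for s in ["aaab", "bbbbb", "abab", "aab"]:
    print(s, bool(star_each.fullmatch(s)), bool(star_group.fullmatch(s)))
# aaab True False
# bbbbb True False
# abab False True
# aab True False

print([bool(one_char.fullmatch(s)) for s in ["q", "7", "q7", "Q"]])
# [True, True, False, False]
```

Note that "abab" fails a*b* because once a b has been consumed, no further a is allowed.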

Gumbo
+1 for the links to www.regular-expressions.info. Good resource for regex questions.
Jeff Rupert
+2  A: 

The [] construct in a regex is essentially shorthand for an | between each of its characters. For example, [abc] matches a, b or c, just like a|b|c. Additionally, the - character has special meaning inside []: it provides a range construct. The regex [a-z] will match any lowercase letter from a through z.
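To check the "shorthand for |" claim, here is a small sketch in Python's `re` (assumed here; the behavior is the same in most regex flavors). It compares [abc] with a|b|c on the same inputs, and shows that - only forms a range when it sits between two characters inside []. The placement of - at the end of the class is an added illustration, not something from the answer.

```python
import re

# [abc] and a|b|c accept exactly the same single characters
cls = re.compile(r"[abc]")
alt = re.compile(r"a|b|c")
samples = ["a", "b", "c", "d", "ab"]
print([bool(cls.fullmatch(s)) for s in samples])  # [True, True, True, False, False]
print([bool(alt.fullmatch(s)) for s in samples])  # [True, True, True, False, False]

# Inside [], "-" between two characters forms a range: [a-z] is one lowercase ASCII letter
lower = re.compile(r"[a-z]")
print(bool(lower.fullmatch("m")), bool(lower.fullmatch("M")), bool(lower.fullmatch("-")))
# True False False

# Placed last in the class, "-" is just a literal hyphen
with_hyphen = re.compile(r"[a-z-]")
print(bool(with_hyphen.fullmatch("-")))  # True
```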

The () construct is a grouping construct that establishes precedence (it also affects access to matched substrings, but that's a somewhat more advanced topic). The regex (abc) will match the string "abc".
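As a rough illustration of both roles, assuming Python's `re`: without a group, | splits the whole pattern, while a group limits the alternation to part of it. The `search` call then shows the "accessing matched substrings" side, reading back what group 1 matched and where.

```python
import re

# ab|cd means "ab" or "cd"; a(b|c)d means "abd" or "acd"
no_group = re.compile(r"ab|cd")
grouped = re.compile(r"a(b|c)d")
for s in ["ab", "cd", "abd", "acd"]:
    print(s, bool(no_group.fullmatch(s)), bool(grouped.fullmatch(s)))
# ab True False
# cd True False
# abd False True
# acd False True

# (abc) matches the literal string "abc", and the group also records what it matched
m = re.search(r"(abc)", "xxabcxx")
print(m.group(0), m.group(1), m.span(1))  # abc abc (2, 5)
```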

JaredPar
+6  A: 

[] denotes a character class. () denotes a capturing group.

[a-z0-9] -- One character that is in the range of a-z OR 0-9

(a-z0-9) -- Explicit capture of the literal string a-z0-9. No ranges.

a -- Can be matched by [a-z0-9].

a-z0-9 -- Can be captured by (a-z0-9) and then can be referenced in a replacement and/or later in the expression.
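Here is a sketch of the two kinds of reference mentioned above, using Python's `re` (assumed; `\1` has this meaning in many flavors). The input `text` is made up for the example. The last part, a class inside a group, is an added contrast showing that \1 repeats whatever was actually captured.

```python
import re

text = "range: a-z0-9"

# (a-z0-9) matches the literal text "a-z0-9"; outside [] the "-" is not a range
m = re.search(r"(a-z0-9)", text)
print(m.group(1))  # a-z0-9

# Referencing the capture in a replacement: \1 is the text captured by group 1
print(re.sub(r"(a-z0-9)", r"[\1]", text))  # range: [a-z0-9]

# Referencing it later in the same expression: \1 must match the same text again
doubled = re.compile(r"(a-z0-9) \1")
print(bool(doubled.fullmatch("a-z0-9 a-z0-9")))  # True
print(bool(doubled.fullmatch("a-z0-9 b")))       # False

# With a class inside the group, \1 repeats whichever character was matched
same_twice = re.compile(r"([a-z0-9])\1")
print([bool(same_twice.fullmatch(s)) for s in ["aa", "77", "a7"]])  # [True, True, False]
```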

Jeff Rupert
+2  A: 

[a-z0-9] will match any lowercase letter or digit. (a-z0-9) will match the exact string "a-z0-9" and allows two additional things: you can apply quantifiers like *, ? and + to the whole group, and you can reference the match afterwards with $1 or \1. Not useful with your example, though.
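To see what "apply quantifiers to the whole group" changes, here is a small sketch in Python's `re` (assumed). Without parentheses, + binds only to the last character. Python writes the after-the-match reference as \1 or \g<1> in `re.sub` replacements; $1 is the spelling used by some other tools, such as Perl and JavaScript.

```python
import re

# Without parentheses, + applies only to the preceding character ("9")
print(bool(re.fullmatch(r"a-z0-9+", "a-z0-999")))        # True
print(bool(re.fullmatch(r"a-z0-9+", "a-z0-9a-z0-9")))    # False

# With parentheses, + repeats the whole group "a-z0-9"
print(bool(re.fullmatch(r"(a-z0-9)+", "a-z0-9a-z0-9")))  # True
print(bool(re.fullmatch(r"(a-z0-9)+", "a-z0-999")))      # False

# ? makes the whole group optional
print(bool(re.fullmatch(r"x(a-z0-9)?y", "xy")))          # True

# Referencing the match in a replacement (Python syntax: \g<1>, same as \1 here)
print(re.sub(r"(a-z0-9)", r"<\g<1>>", "a-z0-9"))         # <a-z0-9>
```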

Matt Kane
A: 

[a-z0-9] will match one of abcdefghijklmnopqrstuvwxyz0123456789. In other words, square brackets match exactly one character.

(a-z0-9) will match two characters, the first is one of abcdefghijklmnopqrstuvwxyz, the second is one of 0123456789, just as if the parentheses weren't there. The () will allow you to read exactly which characters were matched. Parentheses are also useful for OR'ing two expressions with the bar | character. For example, (a-z|0-9) will match one character -- any of the lowercase alpha or digit.

levis501
Ranges only exist within character classes. `(a-z|0-9)` will match the string `a-z` or `0-9`, but not `a` or `5`.
Daniel Vandersluis
Ah, thanks for that clarification. ([a-z]|[0-9]) would be more of what I described.
levis501
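The correction in the comments can be checked directly. This is a sketch in Python's `re` (assumed); `full` is a hypothetical helper that is true only when the pattern matches the entire string. The last pattern, [a-z][0-9], is the letter-then-digit reading described in the answer's first paragraph.

```python
import re

def full(pattern, s):
    """Return True if pattern matches all of s."""
    return re.fullmatch(pattern, s) is not None

# Outside [], "-" is literal: (a-z|0-9) matches only the strings "a-z" or "0-9"
print([full(r"(a-z|0-9)", s) for s in ["a-z", "0-9", "a", "5"]])
# [True, True, False, False]

# One character that is a lowercase letter or a digit
print([full(r"([a-z]|[0-9])", s) for s in ["a", "5", "a5"]])
# [True, True, False]

# Two characters: a lowercase letter followed by a digit
print([full(r"([a-z][0-9])", s) for s in ["a5", "5a", "a"]])
# [True, False, False]
```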
A: 

Try ([a-z0-9]+) to capture a mixed string of lowercase letters and digits, which also makes it available for back references (or extraction).
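A sketch in Python's `re` (assumed) of the difference the + makes. ([a-z0-9]) on its own captures a single character per match, while ([a-z0-9]+) captures each maximal run of lowercase letters and digits. The sample `text` is invented for the example.

```python
import re

text = "id: abc123 and x9"

# ([a-z0-9]) captures one character at a time
print(re.findall(r"([a-z0-9])", "abc123"))  # ['a', 'b', 'c', '1', '2', '3']

# ([a-z0-9]+) captures each maximal run of lowercase letters and digits
print(re.findall(r"([a-z0-9]+)", text))     # ['id', 'abc123', 'and', 'x9']

# The captured run can then be extracted by group number
m = re.search(r"id: ([a-z0-9]+)", text)
print(m.group(1))                           # abc123
```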

burkestar