I have used RegExp before but am far from an expert...

I'm reading up right now for a project, but am running into an issue. I'm using rubular.com to build my regex, and their documentation describes the following:

(...)   Capture everything enclosed
(a|b)   a or b

How can I use an OR expression without capturing what's in it? So if I want to match "a or b followed by a c", and only capture the c, I can't use

(a|b)(c)

right? Then I capture both the "a or b" as well as the "c". I know I can filter through the captured results, but that seems like more work...

Am I missing something obvious? I'm using this in Java, if that is pertinent.

Thank you for your help!

+3  A: 

Depending on the regular expression implementation, you can use so-called non-capturing groups with the syntax (?:…):

((?:a|b)c)

Here (?:a|b) is a group, but you cannot reference its match. You can only reference the match of ((?:a|b)c), which is either ac or bc.
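
Since the question mentions Java, here is a minimal sketch of how this behaves with java.util.regex. The class and helper names (NonCapturingDemo, firstGroup, groupCount) are made up for illustration. It compares the original (a|b)(c), the pattern above, and a variant that moves the capturing parentheses so that only the c is captured:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the thread.
public class NonCapturingDemo {
    // Returns the text of capture group 1 for the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Returns the number of capturing groups in the pattern (group 0 is not counted).
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupCount("(a|b)(c)"));           // two capturing groups
        System.out.println(groupCount("((?:a|b)c)"));         // one capturing group
        System.out.println(firstGroup("((?:a|b)c)", "xbc"));  // group 1 holds "bc"
        System.out.println(firstGroup("(?:a|b)(c)", "xbc"));  // group 1 holds only "c"
    }
}
```

If the goal is to capture only the c, the (?:a|b)(c) form is probably the one to use: the alternation still takes part in the match, but it does not get a group number.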

Gumbo
@Gumbo that did it! Thanks for the super fast response. I will accept after the time limit (which I didn't know existed) expires.
goggin13
I thought the idea was not to capture the `a` or `b` at all. In other words, to *match* `ac` or `bc`, but only *capture* the `c`: `(?:a|b)(c)`
Alan Moore
+1  A: 

If your implementation has it, then you can use non-capturing parentheses:

(?:a|b)
@mmutz Thanks for the fast response! I wish I could accept both answers, that was just what I was looking for
goggin13
+1  A: 

Even rubular doesn't make you use parentheses, and the precedence of | is low. For example, a|bc does not match ccc.

msw
@msw what does the '!~' operator do? I like your expression, with fewer parens, regex is messy enough already
goggin13
!~ is a Perlism for "does not match"; it was sloppy writing on my part. Fixed, thanks.
msw
gotcha; thanks for the link!
goggin13
I don't get you. The low precedence of `|` is why you *do* have to use parens. `(?:a|b)c` matches `ac` or `bc` (the desired behavior), while `a|bc` matches `a` or `bc`.
Alan Moore
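
The precedence point in the comment above can be checked with a short Java sketch. The class and method names (AlternationPrecedenceDemo, matchesWhole) are made up for illustration. It uses Pattern.matches, which requires the whole input to match, so the difference between the two patterns is easy to see:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the thread.
public class AlternationPrecedenceDemo {
    // True only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Without parentheses, | splits the whole pattern: "a" OR "bc".
        System.out.println(matchesWhole("a|bc", "a"));       // true
        System.out.println(matchesWhole("a|bc", "bc"));      // true
        System.out.println(matchesWhole("a|bc", "ac"));      // false

        // With a non-capturing group, the alternation covers only a and b.
        System.out.println(matchesWhole("(?:a|b)c", "ac"));  // true
        System.out.println(matchesWhole("(?:a|b)c", "bc"));  // true
        System.out.println(matchesWhole("(?:a|b)c", "a"));   // false
    }
}
```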