Is there any way to have an expression in brackets not be captured as a group?

E.g., I have an expression like this:

(A(B|C)?) D (E(F|G)?)

Note that the optional blocks (B|C)? and (F|G)? need brackets.
I'm not interested in what was captured by these groups. All I want is to capture the full first and last blocks.

But because of the optional blocks, the group numbering will change, and I can't tell whether (E(F|G)?) was captured as group 2 or 3.

Can I tell the expression to ignore the optional parts in the result groups, so the group numbering stays the same? Or can I make optional captures always appear in the groups, even when they're null?

+5  A: 

There are non-capturing groups (?:…):

(A(?:B|C)?) D (E(?:F|G)?)

Such a group only groups its contents; its match is not captured, so it cannot be referenced by a backreference or a group number.
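A minimal Java sketch of the non-capturing version (the class name `NonCapturingDemo` and the helper `blocks` are hypothetical, chosen for illustration). Only the two outer parentheses capture, so the first and last blocks are always groups 1 and 2, and asking `Matcher.group` for an index that does not exist throws `IndexOutOfBoundsException`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingDemo {
    // Only the two outer parentheses capture; (?:B|C) and (?:F|G) merely group.
    static final Pattern PATTERN = Pattern.compile("(A(?:B|C)?) D (E(?:F|G)?)");

    /** Returns {first block, last block}, or null if the input does not match. */
    static String[] blocks(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        Matcher m = PATTERN.matcher("AB D EF");
        if (m.matches()) {
            System.out.println(m.group(1)); // AB
            System.out.println(m.group(2)); // EF
            try {
                m.group(3); // there is no third capturing group
            } catch (IndexOutOfBoundsException e) {
                System.out.println("no group 3");
            }
        }
    }
}
```

Whether or not the optional parts are present, `blocks("A D E")` and `blocks("AB D EF")` both return the blocks at the same two indices.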

Gumbo
+1  A: 

(E(F|G)?) will always be captured as group 3. Groups are numbered by the order of their opening parentheses in the pattern string, which is:

(A(B|C)?) D (E(F|G)?)
^ ^         ^ ^
1 2         3 4

If (B|C) does not match anything in the input string, then group(2) will return null, but the subsequent groups will not be renumbered.
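To see that the numbering is fixed regardless of which optional parts matched, here is a small sketch that runs the original capturing pattern against several inputs (the class name `StableNumbering` and the helper `lastBlock` are hypothetical). The last block is read from group 3 every time; only the inner groups 2 and 4 may come back null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StableNumbering {
    // Four capturing groups, numbered by opening parenthesis:
    // 1 = (A...), 2 = (B|C), 3 = (E...), 4 = (F|G)
    static final Pattern PATTERN = Pattern.compile("(A(B|C)?) D (E(F|G)?)");

    /** Returns the last block: always group 3, whether or not (B|C) matched. */
    static String lastBlock(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.matches() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        for (String input : new String[] { "A D E", "AB D E", "A D EG", "AC D EF" }) {
            Matcher m = PATTERN.matcher(input);
            if (m.matches()) {
                // group(2) is null when (B|C) did not participate in the match,
                // but the index of the last block does not shift.
                System.out.println(input + " -> first=" + m.group(1)
                        + ", inner=" + m.group(2)
                        + ", last=" + m.group(3));
            }
        }
    }
}
```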

The only groups that do not influence numbering are non-capturing groups, e.g.

(A(?:B|C)?) D (E(?:F|G)?)
^             ^
1             2
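One way to check the numbering of a pattern is `Matcher.groupCount()`, which reports only capturing groups. A minimal sketch (the class `GroupCountDemo` and the helper `capturingGroups` are hypothetical names) comparing the two patterns:

```java
import java.util.regex.Pattern;

public class GroupCountDemo {
    /** Number of capturing groups in a regex; (?:...) groups are not counted. */
    static int capturingGroups(String regex) {
        // groupCount() depends only on the pattern, so no match attempt is needed.
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(capturingGroups("(A(B|C)?) D (E(F|G)?)"));     // 4
        System.out.println(capturingGroups("(A(?:B|C)?) D (E(?:F|G)?)")); // 2
    }
}
```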

Example:

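import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(A(B|C)?) D (E(F|G)?)");
        Matcher matcher = pattern.matcher("A D EG");
        if (matcher.matches()) {
            System.out.println(matcher.group(1)); // first block
            System.out.println(matcher.group(2)); // (B|C) did not match, so null
            System.out.println(matcher.group(3)); // last block
            System.out.println(matcher.group(4)); // (F|G)
        }
    }
}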

Output:

A
null
EG
G
finnw
You're right. Apparently I was misled by my Eclipse plug-in QuickREx, which only showed 3 groups (with indices 1, 3 and 4). But the non-capturing syntax still comes in very handy for reducing noise.
Stroboskop