I am constructing a regular expression that will contain a number of groups in a logical OR relationship ...

(group A)|(group B)|(group C)

This expression already successfully matches its intended sub-strings, but I need to add a final group that will match anything else, with the result that the target string will be entirely consumed by matches.

Can anyone please suggest what this missing group expression should be?

Edit: Linefeeds are not an issue, so I assume we can use the simpler of the proposed syntaxes. However, I am clearly doing something really dumb, as I have simplified my expression such that it only contains (.*?) and it refuses to capture anything - just a bunch of empty matches. (I am using Rad Regex Designer for testing, but I have also tried another Regex tool and the result is the same).

Edit: As suggested, here is an example input. This is a simple Excel formula, but it could be any Excel formula:

SUM(A5:D9)+AVERAGE(F5:I5)-LOOKUP(L5, N5:N14)

The group expressions described above are designed to extract cell references (e.g. L5) or cell range references (e.g. F5:I5). So, when the expression is executed against the sample input, it must produce the following 9 captures:

  1. SUM(
  2. A5:D9
  3. )+AVERAGE(
  4. F5:I5
  5. )-LOOKUP(
  6. L5
  7. ,
  8. N5:N14
  9. )
+4  A: 

(group A)|(group B)|(group C)|([\s\S]*?)

should do the trick. [\s\S] is just a hack to get around the fact that . doesn't match linefeeds by default.
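
As the comments below point out, a lazy catch-all can be satisfied by zero characters. The sketch below illustrates this, along with one possible workaround: let the last alternative consume one or more characters, as long as none of them starts a reference. It uses Python's `re` module rather than .NET, and `REF` is a hypothetical, simplified stand-in for (group A)|(group B)|(group C), so treat it as an illustration under those assumptions rather than a drop-in fix.

```python
import re

formula = "SUM(A5:D9)+AVERAGE(F5:I5)-LOOKUP(L5, N5:N14)"

# Hypothetical stand-in for (group A)|(group B)|(group C):
# a cell reference (L5) or a cell range reference (F5:I5).
REF = r"[A-Z]+[0-9]+(?::[A-Z]+[0-9]+)?"

# Lazy catch-all as the last alternative: zero characters already
# satisfy it, so at a non-reference position it matches the empty string.
lazy = re.compile(rf"({REF})|([\s\S]*?)")
first = lazy.match(formula)
print(repr(first.group(0)))  # ''

# Possible workaround: consume one or more characters, checking with a
# negative lookahead that no reference starts at each of them.
tempered = re.compile(rf"({REF})|((?:(?!{REF})[\s\S])+)")
pieces = [m.group(0) for m in tempered.finditer(formula)]
print(pieces)
# ['SUM(', 'A5:D9', ')+AVERAGE(', 'F5:I5', ')-LOOKUP(', 'L5', ', ', 'N5:N14', ')']
```

With this stand-in pattern, the nine pieces join back into the original formula, so the target string is entirely consumed. A real formula parser would need a stricter reference pattern (for example, a function name such as LOG10 would be mistaken for a cell here).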

Matti Virkkunen
Shouldn't that be ...|([\s\S]*?) to actually capture the rest?
C. Ross
Thanks for the tricks! Good to learn!
Dr. Xray
You can also pass `RegexOptions.Singleline` as the final parameter to most `Match`ing calls, so that `.` matches linefeeds too.
Mike
I think this will always match the full string (even an empty one), because `[\s\S]*` matches everything.
Ivan Nevostruev
I don't think that's what OP wanted; he wants to match a substring, then match the rest of the remaining string (I think).
Loadmaster
I'm not sure what the problem is, but this captures an empty string for each individual character in the "anything else" category (although it still matches the original groups OK). I have tried with various combinations of options. Puzzled :(
Tim Coulter
+2  A: 

Add a non-capturing group (?:...) so that | behaves correctly, and a (.*) group for the rest of the line:

(?:(group A)|(group B)|(group C))(.*)

Or if you don't need separate groups:

(group A|group B|group C)(.*)
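
A small sketch of how this pattern behaves on the sample formula from the question, written with Python's `re` module rather than .NET. `REF` is a hypothetical, simplified stand-in for the asker's groups. It suggests that the greedy (.*) takes everything after the first reference in one go, so only a single match is produced.

```python
import re

formula = "SUM(A5:D9)+AVERAGE(F5:I5)-LOOKUP(L5, N5:N14)"

# Hypothetical stand-in for the asker's groups: a cell or range reference.
REF = r"[A-Z]+[0-9]+(?::[A-Z]+[0-9]+)?"

pattern = re.compile(rf"({REF})(.*)")

# The first reference is captured, then the greedy (.*) runs to the end.
m = pattern.search(formula)
first_ref, rest = m.group(1), m.group(2)
print(first_ref)  # A5:D9
print(rest)       # )+AVERAGE(F5:I5)-LOOKUP(L5, N5:N14)

# Nothing is left for further matches.
match_count = len(list(pattern.finditer(formula)))
print(match_count)  # 1
```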
Ivan Nevostruev
+1 for the trick.
Dr. Xray
Thanks, but it's not working. This matches the whole of the target string in a single group.
Tim Coulter
@Tim Coulter: can you add an example of the input and the expected results? This will help to solve the problem.
Ivan Nevostruev