I'm learning regex and need some help to get all possible matches for a pattern out of a string.

If my input is:

case a
when cond1 
then stmt1;
when cond2 
then stmt2;
end case;

I need to get matches whose groups are as follows:

Group1:

  1. "cond1"
  2. "stmt1;"

and Group2:

  1. "cond2"
  2. "stmt2;"

Is it possible to get such groups using any regex?

+1  A: 

It's possible to use regex for this provided that you don't nest your statements. For example, if your stmt1 is another case statement then all bets are off (you can't use regex for something like that; you need a real parser).

Edit: If you really want to try it you can do it with something like (not tested, but you get the idea):

Regex t = new Regex(@"when\s+(.*?)\s+then\s+(.*?;)", RegexOptions.Singleline);
MatchCollection allMatches = t.Matches(input_string);

But as I said, this will work only for non-nested statements.

Edit 2: Changed the regex slightly to include the semicolon in the last group. This will not work exactly as you wanted: instead it will give you multiple matches, and each match will represent one when condition, with the condition in the first group and the statement in the second.

I don't think you can build a regex that does exactly what you want, but this should be close enough (I hope).

Edit 3: New regex - should handle multiple statements

Regex t = new Regex(@"when\s+(.*?)\s+then\s+(.*?)(?=(when|end))", RegexOptions.Singleline);

It contains a positive lookahead so that the second group matches from then to the next 'when' or 'end'. In my test it worked with this:

case a
when cond1 
then stmt1;
   stm1;
   stm2;stm3
when cond2 
then stmt2;
   aaa;  
   bbb;
end case;

It's case sensitive for now, so if you need case insensitivity you need to add the corresponding regex flag.
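
For comparison, here is a minimal sketch of the same lookahead idea in Java rather than .NET (the regex syntax is the same for this pattern). The class name WhenThenLookahead and the method extract are made up for illustration. It differs from the regex above in three assumed ways: the alternation inside the lookahead is non-capturing, so each match has only two groups; word boundaries keep "end" from matching inside a longer identifier; and the case-insensitive flag is switched on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class WhenThenLookahead {
    // DOTALL lets '.' cross line breaks; CASE_INSENSITIVE accepts WHEN/Then/END.
    // The lookahead stops group 2 just before the next "when" or "end" keyword.
    static final Pattern WHEN_THEN = Pattern.compile(
        "\\bwhen\\s+(.*?)\\s+then\\s+(.*?)(?=\\b(?:when|end)\\b)",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns one {condition, statements} pair per when-clause.
    static List<String[]> extract(String input) {
        List<String[]> pairs = new ArrayList<>();
        Matcher m = WHEN_THEN.matcher(input);
        while (m.find()) {
            pairs.add(new String[] { m.group(1).trim(), m.group(2).trim() });
        }
        return pairs;
    }
}
```

Like the regex above, this still assumes that the case statements are not nested and that "when" or "end" never appears as a standalone word inside a statement or string literal.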

rslite
Yes, that's right, but for that I can check for the next occurrence of case, take out the string before it, and apply a pattern to it to get all possible matches. So, can you please help in forming the regex?
Archie
Well, I tried this regex, but it is not working. Also, per the case expression syntax in PL/SQL, there can be multiple statements after then.
Archie
Edit 3 resulted in two matches with three groupings (cond1 stmt1; when) in each match.
Will
Yes, I needed to add a third group for the lookahead, but that can be safely ignored. The first two groups in each match contain the meat.
rslite
A: 

If this were written in Java, I would write two patterns for the parser: one to match the case statements and one to match the when-then clauses. Here is how the latter could be written:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// inputString is the string you get after matching the case statements...
// \s+ also matches line breaks, so "then" may start on the next line.
Pattern pattern = Pattern.compile(
 "when\\s+(\\S+)\\s+"
 + "then\\s+(\\S+)");

Matcher matcher = pattern.matcher(inputString);
while (matcher.find()) {
 DoWhenThen(matcher.group(1), matcher.group(2));
}

Note: I haven't tested this code as I'm not 100% sure on the pattern... but I'd be tinkering around this.
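
The two-pattern idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name CaseBlockParser and the method parse are invented here, case blocks are assumed not to nest, each condition is a single token, and each when-clause has one statement ending in a semicolon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the two-stage approach.
class CaseBlockParser {
    // Stage 1: one non-nested block, from "case <selector>" up to "end case;".
    static final Pattern CASE_BLOCK = Pattern.compile(
        "\\bcase\\s+(\\S+)(.*?)\\bend\\s+case\\s*;", Pattern.DOTALL);
    // Stage 2: a single-token condition and a single statement ending in ';'.
    static final Pattern WHEN_THEN = Pattern.compile(
        "\\bwhen\\s+(\\S+)\\s+then\\s+(.*?;)", Pattern.DOTALL);

    // Returns "selector: condition -> statement" for every when-clause found.
    static List<String> parse(String source) {
        List<String> clauses = new ArrayList<>();
        Matcher block = CASE_BLOCK.matcher(source);
        while (block.find()) {
            // Only the body of the current case block is searched for clauses.
            Matcher clause = WHEN_THEN.matcher(block.group(2));
            while (clause.find()) {
                clauses.add(block.group(1) + ": " + clause.group(1)
                    + " -> " + clause.group(2));
            }
        }
        return clauses;
    }
}
```

Running the inner pattern only on the text captured by the outer one keeps clauses from different case blocks apart, which is the point of using two patterns.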

Spoike
Thanks a lot, but I have to implement it in C#.
Archie
+1  A: 

I don't think this is possible, primarily because any group that matches when...then... is going to match all of them, creating multiple captures within the same group.

I'd suggest using this regex:

(?:when(.*)\nthen(.*)\n)+?

which results in:

Match 1:
* Group 1: cond1
* Group 2: stmt1;
Match 2:
* Group 1: cond2
* Group 2: stmt2;

Will
Thanks a lot, but this regex only works when "when" starts on a new line. So I tried modifying it to (?:when(.*)\s+then(.*)\s*)+? but it still does not work.
Archie
Hmm, I copied your example text and tested against that. Maybe your actual data is different? I don't have any regex options set (no SingleLine, no MultiLine). Doesn't 'when' start on a newline?
Will
Not necessarily. I don't think it gives any syntax error if "when" doesn't start on a new line.
Archie