Hi,

We need to identify and then process switch/case statements in code based on some business rules.

A typical switch statement:

switch (a)
{
    case "A":
    case "B":
        result = "T";
        result1 = "F";
        break;
    default:
        result = "F";
        break;
}

I have been able to create two patterns: one that matches the switch body in a first step, and one that matches the case labels and bodies in a second step. However, I am looking for a single regex that will let me extract the case labels and bodies.

We do not have nested switches.

Kind regards,

A:

Since switch statements can be nested, traditional regexes can't handle them (heck, even the fact that {} blocks can be nested breaks them). Regexes can only parse regular languages, and nested brackets are not regular. You need some form of parser to parse languages that are not regular. Depending on what language you have (it looks like C, but so do a lot of things), there may already be a parser you can use (such as Sparse for C).
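To make the point concrete, here is a minimal Python sketch (the function find_block_end and the sample input are hypothetical) of the bookkeeping a parser does and a pattern like \{[^}]*\} cannot: it tracks brace depth and skips string and character literals to find the } that closes a block. It ignores comments, verbatim strings and everything else a real C or C# parser has to handle.

```python
import re


def find_block_end(src, open_index):
    """Return the index of the '}' that closes the '{' at src[open_index]."""
    depth = 0
    i = open_index
    while i < len(src):
        ch = src[i]
        if ch in '"\'':
            # Skip a string or char literal so braces inside it are ignored.
            quote = ch
            i += 1
            while i < len(src) and src[i] != quote:
                if src[i] == '\\':
                    i += 1  # skip the escaped character
                i += 1
        elif ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError('unbalanced braces')


src = 'switch (a) { case "}": if (x) { y(); } break; default: break; }'
start = src.index('{')
end = find_block_end(src, start)

# The naive pattern stops at the "}" inside the string literal.
naive = re.search(r'\{[^}]*\}', src).group()
print(naive)                # { case "}
print(src[start:end + 1])   # the whole switch body, nested block included
```

The depth counter is what a plain regex lacks: it has no way to remember how many { it has seen.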

Chas. Owens
Comment from SharePoint Newbie: Hi, we don't have nested switches.
A:

Here is something to start with, but it is far from perfect: the expression does not recognize default labels or the end of the switch statement (and may contain other errors).

(?sn:(case (?<label>[^:]+):[ \r\n\t]*)+(?<body>((?!case).)*))

UPDATE

It will also fail if the body contains "case" as part of a string or an identifier (for example, an identifier such as lowercase).
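To see those limitations concretely, here is a rough Python translation of the expression above. It is a port under assumptions, not the original .NET code: Python's re has no "n" (explicit capture) option, so the unnamed groups are written as (?:...), and re.S stands in for the inline "s" option. As with Group.Value in .NET, a repeated group reports only its last capture; .NET additionally exposes every capture through Group.Captures, which Python's re does not.

```python
import re

# Python port of the .NET pattern (?sn:(case (?<label>[^:]+):[ \r\n\t]*)+(?<body>((?!case).)*))
CASE_RE = re.compile(
    r'(?:case (?P<label>[^:]+):[ \r\n\t]*)+(?P<body>(?:(?!case).)*)',
    re.S,
)

sample = '''switch (a)
{
case "A":
case "B":
    result = "T";
    result1 = "F";
    break;
default: result = "F"; break;
}'''

matches = list(CASE_RE.finditer(sample))
m = matches[0]
print(m.group('label'))  # only the last label of the run survives: "B"
print(m.group('body'))   # runs on past default: and the closing brace

# An identifier containing "case" cuts the body short.
m2 = CASE_RE.search('case 1: x = lowercase(y); break;')
print(m2.group('body'))  # x = lower
```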

Daniel Brückner
A: 

Chas. Owens is correct in his comment. But for simple cases, you may be able to use the following regex:

switch\s*\((?<expression>[^\)]+)\)\s*\{\s*((default(?<case>)|case\s*(?<case>"[^"]*"|'[^']*'|\w+))\s*:\s*(?<body>((?!\s*(case\b|default\b|\}))("[^"]*"|'[^']*'|[^\}]))*)\s*)+\}

To use it, your regex engine must support reusing the same group name, keeping every capture of a repeated group, and lookaheads (the .NET regex engine supports all three). All groups except the named ones could be made non-capturing, but to keep the regex easier to read I didn't add the "?:" prefixes.

You'll then get one match for each recognized switch statement with the following captures:

  • expression: the expression used for the switch (1 capture)

  • case: the case label, or an empty (but successful) capture for the default label

  • body: the case body, one for each case

The case and body captures always come in pairs, so you can enumerate them side by side by index.
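Python's re module rejects duplicate group names and keeps only the last capture of a repeated group, so the expression above can't be ported one-to-one. The following sketch (SWITCH_RE, CLAUSE_RE and parse_switches are hypothetical names) splits it in two: an outer pattern that captures the switch expression and the text between the braces, and the clause part of the regex above, applied with finditer to produce the (case, body) pairs. It uses None instead of an empty capture for the default label, and it shares the original's limits: a } outside a string literal ends the switch.

```python
import re

# Outer pattern: switch expression plus everything up to the closing brace.
# Quote characters only appear as parts of complete string/char literals.
SWITCH_RE = re.compile(
    r'switch\s*\((?P<expression>[^)]+)\)\s*\{'
    r'(?P<clauses>(?:"[^"]*"|\'[^\']*\'|[^}"\'])*)\}'
)

# Clause pattern: one label (case X or default) and the body up to the next label.
CLAUSE_RE = re.compile(
    r'\b(?:(?P<default>default)|case\s*(?P<case>"[^"]*"|\'[^\']*\'|\w+))'
    r'\s*:\s*'
    r'(?P<body>(?:(?!\s*(?:case\b|default\b|\}))(?:"[^"]*"|\'[^\']*\'|[^}]))*)'
)


def parse_switches(source):
    """Yield (expression, [(label, body), ...]); label is None for default."""
    for sw in SWITCH_RE.finditer(source):
        pairs = []
        for clause in CLAUSE_RE.finditer(sw.group('clauses')):
            is_default = clause.group('default') is not None
            label = None if is_default else clause.group('case')
            pairs.append((label, clause.group('body').strip()))
        yield sw.group('expression').strip(), pairs


sample = '''switch (a)
{
case "A":
case "B":
    result = "T";
    result1 = "F";
    break;
default: result = "F"; break;
}'''

results = list(parse_switches(sample))
expression, pairs = results[0]
for label, body in pairs:
    print(label, '->', repr(body))
```

As in the .NET version, case "A" comes back with an empty body because it falls through to case "B".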

Lucero