Is there a way to search for multiple nested if statements in code using a regular expression?

For example, an expression that would locate an instance of if statements three or more layers deep with different styles (if, if/else, if/elseif/else):

if (...) {
    <code>
    if (...) {
        <code>
        if (...)
            <code>
    } else if (...) {
        <code>
    } else {
        <code>
    }
} else {
    <code>
}
A: 

Try:

(if\(.+\)(\n)?.*\n|(else)?[ ]*(if\(.+\))?(\{)?(\n)*.*(\n)*(\})?){3}(if\(.+\)(\n)?.*\n|(else)?[ ]*if\(.+\)\{(\n)*.*(\n)*\})*

A bit verbose, but it looks for 3 or more statements consisting of an if statement with a condition and optional braces, or an else if statement with an optional condition and optional braces.

Hawkcannon
As usual, regexes won't work. If you encounter a string, a comment, or a variable containing the sequence "if", this will fail. You can't parse programming languages with regexes.
Ira Baxter
+3  A: 

Using regexes to do source code searches is a bad idea, IMO. It is better to use a tool that parses the source code and then lets you query the parse trees using (for example) XPath-style path expressions.

The problem with regexes for source code searching is that they are generally too hard to read and write (unless you are a regex Guru), and they are prone to false positives and false negatives due to some edge case that the regex creator didn't think of. (For example, using \uxxxx characters in keywords.)

Here are some tool links:

(Please feel free to suggest others.)
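To illustrate the parse-tree approach, here is a minimal sketch that uses Python's standard `ast` module to measure how deeply `if` statements are nested in Python source. It is only an analogue of the idea, not one of the tools meant above: the function name `max_if_depth` and the sample input are made up for illustration, and treating a lone `if` inside an `else` branch as an `elif` at the same level is an assumption of this sketch.

```python
import ast


def max_if_depth(source: str) -> int:
    """Return the deepest nesting level of `if` statements in Python source."""

    def depth(node: ast.AST, level: int) -> int:
        if isinstance(node, ast.If):
            level += 1
            deepest = level
            for child in node.body:
                deepest = max(deepest, depth(child, level))
            for child in node.orelse:
                # The parser represents `elif` as a single `if` inside the
                # else branch; count it at the same level as its parent
                # (an assumption of this sketch).
                if isinstance(child, ast.If) and len(node.orelse) == 1:
                    deepest = max(deepest, depth(child, level - 1))
                else:
                    deepest = max(deepest, depth(child, level))
            return deepest
        deepest = level
        for child in ast.iter_child_nodes(node):
            deepest = max(deepest, depth(child, level))
        return deepest

    return depth(ast.parse(source), 0)


# Hypothetical sample: the string and the comment both contain "if",
# but only real statements are counted because the source is parsed.
SAMPLE = '''
text = "if if if"  # if in a string and a comment
if a:
    if b:
        if c:
            pass
    elif d:
        pass
    else:
        pass
else:
    pass
'''

if __name__ == "__main__":
    print(max_if_depth(SAMPLE) >= 3)
```

Because the search runs over the parse tree, text inside strings and comments never enters the count, which is the class of false positive a regex is prone to.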

Stephen C
Can you suggest an open-source tool (or tools)? Preferably language-generic. Also, why is it a bad idea?
TERACytE
@TERACytE - I believe that PMD is a good place to start looking.
Stephen C
PMD appears to only support Java. Yasca seems to handle multiple languages via plug-ins, but you can only create custom plug-ins using regex.
TERACytE
@TERACytE - to me, that simply says that you should be using a language-specific search tool ... for each language.
Stephen C
+1  A: 

Unless I misread this, the answer is definitively no. The reason is that if you have to keep track of the nesting level, you are talking about a language subset that cannot be matched by a regular expression. Regular expressions can only recognize things that can be captured by a deterministic finite automaton. To do something like this requires a stack or a counter, which moves you up to a more powerful class of automata called push-down automata.
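To make the counter idea concrete, here is a rough sketch that tracks brace nesting with a single integer, which is the piece of state a finite automaton lacks. The function name `max_brace_depth` is made up for illustration, and the sketch assumes the input has no braces inside strings or comments; handling those would need a real tokenizer.

```python
def max_brace_depth(text: str) -> int:
    """Return the deepest level of nested { } blocks, using a counter."""
    depth = 0
    deepest = 0
    for ch in text:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            if depth == 0:
                raise ValueError("unbalanced closing brace")
            depth -= 1
    if depth != 0:
        raise ValueError("unbalanced opening brace")
    return deepest


if __name__ == "__main__":
    code = "if (a) { if (b) { if (c) { x(); } } else { y(); } }"
    print(max_brace_depth(code) >= 3)
```

The counter can grow without bound as the input nests deeper, whereas an automaton with a fixed number of states can only distinguish a fixed number of depths.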

Ukko