I have text that is essentially a C-style source file. I need to match a specific character, let's say ':', only when it's outside of a string. Example:

void main() {
    int x = rand() % 2;
    printf(x ? "heads : tails" : "tails : heads");
    // I want to match this ---^ character, but not others
}

To be specific, I'm using .NET-style regular expressions.

A: 

You can do this with balancing groups, a depth-tracking feature of .NET regular expressions.

Brent Arias
I suppose you could also split the text on the double-quote character with the string Split method, and only search the entries at even indices (0, 2, 4, ...) of the array it returns. The entries at odd indices are the string literals you are trying to avoid.
Brent Arias
Strings cannot be nested, so there's no need to track depth.
Jan Goyvaerts
A: 

You can do this with a regular expression such as (:)|"[^"\r\n]*" that matches either a colon, or a string. Use a capturing group to determine whether the colon was matched or not. Iterate over the matches of this regex to process the colons.

// Group 1 captures a colon; the second alternative consumes a whole string literal.
Regex regexObj = new Regex("(:)|\"[^\"\r\n]*\"");
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
    if (matchResults.Groups[1].Success) {
        // Colon was matched outside of a string
    }
    matchResults = matchResults.NextMatch();
}

Note that while this regex works correctly on your code sample, it won't work on C# code in general. The regex doesn't handle strings that contain escaped quotes, doesn't handle verbatim strings, and doesn't exclude colons in comments. If you want all that, you'll need to expand the regex using the same principle, e.g.:

(:)|string|verbatim string|single line comment|multi line comment
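
As a rough sketch of how that template might be filled in: the sub-patterns below are assumptions, not part of the original answer, and the names ColonFinder and FindColonsOutsideLiterals are made up for illustration. It also treats character literals such as ':' as something to skip, and it does not try to handle interpolated strings, raw string literals, or preprocessor directives.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ColonFinder
{
    // Same principle as above: group 1 captures a bare colon, every other
    // alternative consumes a whole literal or comment so its colons are skipped.
    private const string Pattern =
        @"(:)" +
        @"|@""(?:[^""]|"""")*""" +       // verbatim string: "" is an escaped quote
        @"|""(?:[^""\\\r\n]|\\.)*""" +   // regular string with backslash escapes
        @"|'(?:[^'\\\r\n]|\\.)*'" +      // character literal (assumed addition)
        @"|//[^\r\n]*" +                 // single-line comment
        @"|/\*[\s\S]*?\*/";              // multi-line comment

    private static readonly Regex Scanner = new Regex(Pattern);

    // Returns the index of every colon that is outside strings and comments.
    public static List<int> FindColonsOutsideLiterals(string source)
    {
        var positions = new List<int>();
        for (Match m = Scanner.Match(source); m.Success; m = m.NextMatch())
        {
            if (m.Groups[1].Success)
            {
                positions.Add(m.Groups[1].Index);
            }
        }
        return positions;
    }
}
```

Because the regex scans left to right, whichever construct starts first wins: a // inside a string is consumed as part of the string, and a quote inside a comment is consumed as part of the comment.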
Jan Goyvaerts