How do I use the inline modifiers instead of RegexOptions.Option?

For example:

Regex MyRegex = new Regex(@"[a-z]+", RegexOptions.IgnoreCase);

How do I rewrite this using the inline character i?

http://msdn.microsoft.com/en-us/library/yd1hzczs.aspx

+7  A: 

You can use inline modifiers as follows:

// case insensitive match
Regex MyRegex = new Regex(@"(?i)[a-z]+");

or invert the meaning of the modifier by adding a minus sign:

// case sensitive match
Regex MyRegex = new Regex(@"(?-i)[a-z]+");

or, switch them on and off:

// case sensitive, then case-insensitive match
Regex MyRegex = new Regex(@"(?-i)[a-z]+(?i)[k-n]+");

Alternatively, you can use the mode-modified span syntax, which combines the modifier, a colon, and a group, and scopes the modifier to that group only:

// case sensitive, then case-insensitive match
Regex MyRegex = new Regex(@"(?-i:[a-z]+)(?i:[k-n]+)");

You can combine several modifiers in one go, like (?is-m:text), or write them one after another if you find that clearer: (?i)(?s)(?-m)text (I don't). With the on/off switching syntax, be aware that a modifier stays in effect until the next switch, or until the end of the enclosing group or regex. With mode-modified spans, by contrast, the previous behavior applies again after the span.
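To make the difference in scope concrete, here is a minimal sketch. The class and method names are made up for illustration; only Regex.IsMatch comes from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InlineScopeDemo
{
    // Switch syntax: (?i) is never switched off, so it covers "abcd" entirely.
    public static bool SwitchMatches(string input) =>
        Regex.IsMatch(input, @"^(?i)abcd$");

    // Span syntax: (?i:ab) covers only "ab"; "cd" stays case sensitive.
    public static bool SpanMatches(string input) =>
        Regex.IsMatch(input, @"^(?i:ab)cd$");

    public static void Main()
    {
        Console.WriteLine(SwitchMatches("ABCD")); // True
        Console.WriteLine(SpanMatches("ABcd"));   // True
        Console.WriteLine(SpanMatches("ABCD"));   // False
    }
}
```

The last call fails because the case-insensitive mode ends with the closing parenthesis of the span.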

Finally: the allowed modifiers in .NET are (use a minus to invert the mode):

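x allow whitespace and comments (RegexOptions.IgnorePatternWhitespace)
s single line mode (RegexOptions.Singleline)
m multi line mode (RegexOptions.Multiline)
i case insensitivity (RegexOptions.IgnoreCase)
n only allow explicit capture, .NET specific (RegexOptions.ExplicitCapture)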
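As a rough illustration of what the other modifiers do, the following sketch tries each one on a tiny input. The class, method, and group names are hypothetical and chosen only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ModifierDemo
{
    // s: "." also matches a newline.
    public static bool DotMatchesNewline() =>
        Regex.IsMatch("a\nb", @"(?s)a.b");

    // m: "^" matches at the start of every line, not only at the start of the input.
    public static int LineStarts() =>
        Regex.Matches("one\ntwo", @"(?m)^\w+").Count;

    // x: unescaped whitespace and "#" comments in the pattern are ignored.
    public static bool IgnoresPatternWhitespace() =>
        Regex.IsMatch("ab", @"(?x) a b # trailing comment");

    // n: only named groups capture, so the unnamed (a) gets no group number.
    // Groups then holds group 0 (the whole match) plus "second".
    public static int GroupCount() =>
        Regex.Match("ab", @"(?n)(a)(?<second>b)").Groups.Count;

    public static void Main()
    {
        Console.WriteLine(DotMatchesNewline());        // True
        Console.WriteLine(LineStarts());               // 2
        Console.WriteLine(IgnoresPatternWhitespace()); // True
        Console.WriteLine(GroupCount());               // 2
    }
}
```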

Abel
Thanks. So if I want to use multiple modifiers, I just do `(?imsx)` instead of `(?i)`, for example?
pessimopoppotamus
@pessimopoppotamus: yes, exactly. I'll add that to the post.
Abel
@Abel Thank you =)
pessimopoppotamus
+1 for a good post too :)
Ahmad Mageed
+4  A: 

Use it in this manner:

Regex MyRegex = new Regex(@"(?i:[a-z]+)");

Wrap your pattern in a group of the form (?option:pattern), where option is the letter of the inline option. In this case the option is "i" for IgnoreCase.

The colon limits the option to the pattern inside that group. To make the option apply to the entire pattern, put it on its own at the beginning:

@"(?i)[a-z]+"

It is also possible to use multiple options and turn them on and off:

// On: IgnoreCase, ExplicitCapture. Off: IgnorePatternWhitespace
@"(?in-x)[a-z]+"

This lets you enable or disable options at different points in a regex, which isn't possible when RegexOptions is applied to the entire pattern.
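A small sketch of how inline options interact with RegexOptions passed to the constructor. The helper names are invented for this example; the patterns reuse [a-z]+ from the question, anchored so the whole input must match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionOverrideDemo
{
    // Option set through the constructor, as in the question.
    public static bool WithOption(string input) =>
        new Regex(@"^[a-z]+$", RegexOptions.IgnoreCase).IsMatch(input);

    // The same effect using the inline modifier.
    public static bool WithInline(string input) =>
        new Regex(@"^(?i)[a-z]+$").IsMatch(input);

    // The constructor option is switched off again inside the pattern.
    public static bool OptionSwitchedOffInline(string input) =>
        new Regex(@"^(?-i)[a-z]+$", RegexOptions.IgnoreCase).IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(WithOption("ABC"));              // True
        Console.WriteLine(WithInline("ABC"));              // True
        Console.WriteLine(OptionSwitchedOffInline("ABC")); // False
    }
}
```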

Here is a slightly more in-depth example. I encourage you to play with it to see when each option takes effect.

string input = "H2O (water) is named Dihydrogen Monoxide or Hydrogen Hydroxide. The H represents a hydrogen atom, and O is an Oxide atom.";

// n = explicit captures
// x = ignore pattern whitespace
// -i = remove ignorecase option
string pattern = @"di?(?nx-i) ( hydrogen ) | oxide";
var matches = Regex.Matches(input, pattern, RegexOptions.IgnoreCase);
Console.WriteLine("Total Matches: " + matches.Count);
foreach (Match match in matches)
{
    Console.WriteLine("Match: {0} - Groups: {1}", match.Value, match.Groups[1].Captures.Count);
}

Console.WriteLine();

// n = explicit captures
// x = ignore pattern whitespace
// -i = remove ignorecase option
// -x = remove ignore pattern whitespace
pattern = @"di?(?nx-i) (?<H> hydrogen ) (?-x)|oxide";
matches = Regex.Matches(input, pattern, RegexOptions.IgnoreCase);
Console.WriteLine("Total Matches: " + matches.Count);
foreach (Match match in matches)
{
    Console.WriteLine("Match: {0} - Groups: {1}", match.Value, match.Groups["H"].Captures.Count);
}

The output for the above is:

Total Matches: 3
Match: Dihydrogen - Groups: 0
Match: oxide - Groups: 0
Match: oxide - Groups: 0

Total Matches: 3
Match: Dihydrogen - Groups: 1
Match: oxide - Groups: 0
Match: oxide - Groups: 0

In both patterns RegexOptions.IgnoreCase is used, which makes "di" case insensitive and thus lets it match "Dihydrogen" (capital D). After (?nx-i) the rest of the pattern is case sensitive again, which is why the capitalized "Oxide" near the end of the input is not matched. Since explicit capturing is on, the first pattern has no captures for ( hydrogen ), because it isn't a named group, which explicit capturing requires. The second pattern does have 1 capture since it uses (?<H> hydrogen ).

Next, notice that the second pattern is modified to use (?-x)|oxide at the end. Since IgnorePatternWhitespace is disabled after the hydrogen capture, the remainder of the pattern must not contain additional whitespace (compare with the first pattern, which has spaces around | oxide) unless (?x) is turned back on later in the pattern. This serves no real purpose; it only demonstrates when inline options actually kick in.

Ahmad Mageed
+1 Good post! (You may want to add a closing parenthesis in your first example, though)
Abel
@Abel thanks, fixed!
Ahmad Mageed