I'm a regex newbie and need a single expression that:

matches the "an" and the "AN" but not the "and" or "AND" and matches the "o" and the "O" but not the "or" or "OR" in this predicate:

1and(2or3)AND(4OR5)an(6o7)AN(8O9)

Basically I can't figure out how to convert the expression:

var myRegEx = new Regex("[0-9 ()]|AND|OR");

into an "everything but", case-insensitive expression.

Can't use the regex word boundaries feature because the predicate isn't required to have spaces.

(Added after two answers were already provided): I also need to know the index of the match, which is why I'm assuming I need to use the Regex.Match() method.

Thanks!

Here's what I ended up with:

  private bool mValidateCharacters()
  {
     const string legalsPattern = @"[\d ()]|AND|OR";
     const string splitPattern = "(" + legalsPattern + ")";
     int position = 0;
     string[] tokens = Regex.Split(txtTemplate.Text, splitPattern, RegexOptions.IgnoreCase);

     // Array contains every legal operator/symbol found in the entry field
     // and every substring preceding, surrounded by, or following those operators/symbols
     foreach (string token in tokens)
     {
        if (string.IsNullOrEmpty(token))
        {
           continue;
        }

        // Determine if the token is a legal operator/symbol or a syntax error
        Match match = Regex.Match(token, legalsPattern, RegexOptions.IgnoreCase);

        if (!match.Success)
        {
           const string reminder =
              "Please use only the following in the template:" +
              "\n\tRow numbers from the terms table" +
              "\n\tSpaces" +
              "\n\tThese characters: ( )" +
              "\n\tThese words: AND OR";
           UserMsg.Tell("Illegal template entry '" + token + "' at position: " + position + "\n\n" + reminder, UserMsg.EMsgType.Error);
           txtTemplate.Focus();
           txtTemplate.Select(position, token.Length);
           return false;
        }

        position += token.Length;
     }

     return true;
  }
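
A possible variation on the method above, offered only as a sketch: the same scan can be done in one pass with Regex.Matches, using named groups to tell legal tokens from illegal runs, so the position comes from Match.Index instead of a running counter. The names TemplateScanner and FirstIllegal and the group names "legal" and "illegal" are invented for this example, and the UI calls (UserMsg, txtTemplate) are left out so it can run as a console program.

```csharp
using System;
using System.Text.RegularExpressions;

static class TemplateScanner
{
    // Legal tokens are tried first; anything else is collected as an illegal run.
    // The lookahead stops an illegal run just before an embedded AND/OR.
    const string Pattern =
        @"(?<legal>[\d ()]|AND|OR)|(?<illegal>(?:(?!AND|OR)[^\d ()])+)";

    // Hypothetical helper: returns the first illegal run, or Match.Empty if none.
    public static Match FirstIllegal(string text)
    {
        foreach (Match m in Regex.Matches(text, Pattern, RegexOptions.IgnoreCase))
        {
            if (m.Groups["illegal"].Success)
                return m;
        }
        return Match.Empty;
    }

    static void Main()
    {
        Match bad = FirstIllegal("1and(2or3)AND(4OR5)an(6o7)AN(8O9)");
        if (bad.Success)
            Console.WriteLine("Illegal '{0}' at position {1}", bad.Value, bad.Index);
        else
            Console.WriteLine("Template is clean");
    }
}
```

For the sample predicate this reports 'an' at position 19. In the original method, bad.Index and bad.Length could be passed to txtTemplate.Select().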
+6  A: 

Randal Schwartz's rule: Use capturing in Regex.Match when you know what you want to keep, and use Regex.Split when you know what you want to throw away.

You wrote you want “everything but,” so

var input = "1and(2or3)AND(4OR5)an(6o7)AN(8O9)";
foreach (var s in Regex.Split(input, @"[\d()]|AND|OR", RegexOptions.IgnoreCase))
  if (s.Length > 0)
    Console.WriteLine("[{0}]", s);

Output:

[an]
[o]
[AN]
[O]

To get the offsets, save the separators by enclosing the regular expression in parentheses:

var input = "1and(2or3)AND(4OR5)an(6o7)AN(8O9)";
string pattern = @"([\d()]|AND|OR)";
int offset = 0;
foreach (var s in Regex.Split(input, pattern, RegexOptions.IgnoreCase)) {
  if (s.ToLower() == "an" || s.ToLower() == "o")
    Console.WriteLine("Found [{0}] at offset {1}", s, offset);
  offset += s.Length;
}

Output:

Found [an] at offset 19
Found [o] at offset 23
Found [AN] at offset 26
Found [O] at offset 30
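
If a match-based approach is preferred, one option for this particular predicate is a negative lookahead, which lets Regex.Matches report the index directly through Match.Index. This is only a sketch: the names LookaheadDemo and FindStrays are made up for illustration, and the pattern covers just the two stray tokens from the example rather than a general "everything but" expression.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LookaheadDemo
{
    // "an" not followed by "d", or "o" not followed by "r".
    const string StrayPattern = @"an(?!d)|o(?!r)";

    // Hypothetical helper: returns each hit formatted as "text@index".
    public static List<string> FindStrays(string input)
    {
        var hits = new List<string>();
        foreach (Match m in Regex.Matches(input, StrayPattern, RegexOptions.IgnoreCase))
            hits.Add(m.Value + "@" + m.Index);
        return hits;
    }

    static void Main()
    {
        foreach (string hit in FindStrays("1and(2or3)AND(4OR5)an(6o7)AN(8O9)"))
            Console.WriteLine(hit);
    }
}
```

On the sample input this finds the same four tokens at the same offsets as the Split version (19, 23, 26, 30).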
Greg Bacon
Awesome rule...initial testing looking good...
Jim C
Sorry, I should have been more explicit about the requirements. When I said "a single expression that matches...", I meant "matches" literally: I need the index of the point of interest (e.g. a use of "AN"), and Regex.Match() provides that (or would, if I could figure out how to structure the expression!). If no one can provide a solution that includes the index, I'll accept your answer.
Jim C
@Jim See updated answer.
Greg Bacon
Thanks gbacon! For anyone interested, I've posted the method I was writing above.
Jim C