An input string:

string datar = "aag, afg, agg, arg";

I am trying to match "aag" and "arg", but the following patterns don't work:

string regr = "a[a-z&&[^fg]]g";
string regr = "a[a-z[^fg]]g";

What is the correct way to exclude characters from a character class in a C# regex?

A: 

It seems like you're trying to match an a, followed by any lowercase letter other than f or g, followed by a g. If this is the case, why not use the following regular expression:

string regr = "a[a-eh-z]g";
Donut
+3  A: 

The obvious way is to use a[a-eh-z]g, but you could also try a negative lookbehind like this:

string regr = "a[a-z](?<!f|g)g";

Explanation:

  • a Match the character "a"
  • [a-z] Match a single character in the range between "a" and "z"
  • (?<!XXX) Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
    • f|g Match the character "f" or match the character "g"
  • g Match the character "g"
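
A minimal sketch of how the lookbehind pattern behaves on the sample string from the question. The class name LookbehindDemo and the helper FindMatches are hypothetical names chosen for illustration; only Regex, Regex.Matches, and Match.Value come from .NET itself.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Hypothetical helper: returns every substring matched by the lookbehind pattern.
    public static string[] FindMatches(string input)
    {
        // [a-z] consumes the middle letter; (?<!f|g) then rejects the match
        // if the letter just consumed was f or g.
        var regex = new Regex(@"a[a-z](?<!f|g)g");
        return regex.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        // Prints "aag" and "arg", one per line.
        foreach (string match in FindMatches("aag, afg, agg, arg"))
        {
            Console.WriteLine(match);
        }
    }
}
```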
madgnome
+2  A: 

Try this if you only want to match arg and aag:

a[ar]g

If you want to match everything except afg and agg, you need this regex:

a[^fg]g
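
To make the difference between the two patterns concrete, here is a small sketch that runs both against a sample string that also contains the non-letter case a%g. The class name ClassComparisonDemo and the helper FindMatches are hypothetical. Note that a[^fg]g accepts any character other than f or g in the middle position, not only letters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ClassComparisonDemo
{
    // Hypothetical helper: returns every substring of input matched by pattern.
    public static string[] FindMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        string data = "aag, afg, agg, arg, a%g";

        // Whitelist: only a or r may appear in the middle.
        Console.WriteLine(string.Join(" ", FindMatches(data, "a[ar]g")));   // aag arg

        // Blacklist: anything except f or g may appear, including '%'.
        Console.WriteLine(string.Join(" ", FindMatches(data, "a[^fg]g")));  // aag arg a%g
    }
}
```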
atamanroman
perfect answer for the question
knittl
+3  A: 

Character classes aren't quite that fancy. The simple solution is:

a[a-eh-z]g

If you really want to explicitly list out the letters that don't belong, you could try something like:

a[^\W\d_A-Zfg]g

This character class matches everything except:

  1. \W excludes non-word characters, i.e. punctuation, whitespace, and other special characters. What's left are letters, digits, and the underscore _.
  2. \d removes digits so now we have letters and the underscore _.
  3. _ removes the underscore so now we only match letters.
  4. A-Z removes uppercase letters so now we only match lowercase letters.
  5. Finally at this point we can list the individual lowercase letters we don't want to match.

All in all way more complicated than we'd likely ever want. That's regular expressions for ya!
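
A rough sketch of the negated class in action, with one input per exclusion step listed above. The class name NegatedClassDemo and the helper FindMatches are hypothetical. One caveat worth keeping in mind: in .NET, \w and \d are Unicode-aware by default, so this class also admits non-ASCII word characters such as é, which a plain a-z range would not.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // Hypothetical helper: returns every substring matched by the negated-class pattern.
    public static string[] FindMatches(string input)
    {
        // Excludes non-word characters, digits, underscore, A-Z, and the letters f and g.
        var regex = new Regex(@"a[^\W\d_A-Zfg]g");
        return regex.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        // a%g, a1g, a_g and aRg are rejected by steps 1 to 4; afg and agg by step 5.
        // "a\u00E9g" (a, e-acute, g) is accepted because \w is Unicode-aware by default.
        string data = "aag, afg, agg, arg, a%g, a1g, a_g, aRg, a\u00E9g";
        foreach (string match in FindMatches(data))
        {
            Console.WriteLine(match);
        }
    }
}
```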

John Kugelman
A: 

Regex: a[a-eh-z]g. Then use Regex.Matches to get the matched substrings.

VladV
+3  A: 

What you're using is Java's set intersection syntax:

a[a-z&&[^fg]]g

...meaning the intersection of the two sets ('a' THROUGH 'z') and (ANYTHING EXCEPT 'f' OR 'g'). No other regex flavor that I know of uses that notation. The .NET flavor uses the simpler set subtraction syntax:

a[a-z-[fg]]g

...that is, the set ('a' THROUGH 'z') minus the set ('f', 'g').

Java demo:

String s = "aag, afg, agg, arg, a%g";

Matcher m = Pattern.compile("a[a-z&&[^fg]]g").matcher(s);
while (m.find())
{
  System.out.println(m.group());
}

C# demo:

string s = @"aag, afg, agg, arg, a%g";

foreach (Match m in Regex.Matches(s, @"a[a-z-[fg]]g"))
{
  Console.WriteLine(m.Value);
}

Output of both is

aag
arg
Alan Moore
that works? o_O
knittl
As a remark, in the problem I was really trying to solve, the data string is actually ~2m unique words containing a wide range of characters, not only a-z, and the regex is built dynamically based on a specific set of rules. It was originally written in Java.
Margus
Holy cow! I had no idea about this feature! I love it. It should be noted, though, that this was added in .NET 2.0.
P Daddy