Is there an easy way to match all punctuation except period and underscore, in a C# regex? Hoping to do it without enumerating every single punctuation mark.
You could possibly use a negated character class like this:
[^0-9A-Za-z._\s]
This matches every character except those listed, so it also matches non-ASCII letters, symbols, and control characters. You may need to exclude more characters (such as control characters), depending on your ultimate requirements.
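Here is something a little simpler: match anything that is not a word character, white-space, or a period. Word characters include letters, digits AND underscore; in .NET, \w is Unicode-aware by default, so it covers more than A-Za-z0-9.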
[^\w\s.]
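To see how the two negated classes above differ in practice, here is a minimal sketch that runs both against the same string. The sample text and the Collect helper are made up for illustration; the point is that the ASCII-only class treats an accented letter as a match, and that both classes also match symbols such as + and $, which are not punctuation in the Unicode sense.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample: punctuation, symbols, an accented letter (\u00E9), and spaces.
string sample = "a_b.c, d! \u00E9 + $ (x)";

// Hypothetical helper: concatenate every match of a pattern into one string.
string Collect(string pattern) =>
    string.Concat(Regex.Matches(sample, pattern).Cast<Match>().Select(m => m.Value));

// ASCII-only negated class: the accented letter is not excluded, so it matches.
string asciiNegated = Collect(@"[^0-9A-Za-z._\s]");

// \w-based negated class: \w covers the accented letter and the underscore.
string wordNegated = Collect(@"[^\w\s.]");

Console.WriteLine(asciiNegated); // ,!é+$()
Console.WriteLine(wordNegated);  // ,!+$()
```

If only true punctuation should match, neither class is quite enough, because + and $ belong to the Unicode symbol categories rather than \p{P}.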
The answers so far do not respect ALL punctuation. This should work:
(?![\._])\p{P}
(Explanation: a negative lookahead ensures that neither . nor _ is matched, then \p{P} matches any Unicode punctuation character.)
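As a rough usage sketch, the lookahead pattern can be passed to Regex.Replace to strip every punctuation character except . and _. The input string here is invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input containing punctuation to keep (. and _) and to remove.
string input = "file_name.txt: done! (ok)";

// (?![._]) rejects the position when the next character is . or _;
// \p{P} then consumes a single Unicode punctuation character.
string stripped = Regex.Replace(input, @"(?![._])\p{P}", "");

Console.WriteLine(stripped); // file_name.txt done ok
```

Note that \p{P} alone does not cover symbols such as ^, ~ or `, so those would be left in place by this pattern.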
Use character class subtraction:
[\p{P}-[._]]
Here's the link to the .NET regex documentation (I'm not sure whether other flavors support it): http://msdn.microsoft.com/en-us/library/ms994330.aspx
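Here's a C# example:
using System;
using System.Text.RegularExpressions;

// Match punctuation (\p{P}) and symbols (\p{S}), minus '.' and '_'.
string pattern = @"[\p{P}\p{S}-[._]]"; // added \p{S} to get ^, ~ and ` (among others)
string test = @"_""'a:;%^&*~`bc!@#.,?";
MatchCollection mx = Regex.Matches(test, pattern);
foreach (Match m in mx)
{
    // Print each matched character with its index and length.
    Console.WriteLine("{0}: {1} {2}", m.Value, m.Index, m.Length);
}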
Explanation: The pattern uses character class subtraction. It starts with a standard character class such as [\p{P}] and then appends a subtraction class, -[._], which removes . and _ from the set. The subtraction must be placed inside the outer [ ], after the rest of the class contents.