Say I have a string that can contain any UTF-16 characters, and I want to replace every character not in a whitelist with an underscore. The whitelist is [A-Za-z], [0-9], and [-:.].

How would I use the Regex class to replace all characters not in the whitelist?

+3  A: 

You can do it with this:

[^A-Za-z0-9:.-]

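Inside a character class, a leading caret negates the class, so this matches every character that is not listed. The hyphen goes last so that it is read as a literal hyphen rather than as a range operator.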

And then you simply replace the matches with an underscore like this:

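// Requires: using System.Text.RegularExpressions;
// No RegexOptions are needed: the pattern has no ^ or $ anchors, so Multiline would have no effect.
Regex myRegex = new Regex(@"[^A-Za-z0-9:.-]");
return myRegex.Replace("your target string here", "_");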

Here it is in action.
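
For reference, the same replacement can be wrapped in a small, self-contained program. This is only a sketch: the class name `WhitelistSanitizer` and the method name `Sanitize` are made up for illustration, and the sample inputs are assumptions rather than anything from the question. One thing to keep in mind is that .NET's Regex works on UTF-16 code units, so a character outside the Basic Multilingual Plane (stored as a surrogate pair) is replaced by two underscores, not one.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class WhitelistSanitizer
{
    // Matches any single UTF-16 code unit that is NOT a letter A-Z/a-z,
    // a digit 0-9, a colon, a dot, or a hyphen. The hyphen is last in
    // the class, so it is a literal and needs no escaping.
    private static readonly Regex Disallowed = new Regex(@"[^A-Za-z0-9:.-]");

    // Replaces every disallowed code unit with an underscore.
    public static string Sanitize(string input)
    {
        return Disallowed.Replace(input, "_");
    }

    public static void Main()
    {
        // Space and '!' are not whitelisted.
        Console.WriteLine(Sanitize("hello world!"));          // hello_world_

        // Only whitelisted characters, so the string is unchanged.
        Console.WriteLine(Sanitize("2010-01-15T10:30:00.5")); // 2010-01-15T10:30:00.5

        // U+1F600 is a surrogate pair (two code units), so it yields two underscores.
        Console.WriteLine(Sanitize("a\U0001F600b"));          // a__b
    }
}
```

If one underscore per user-visible character is wanted instead, the pattern would need to treat a surrogate pair as a single unit; the version above deliberately keeps the pattern from the answer unchanged.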

Steve Wortham
+1, but please ditch the backslashes: `@"[^A-Za-z0-9:.-]"`
Greg Bacon
OK, done. I guess I got nervous with dots and dashes in the character classes and I escaped them unnecessarily. But it looks like it works exactly the same without the slashes.
Steve Wortham