Folks,

I have a method that returns true if all characters are "legal" and false if any single character is "illegal". The definition is below (legal = letters, numbers, and some characters like $, -, %, etc.). I want newline and carriage return characters to be "illegal". However, the method below treats them as legal. How can I fix this?

private static bool NameHasAllLegalCharacters( string name )
{
    var regexAlphaNum = new Regex( @"[^a-zA-Z0-9#\$%\^&\*\(\)\._\-\+=\[\]/<>{}:\s]" );
    return !regexAlphaNum.IsMatch( name );
}
+2  A: 

\s matches all whitespace, not just the ASCII space.
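As a minimal sketch of one possible fix, assuming only the plain space character should stay legal, the `\s` in the question's character class can be swapped for a literal space. The class name `NameValidator` and the field name `IllegalChar` are made up for this example. Note that this also makes tabs illegal, which the original `\s` allowed.

```csharp
using System.Text.RegularExpressions;

public static class NameValidator
{
    // Same character class as in the question, but with a literal space
    // in place of \s, so \r, \n, \t and other whitespace count as illegal.
    private static readonly Regex IllegalChar =
        new Regex(@"[^a-zA-Z0-9#\$%\^&\*\(\)\._\-\+=\[\]/<>{}: ]");

    public static bool NameHasAllLegalCharacters(string name)
    {
        // True only when no character falls outside the allowed set.
        return !IllegalChar.IsMatch(name);
    }
}
```

With this version, `NameValidator.NameHasAllLegalCharacters("my file-1.txt")` should return true, while a name containing `\n` or `\r` should return false.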

Ignacio Vazquez-Abrams
Ah! What do I use for a regular old space, as in what's between these quotes: " "?
SFun28
@SFun28 putting a regular space instead of \s does it.
Yuriy Faktorovich
@Yuriy Faktorovich - works like a charm! Thanks!
SFun28
+1  A: 

The \s class matches all whitespace, and \r and \n are classed as whitespace.

Your regex will need to be more specific about which types of whitespace are allowed.
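A quick way to see the problem is to test individual characters against `\s`. The sketch below is only an illustration; the class and method names (`WhitespaceDemo`, `ContainsRegexWhitespace`) are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Returns true if the input contains at least one character
    // that the \s class matches.
    public static bool ContainsRegexWhitespace(string s)
    {
        return Regex.IsMatch(s, @"\s");
    }
}
```

Calling `WhitespaceDemo.ContainsRegexWhitespace` with `"\n"`, `"\r"`, `"\t"` or `" "` returns true in each case, which is why the question's pattern lets line breaks through.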

LukeH
+1  A: 

Given that there are thousands of characters out there (thanks to Unicode), I would not even try a blacklist approach. Instead, reverse your logic and test whether all characters belong to your whitelist:

private static bool NameHasAllLegalCharacters( string name )
{
    var regexAlphaNum = new Regex( @"^[a-zA-Z0-9#$%........]*$" );
    return regexAlphaNum.IsMatch( name );
}
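One caveat worth checking with an anchored pattern in .NET: `$` matches at the end of the string and also just before a final `\n`, so a name ending in a newline can still pass. The sketch below compares `$` with `\z`, which matches only at the very end of the input. The shortened whitelist `Allowed` and the names `AnchorDemo`, `MatchesWithDollar` and `MatchesWithZ` are assumptions made for illustration, not the full character set from the question.

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical, shortened whitelist used only for illustration.
    private const string Allowed = @"[a-zA-Z0-9 ]*";

    // '$' matches at the end of the input or just before a final '\n'.
    public static bool MatchesWithDollar(string name)
    {
        return Regex.IsMatch(name, "^" + Allowed + "$");
    }

    // '\z' matches only at the very end of the input.
    public static bool MatchesWithZ(string name)
    {
        return Regex.IsMatch(name, "^" + Allowed + @"\z");
    }
}
```

Here `AnchorDemo.MatchesWithDollar("abc\n")` returns true while `AnchorDemo.MatchesWithZ("abc\n")` returns false, so `\z` is the safer anchor if a trailing newline must be rejected.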
Álvaro G. Vicario
According to DeMorgan's Law, this is exactly equivalent to what is already there.
Ignacio Vazquez-Abrams
great suggestion! I'm gonna change my code.
SFun28
I think there's value in thinking about a large space in terms of what's included instead of what's not-included. But you are right...they are equivalent
SFun28
@Ignacio Vazquez-Abrams, you're right. I overlooked the double negation, which effectively discards unknown characters.
Álvaro G. Vicario
+1  A: 

\s matches this character set: [\f\n\r\t\v\x85\p{Z}]. You could simply enumerate this without \n and \r, like this:

@"[^a-zA-Z0-9#\$%\^&\*\(\)\._\-\+=\[\]/<>{}:\f\t\v\x85\p{Z}]"

I know it's ugly, but it should work.
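Wrapped into the question's method, the enumerated pattern might look like the sketch below. The class name `EnumeratedWhitespace` and the field name `IllegalChar` are hypothetical. One thing to keep in mind: `\p{Z}` includes the Unicode line separator (U+2028) and paragraph separator (U+2029), and `\v`, `\f` and `\x85` are also line-break-like characters, so they may need removing too if every kind of line break should be illegal.

```csharp
using System.Text.RegularExpressions;

public static class EnumeratedWhitespace
{
    // The question's character class with \s replaced by its members,
    // minus \n and \r, so those two characters become illegal.
    private static readonly Regex IllegalChar =
        new Regex(@"[^a-zA-Z0-9#\$%\^&\*\(\)\._\-\+=\[\]/<>{}:\f\t\v\x85\p{Z}]");

    public static bool NameHasAllLegalCharacters(string name)
    {
        // True only when no character falls outside the allowed set.
        return !IllegalChar.IsMatch(name);
    }
}
```

With this pattern, names containing a space or a tab still pass, while names containing `\n` or `\r` are rejected.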

Martinho Fernandes
Very cool! I didn't know that. What's \p{Z}? I found the rest. None of them seems to be just a space character, i.e. " " (in between the quotes).
SFun28
@SFun28: Very few Unicode character classes match a single character.
Ignacio Vazquez-Abrams
`\p{xxxx}` matches the Unicode general category or named block xxxx. Category `Z` is "Separator". It includes the ordinary space (U+0020) plus all kinds of more exotic whitespace characters you can find in Unicode, like line separators and paragraph separators.
Martinho Fernandes