I read a string from the console. How do I make sure it only contains English characters and digits?

+12  A: 

Assuming that by "English characters" you are simply referring to the 26 letters of the Latin alphabet, this would be an area where I would use regular expressions: ^[a-zA-Z0-9]*$

For example:

if( Regex.IsMatch(Console.ReadLine(), "^[a-zA-Z0-9]*$") )
{ /* your code */ }

The benefit of regular expressions in this case is that all you really care about is whether or not a string matches a pattern, and that is exactly the kind of problem where regular expressions work wonderfully. The pattern clearly captures your intent, and it's easy to extend if your definition of "English characters" expands beyond just the 26 alphabetic ones.

There's a decent series of articles here that teach more about regular expressions.

Jørn Schou-Rode's answer provides a great explanation of how the regular expression presented here works to match your input.

LBushkin
You may want to include white space as well; "a b c" returns false for the given regex pattern.
Fredrik Mörk
... and punctuation
Joe
And what about fancy punctuation characters like the ellipsis character (…) or curly apostrophes? These have their own Unicode characters.
Will Vousden
+6  A: 

You could match it against this regular expression: ^[a-zA-Z0-9]*$

  • ^ matches the start of the string (i.e. no characters are allowed before this point)
  • [a-zA-Z0-9] matches any letter from a-z in lower or upper case, as well as digits 0-9
  • * lets the previous match repeat zero or more times
  • $ matches the end of the string (i.e. no characters are allowed after this point)

To use the expression in a C# program, you will need to import System.Text.RegularExpressions and do something like this in your code:

bool match = Regex.IsMatch(input, "^[a-zA-Z0-9]*$");

If you are going to test a lot of lines against the pattern, you might want to compile the expression:

Regex pattern = new Regex("^[a-zA-Z0-9]*$", RegexOptions.Compiled);

for (int i = 0; i < 1000; i++)
{
    string input = Console.ReadLine();
    pattern.IsMatch(input);
}
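
As a rough sketch of how the compiled pattern might be wrapped for reuse, the helper below keeps the Regex in a static field and treats a null line (which Console.ReadLine() returns at end of input) as invalid. The names InputValidator and IsValid, and the allowSpaces switch, are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class InputValidator
{
    // Built once and reused for every call; same pattern as above.
    private static readonly Regex LettersAndDigits =
        new Regex("^[a-zA-Z0-9]*$", RegexOptions.Compiled);

    // Hypothetical variant that also accepts the space character.
    private static readonly Regex LettersDigitsAndSpaces =
        new Regex("^[a-zA-Z0-9 ]*$", RegexOptions.Compiled);

    public static bool IsValid(string input, bool allowSpaces = false)
    {
        // Console.ReadLine() returns null at end of input; treat that as invalid.
        if (input == null) return false;
        Regex pattern = allowSpaces ? LettersDigitsAndSpaces : LettersAndDigits;
        return pattern.IsMatch(input);
    }

    static void Main()
    {
        Console.WriteLine(IsValid("abc123"));       // True
        Console.WriteLine(IsValid("a b c"));        // False
        Console.WriteLine(IsValid("a b c", true));  // True
        Console.WriteLine(IsValid("façade"));       // False
    }
}
```

Because the pattern uses *, an empty string also counts as valid here; swap * for + if at least one character should be required.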
Jørn Schou-Rode
Note that this pattern will return false if the input string contains a space.
Fredrik Mörk
A: 
bool AllAscii(string str)
{ 
   return !str.Any(c => !Char.IsLetterOrDigit(c));
}
James Curran
Nice for determining if a string has an invalid character...
dboarman
IsLetterOrDigit will be true for any Unicode letter. Not only for English. Am I correct?
Andrey Shvydky
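
To make the comment above concrete: Char.IsLetterOrDigit does return true for letters and digits from any script, so a stricter check has to compare against the ASCII ranges directly. The following is only a sketch; the names AsciiCheck, IsAsciiLetterOrDigit and AllAsciiLettersOrDigits are invented for this example.

```csharp
using System;
using System.Linq;

static class AsciiCheck
{
    // True only for A-Z, a-z and 0-9.
    public static bool IsAsciiLetterOrDigit(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');

    // True when every character passes the ASCII check (also true for "").
    public static bool AllAsciiLettersOrDigits(string str) =>
        str.All(c => IsAsciiLetterOrDigit(c));

    static void Main()
    {
        Console.WriteLine(AllAsciiLettersOrDigits("abc123"));             // True
        Console.WriteLine(AllAsciiLettersOrDigits("привет"));             // False
        // The Unicode-aware check accepts the same Cyrillic input:
        Console.WriteLine("привет".All(c => Char.IsLetterOrDigit(c)));    // True
    }
}
```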
A: 

Something like this (if you want to control input):

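// Requires: using System; using System.Text;
static string ReadLettersAndDigits() {
    StringBuilder sb = new StringBuilder();
    ConsoleKeyInfo keyInfo;
    // Read keys without echoing them until Enter is pressed.
    while ((keyInfo = Console.ReadKey(true)).Key != ConsoleKey.Enter) {
        char c = keyInfo.KeyChar;
        // Accept only ASCII letters and digits; every other key is ignored.
        if (('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || ('0' <= c && c <= '9')) {
            sb.Append(c);
            Console.Write(c); // echo exactly the character that was stored
        }
    }
    return sb.ToString();
}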
Andrey Shvydky
+5  A: 

One gotcha here is "non-English" characters in valid English words, (e.g. façade, although this is commonly spelled without the cedilla as facade). I guess it depends on what the intent of the OP is in recognising these additional characters...

ZombieSheep
A: 

Do you have web access? I would assume that cannot be guaranteed, but Google has a language API that will detect the language of the text you pass to it: Google language API

PurplePilot
A: 

If you don't want to use RegEx, and just to provide an alternative solution, you can check the ASCII code of each character: if it lies within the range for letters or for digits, it is either an English letter or a number (this might not be the best solution):

foreach (char ch in str)
{
    int x = (int)ch; // numeric character code
    if ((x >= 65 && x <= 90) || (x >= 97 && x <= 122))
    {
        // this is an English letter: A-Z (65-90) or a-z (97-122)
    }
    else if (x >= 48 && x <= 57)
    {
        // this is a digit: 0-9 (48-57)
    }
    else
    {
        // this is something different
    }
}

http://en.wikipedia.org/wiki/ASCII for full ASCII table.

But I still think RegEx is the best solution.

Bhaskar
A: 

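I agree with the Regular Expression answers. However, you could simplify it to just "^\w+$". \w is any "word character", which translates to [a-zA-Z_0-9] only if you use non-Unicode (ECMAScript-compliant) matching via RegexOptions.ECMAScript; by default in .NET, \w also matches letters and digits from other scripts. I don't know if you want underscores as well.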

More on regexes in .net here: http://msdn.microsoft.com/en-us/library/ms972966.aspx#regexnet_topic8
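
A small sketch of the difference \w makes with and without RegexOptions.ECMAScript, in case it helps. The class and method names (WordCharDemo, IsWordDefault, IsWordEcma) are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordCharDemo
{
    // Default .NET behaviour: \w is Unicode-aware.
    public static bool IsWordDefault(string s) =>
        Regex.IsMatch(s, @"^\w+$");

    // ECMAScript mode: \w is limited to [a-zA-Z_0-9].
    public static bool IsWordEcma(string s) =>
        Regex.IsMatch(s, @"^\w+$", RegexOptions.ECMAScript);

    static void Main()
    {
        Console.WriteLine(IsWordDefault("façade"));   // True
        Console.WriteLine(IsWordEcma("façade"));      // False
        Console.WriteLine(IsWordEcma("snake_case"));  // True (underscore is a word character)
        Console.WriteLine(IsWordEcma(""));            // False (+ requires at least one character)
    }
}
```

If underscores should be rejected too, the explicit class [a-zA-Z0-9] from the other answers is probably the safer choice.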

Erik A. Brandstadmoen