A blacklist for characters is likely pretty large :-)
You can use the regular expression
^[\d\p{L}]+$
to match decimal digits and letters, regardless of script.
This regular expression consists of a character class containing the shorthands \d, which matches every digit (230 in total in the BMP), and \p{L}, which matches every Unicode character classified as a "letter" (46817 in the BMP). That character class is then repeated at least once and embedded between ^ and $, the string start and end anchors, so it has to match the complete string.
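To see which characters fall into those two categories, here is a rough sketch in Python. Python's built-in re module has no \p{L}, so this checks Unicode general categories through unicodedata instead of using the regex itself. The function name is made up for illustration, and edge cases such as a trailing newline may behave differently from a real regex engine.

```python
import unicodedata


def is_letters_and_digits(text: str) -> bool:
    """Approximate ^[\\d\\p{L}]+$ by checking each character's category."""
    if not text:
        # The + quantifier requires at least one character.
        return False
    for ch in text:
        category = unicodedata.category(ch)
        # "Nd" is a decimal digit; Lu, Ll, Lt, Lm and Lo are the letter categories.
        if category != "Nd" and not category.startswith("L"):
            return False
    return True


for sample in ["abc123", "שלום", "١٢٣", "abc 123", "a_b"]:
    print(repr(sample), is_letters_and_digits(sample))
```

The Hebrew word and the Arabic-Indic digits are accepted, while the space and the underscore are rejected.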
Since you're apparently only interested in Latin letters, with some regex engines you could also use
^[\d\p{Latin}]+$
However, .NET doesn't support this. The first regex mentioned above actually matches everything that's a digit or a letter in any script, so it will dutifully match digits from Indic or Arabic scripts as well as Hebrew, Cyrillic and other non-Latin letters. Depending on what you want, this may not be appropriate.
If that poses a problem, then I see no better option than to explicitly list the characters you want to allow. However, I consider it dangerous to assume that text in a certain language is always restricted to that language's script. If I were to write a Czech or Polish name in a German text, then I'd likely need more than just [a-zA-ZäöüÄÖÜß].
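A small sketch of that pitfall, again in Python and with names chosen only for illustration: the German whitelist accepts a German surname but rejects perfectly ordinary Czech and Polish ones.

```python
import re

# The explicit German whitelist from above, repeated at least once.
GERMAN_ONLY = re.compile(r"[a-zA-ZäöüÄÖÜß]+")


def fits_whitelist(name: str) -> bool:
    """True if every character of name is in the German whitelist."""
    return GERMAN_ONLY.fullmatch(name) is not None


for name in ["Müller", "Dvořák", "Łukasz"]:
    print(name, fits_whitelist(name))
```

Only the first name passes; ř, á and Ł are not in the list, so the other two are rejected.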