How would I write a regular expression (C#) which will check a given string to see if any of its characters are characters OTHER than the following:

a-z
A-Z
Æ æ Å å Ø ø - '

+12  A: 
new Regex("[^a-zA-ZÆæÅ娸'-]")

The [] creates a character class, then ^ specifies negation, so a character matches the class if it's not one of those listed.
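
A minimal sketch of how this pattern might be used, assuming the goal is a yes/no check on a whole string. The class name `NameCheck` and the helper `HasDisallowedChar` are made up for illustration; only `Regex` and `IsMatch` come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameCheck
{
    // Negated class: matches any single character that is NOT in the allowed set.
    // The hyphen is placed last, so it is a literal '-' rather than a range operator.
    private static readonly Regex Disallowed = new Regex("[^a-zA-ZÆæÅ娸'-]");

    // Hypothetical helper: true if the input contains at least one disallowed character.
    public static bool HasDisallowedChar(string input) => Disallowed.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(HasDisallowedChar("O'Neil-Åberg")); // False: every character is allowed
        Console.WriteLine(HasDisallowedChar("Åse-Marie"));    // False
        Console.WriteLine(HasDisallowedChar("Jens2"));        // True: '2' is not in the class
    }
}
```

`IsMatch` returns true as soon as one disallowed character is found anywhere in the string, so no quantifier or anchors are needed for this check.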

Matthew Flaschen
The `'` should be excluded too.
KennyTM
Thanks, @Kenny.
Matthew Flaschen
Note that the - character is the last in the list. If you want to add more characters to the class, don't put them after the -; put them before it (or escape the - with a backslash). Otherwise the characters to the left and right of the - will be treated as a character range.
David_001
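
To illustrate the point about hyphen placement, here is a small sketch. It assumes, purely as an example, that someone wants to add `.` to the allowed characters; the class and field names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenPlacement
{
    // '.' appended after the hyphen: "'-." is now a range from ' (U+0027) to . (U+002E),
    // which silently allows ( ) * + , as well.
    public static readonly Regex Accidental = new Regex("[^a-zA-Z'-.]");

    // '.' placed before the hyphen, which stays last and therefore literal.
    public static readonly Regex HyphenLast = new Regex("[^a-zA-Z'.-]");

    // Alternative: escape the hyphen (verbatim string so C# leaves the backslash alone).
    public static readonly Regex HyphenEscaped = new Regex(@"[^a-zA-Z'\-.]");

    public static void Main()
    {
        Console.WriteLine(Accidental.IsMatch("a(b"));    // False: '(' falls inside the accidental range
        Console.WriteLine(HyphenLast.IsMatch("a(b"));    // True: '(' is correctly rejected
        Console.WriteLine(HyphenEscaped.IsMatch("a(b")); // True
    }
}
```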
You can actually do "[^a-zA-ZÆ-Åæ-å'-]", though knowing that requires knowledge of the Danish alphabet, of course :p
Rune FS
@Rune, not sure how serious you're being. That will actually throw an exception. The Unicode code point order does not match the Danish alphabet. Æ and Å are right next to each other in Unicode, but not in that order. That also means you're leaving out Ø and ø, which are later.
Matthew Flaschen
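
A quick way to check this is to print the code points and try to construct the regex. The sketch below assumes the letters are precomposed characters (Å U+00C5, Æ U+00C6, Ø U+00D8, å U+00E5, æ U+00E6, ø U+00F8); the class and helper names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CodePointOrder
{
    // Hypothetical helper: true if .NET rejects the pattern when parsing it.
    public static bool IsInvalidPattern(string pattern)
    {
        try
        {
            _ = new Regex(pattern);
            return false;
        }
        catch (ArgumentException)
        {
            // A reversed range such as Æ-Å (U+00C6 down to U+00C5) ends up here.
            return true;
        }
    }

    public static void Main()
    {
        // Print each letter with its UTF-16 code unit to see the actual ordering.
        foreach (char c in "ÅÆØåæø")
        {
            Console.WriteLine($"{c} U+{(int)c:X4}");
        }

        Console.WriteLine(IsInvalidPattern("[^a-zA-ZÆ-Åæ-å'-]")); // True: both ranges are reversed
        Console.WriteLine(IsInvalidPattern("[^a-zA-ZÆæÅ娸'-]")); // False: letters listed individually
    }
}
```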
The order should be æ, ø, å. That's the alphabetical order of those letters, and it's the exact order in the ASCII table (the Danish ASCII table).
Rune FS
@Rune, Unicode, the basis of .NET regex, starts with ISO-8859-1, which doesn't match DS 2089 (the Danish variant of ASCII).
Matthew Flaschen
@Matthew, I guess I learned something today, then. I expected it to use the local culture; to me, doing anything else is asking for trouble. Then again, that's exactly why your (in this case my) code should never rely on a given culture.
Rune FS
+1  A: 

Hi,

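You can use a character class in combination with the negation operator (^) to achieve this.

Escape the - character with a \ so that it is not read as a range operator (or place it last in the class). The ' has no special meaning in a regular expression and does not need escaping.

Your final expression would read:

[^a-zA-ZÆæÅ娸\-']+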

Martin Eve
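
A sketch of the escaped-hyphen form in C#, assuming a verbatim string (`@"..."`) so the compiler does not reject `\-` as an unknown string escape. It also shows why a `*` quantifier is unsuitable with `IsMatch`: a pattern that may match zero characters succeeds on every input. The class and field names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedHyphen
{
    // Escaped hyphen inside the class; position no longer matters.
    public static readonly Regex Disallowed = new Regex(@"[^a-zA-ZÆæÅ娸\-']");

    // Same class with '*': it can match the empty string at position 0.
    public static readonly Regex WithStar = new Regex(@"[^a-zA-ZÆæÅ娸\-']*");

    public static void Main()
    {
        Console.WriteLine(Disallowed.IsMatch("Ærø-by")); // False: only allowed characters
        Console.WriteLine(Disallowed.IsMatch("Ærø by")); // True: the space is disallowed
        Console.WriteLine(WithStar.IsMatch("Ærø-by"));   // True: the zero-length match succeeds
    }
}
```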