Looking for some black magic that will match any string with "weird" characters in it. Standard ASCII characters are fine. Everything else isn't.

This is for sanitizing various web forms.

+2  A: 

This matches any character outside the ASCII range:

[^\x00-\x7F]

Some "weird" characters, such as \x00 (NUL), will still get through, because they are valid ASCII.
For reference, see the ASCII table.
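As a rough sketch of how that pattern could be used for form input, here it is in Python (the thread does not say which language or regex flavor is in play, so that choice is an assumption). The helper names and the stricter NOT_PRINTABLE_ASCII pattern are illustrative additions, not part of the original answer.

```python
import re

# The pattern from the answer: any character outside U+0000..U+007F.
NON_ASCII = re.compile(r"[^\x00-\x7F]")

# Assumed stricter variant: anything outside printable ASCII (space through ~).
# Note that this also rejects tab, newline and NUL.
NOT_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7E]")


def has_non_ascii(text: str) -> bool:
    """Return True if text contains at least one non-ASCII character."""
    return NON_ASCII.search(text) is not None


def strip_non_ascii(text: str) -> str:
    """Return text with every non-ASCII character removed."""
    return NON_ASCII.sub("", text)


def has_non_printable(text: str) -> bool:
    """Return True if text contains anything outside printable ASCII."""
    return NOT_PRINTABLE_ASCII.search(text) is not None
```

With these helpers, has_non_ascii("a\x00b") is False because NUL is valid ASCII, while has_non_printable("a\x00b") is True. Which of the two suits a given form depends on whether control characters should count as "weird".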

NullUserException
That "ASCII table" page is crap (pardon my French). It presents that second chart as "the most popular" of the "extended ASCII sets"--come again? It's Cp850! Nobody uses that on purpose; it just happens to be the default encoding of the Windows command line. Also, the tables are images, and they look like hell (pardon my Italian) on an LCD display. Send them to Wikipedia instead: http://en.wikipedia.org/wiki/ASCII
Alan Moore
+1  A: 

Use [^\p{IsBasicLatin}] for exactly what was asked, [^\x00-\x7F] if you prefer concision over self-documentation, or \p{C} to clear out formatting and control characters without hurting other non-ASCII characters (and more concisely still).
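Support for \p{...} classes depends on the regex flavor. Python's built-in re module, for example, does not have them. A minimal sketch of what \p{C} does, assuming Python and using the standard unicodedata module in place of a regex, might look like this (the function name is hypothetical):

```python
import unicodedata


def strip_other_category(text: str) -> str:
    """Drop every character whose Unicode general category starts with "C".

    That covers Cc (control), Cf (format), Cs (surrogate), Co (private use)
    and Cn (unassigned), which is what \\p{C} matches in flavors that
    support it. Letters such as U+00E9 are left untouched.
    """
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("C")
    )
```

Note that this also removes ASCII control characters such as tab and newline, since they are in category Cc, so it is not a drop-in replacement for the ASCII-range patterns.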

Jon Hanna