If your regex flavor supports Unicode properties, this is probably the best way:
\P{Cc}
That matches any character that's not a control character, whether it be ASCII -- [\x00-\x1F\x7F]
-- or Latin-1 -- [\x80-\x9F]
(also known as the C1 control characters).
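As a rough illustration in Java, here is a minimal sketch that strips control characters by replacing the positive form, \p{Cc}, with nothing. The class and method names (StripControlChars, stripControls) are made up for this example:

```java
public class StripControlChars {
    // Removes every character in Unicode general category Cc:
    // the C0 controls, DEL (U+007F), and the C1 controls.
    static String stripControls(String input) {
        return input.replaceAll("\\p{Cc}", "");
    }

    public static void main(String[] args) {
        // BEL (U+0007) and NEL (U+0085) are control characters;
        // U+00E9 (e with acute accent) is an ordinary letter and is kept.
        String dirty = "a\u0007b\u0085c\u00E9";
        System.out.println(stripControls(dirty));
    }
}
```

Matching \P{Cc} directly works the same way when you want to validate input rather than clean it.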
The problem with POSIX classes like [:print:]
or \p{Print}
is that they can match different things depending on the regex flavor and, possibly, the locale settings of the underlying platform. In Java, they're strictly ASCII-oriented. That means \p{Print}
matches only the ASCII printing characters -- [\x20-\x7E]
-- while \P{Cntrl}
(note the capital 'P') matches everything that's not an ASCII control character -- [^\x00-\x1F\x7F]
-- that is, any ASCII character that isn't a control character, plus every non-ASCII character, including the C1 control characters.
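
To see the difference for yourself, something like the following Java sketch should do. It assumes the default pattern flags (no UNICODE_CHARACTER_CLASS), and the class and helper names (PosixVsUnicode, matches) are only for illustration:

```java
import java.util.regex.Pattern;

public class PosixVsUnicode {
    // True if the string made of this single code point matches the regex in full.
    static boolean matches(String regex, int codePoint) {
        return Pattern.matches(regex, new String(Character.toChars(codePoint)));
    }

    public static void main(String[] args) {
        // BEL (C0 control), 'A', NEL (C1 control), e with acute accent
        int[] samples = {0x07, 0x41, 0x85, 0xE9};
        String[] regexes = {"\\p{Print}", "\\P{Cntrl}", "\\P{Cc}"};
        for (String regex : regexes) {
            for (int cp : samples) {
                System.out.printf("%-10s U+%04X -> %b%n", regex, cp, matches(regex, cp));
            }
        }
    }
}
```

The row to watch is U+0085: \P{Cntrl} accepts it because it only knows about ASCII controls, while \P{Cc} rejects it.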