I'm using the following regex to strip non-printing control characters from user input before inserting the values into the database:
preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $value)
Is there a problem with using this on UTF-8 strings? It seems to remove all non-ASCII characters entirely.
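Here is a minimal reproduction of what I'm seeing. The sample string is made up for illustration, and I've written its UTF-8 bytes out explicitly so the result doesn't depend on how the file is saved:

```php
<?php
// Hypothetical sample input: "café" encoded as UTF-8, followed by a BEL control character.
// The "é" is the two-byte sequence 0xC3 0xA9.
$value = "caf\xC3\xA9\x07";

// Same pattern as above, with no modifiers, so it matches byte by byte.
$clean = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $value);

var_dump($clean); // string(3) "caf"
```

The control character is gone as intended, but both bytes of the "é" fall in the \x80-\xFF range, so they are stripped as well.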