I'm working with a MySQL database that has some data imported from Excel. The data contains non-ascii characters (em dashes, etc.) as well as hidden carriage returns or line feeds. Is there a way using MySQL to find these records?
It depends on exactly what you define as "ASCII", but I would suggest trying a variant of a query like this:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9]';
That query will return all rows where columnToCheck contains at least one non-alphanumeric character. If you have other characters that are acceptable, add them to the negated character class in the regular expression. For example, if periods, commas, and hyphens are OK, change the query to:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9.,-]';
The most relevant page of the MySQL documentation is probably here:
http://dev.mysql.com/doc/refman/5.1/en/regexp.html
This is probably what you're looking for:
select * from TABLE where COLUMN regexp '[^ -~]';
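It should return all rows where COLUMN contains at least one character outside the range from space (0x20) to tilde (0x7E), that is, non-ASCII characters as well as non-printable ASCII characters such as newline, carriage return, or tab.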
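To sanity-check what the '[^ -~]' character class flags before running it against a real table, here is a minimal sketch in Python. It only mirrors the pattern with Python's re module; it is not MySQL's regex engine, and the function name and sample strings are made up for illustration.

```python
import re

# Same character class as the MySQL pattern: any character outside
# the range space (0x20) through tilde (0x7E).
NON_PRINTABLE_ASCII = re.compile(r"[^ -~]")


def is_flagged(value: str) -> bool:
    """Return True if value contains a character outside printable ASCII."""
    return NON_PRINTABLE_ASCII.search(value) is not None


# Hypothetical sample values, similar to what an Excel import might contain.
samples = [
    "plain text, 123",          # printable ASCII only
    "em dash \u2014 here",      # non-ASCII punctuation
    "hidden\rcarriage return",  # ASCII control character
    "line\nfeed",               # ASCII control character
]

for text in samples:
    print(repr(text), is_flagged(text))
```

Only the first sample should come back unflagged; the em dash, the carriage return, and the line feed all fall outside the space-to-tilde range.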
You can define ASCII as all characters that have a decimal value of 0 - 127 (0x00 - 0x7F) and find columns with non-ASCII characters using the following query:
SELECT * FROM TABLE WHERE NOT HEX(COLUMN) REGEXP '^([0-7][0-9A-F])*$';
This was the most comprehensive query I could come up with.
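The HEX-based check can be mirrored outside the database to see which values it catches. The following is a rough sketch in Python, assuming the column is stored as UTF-8; the function name is hypothetical and the regular expression is the one from the query above.

```python
import re

# Every byte, written as two hex digits, must start with 0-7,
# i.e. have a value in the range 0x00 through 0x7F.
ASCII_HEX = re.compile(r"^([0-7][0-9A-F])*$")


def has_non_ascii_byte(value: str, encoding: str = "utf-8") -> bool:
    """Return True if the encoded value contains a byte above 0x7F."""
    # Uppercase hex, similar to what MySQL's HEX() returns for a string.
    hex_string = value.encode(encoding).hex().upper()
    return ASCII_HEX.match(hex_string) is None


print(has_non_ascii_byte("plain text"))      # all bytes are below 0x80
print(has_non_ascii_byte("em \u2014 dash"))  # UTF-8 bytes E2 80 94
print(has_non_ascii_byte("line\r\nbreak"))   # 0x0D and 0x0A are still ASCII
```

Note that carriage returns and line feeds are ASCII (0x0D and 0x0A), so this check does not flag them on its own. To catch hidden line breaks as well, a pattern such as '[^ -~]' from the earlier answer is the better fit.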