I'm working with a MySQL database that has some data imported from Excel. The data contains non-ASCII characters (em dashes, etc.) as well as hidden carriage returns or line feeds. Is there a way to find these records using MySQL?

+4  A: 

It depends on exactly what you're defining as "ASCII", but I would suggest trying a variant of a query like this:

SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9]';

That query will return all rows where columnToCheck contains any non-alphanumeric characters. The negation has to go inside the character class: NOT columnToCheck REGEXP '[A-Za-z0-9]' would only return rows that contain no alphanumeric characters at all. If you have other characters that are acceptable, add them to the character class in the regular expression. For example, if periods, commas, and hyphens are ok, change the query to:

SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9.,-]';

The most relevant page of the MySQL documentation is probably here:
http://dev.mysql.com/doc/refman/5.1/en/regexp.html

Chad Birch
Thanks - I will take a look at that. I don't have much experience with Regular Expressions in SQL, so this will be a good opportunity to learn.
Ed Mays
Shouldn't you escape the hyphen and period? (Since they do have special meanings in a regular expression.) SELECT * FROM tableName WHERE NOT columnToCheck REGEXP '[A-Za-z0-9\.,\-]';
Tooony
The "NOT" should be in front of the "REGEXP". This only worked for me when the "NOT" was in that position.
Silence
+1  A: 

This is probably what you're looking for:

select * from TABLE where COLUMN regexp '[^ -~]';

It should return all rows where COLUMN contains non-ASCII characters (or non-printable ASCII characters such as newline), because the class [ -~] covers exactly the printable ASCII range from space to tilde.
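
If only the hidden carriage returns and line feeds matter, a narrower check is possible. Here is a minimal sketch, assuming a made-up table demo_import with a notes column (both names and the sample rows are hypothetical, purely for illustration):

```sql
-- Hypothetical demo table; replace with your own table and column.
CREATE TEMPORARY TABLE demo_import (
  id    INT PRIMARY KEY,
  notes VARCHAR(100)
);

INSERT INTO demo_import (id, notes) VALUES
  (1, 'plain text'),
  (2, CONCAT('trailing line feed', CHAR(10))),
  (3, CONCAT('carriage', CHAR(13), 'return'));

-- INSTR returns 0 when the substring is absent, so > 0 means "found".
-- CHAR(13) is a carriage return, CHAR(10) is a line feed.
SELECT id, notes
FROM demo_import
WHERE INSTR(notes, CHAR(13)) > 0
   OR INSTR(notes, CHAR(10)) > 0;
```

With the sample rows above, this should pick up rows 2 and 3 and skip row 1.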

David Minor
+3  A: 

You can define ASCII as all characters that have a decimal value of 0 - 127 (0x00 - 0x7F) and find columns with non-ASCII characters using the following query:

SELECT * FROM TABLE WHERE NOT HEX(COLUMN) REGEXP '^([0-7][0-9A-F])*$';

This was the most comprehensive query I could come up with.
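
To see why the pattern works, it may help to look at what HEX() produces. This is only a small sketch: the literals are made up for illustration, and UNHEX('E28094') is used to build the three UTF-8 bytes of an em dash so the result does not depend on the connection character set.

```sql
-- Plain ASCII: every byte is below 0x80, so each hex pair starts with 0-7.
-- HEX('abc') is '616263', and the pattern matches (is_ascii = 1).
SELECT HEX('abc') AS hex_value,
       HEX('abc') REGEXP '^([0-7][0-9A-F])*$' AS is_ascii;

-- Em dash as UTF-8 bytes E2 80 94: each pair starts with 8-F,
-- so the pattern does not match (is_ascii = 0).
SELECT HEX(UNHEX('E28094')) AS hex_value,
       HEX(UNHEX('E28094')) REGEXP '^([0-7][0-9A-F])*$' AS is_ascii;
```

Selecting HEX(COLUMN) alongside the column itself in the query above is also a handy way to spot exactly which bytes are causing a row to be flagged.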