views:

73

answers:

1

I have an sqlite db that has some crazy ascii characters in it and I would like to remove them, but I have no idea how to go about doing it. I googled some stuff and found some people saying to use REGEXP with mysql, but that threw an error saying REGEXP wasn't recognized.

Here is the error I get:

sqlalchemy.exc.OperationalError: (OperationalError) Could not decode to UTF-8 column 'table_name' with text ...

Thanks for the help

A: 

Well, if you really want to shoehorn a rich unicode string into a plain ascii string (and don't mind some goofs), you could use this:

import unicodedata as ud
def shoehorn_unicode_into_ascii(s):
    # This removes accents, but also other things, like ß‘’“”
    return ud.normalize('NFKD', s).encode('ascii','ignore')

For a more complete solution (with somewhat fewer goofs, but requiring a third-party module unidecode), see this answer.

Really, though, the best solution is to work with unicode data throughout your code as much as possible, and drop to an encoding only when necessary.

unutbu
I was actually looking for an sql statement I could use then I don't have to deal with it. I actually already tried this function: def removeNonAscii(s): return "".join(i for i in s if ord(i)<128) and I still get the error because it's happening when I do the select in SQLAlchemy I think.
bababa
@bababa: Can you draw out the text with a simple SQL SELECT statement like `SELECT * from table` ? If so, it would not be hard to write a one-off script to read in the data, and use `UPDATE`s to write back ascii data... Very low-brow, but expedient.
unutbu
hmmm, that is a good idea, let me try it out.
bababa