hi,

As expected, I get an error when entering some characters not included in my database collation:

(1267, "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='")

Is there any function I could use to make sure a string only contains characters existing in my database collation?

thanks

A: 

You can use a regular expression to allow only certain characters. The following allows only letters, digits and the underscore (_), but you can change the character class to include whatever you want:

import re

# Allow only ASCII letters, digits and underscore; adjust the character class as needed.
exp = r'^[A-Za-z0-9_]+$'
re.match(exp, my_string)

If a match object is returned, the string is valid; if None is returned, the string is empty or contains a character that is not allowed.

Luiz C.
Is there no function that implements a regular expression matching the "utf8_general_ci" set? Do I have to do it manually? Thanks anyway.
jul
Not that I know of.
Luiz C.
A: 

I'd look at Python's unicode.translate() method and the codecs.encode() function. Both allow more elegant handling of illegal input characters and, IIRC, translate() has been shown to be faster than a regexp for similar use cases (the findings should be easy to google).
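As a rough sketch of the encode approach (written for Python 3, where every str is Unicode): try to encode the string in the target character set and treat a UnicodeEncodeError as "contains an unsupported character". The helper name fits_charset is hypothetical, and the default "cp1252" is an assumption: MySQL's latin1 is close to Windows-1252 but may not match Python's codec byte for byte, so check it against your own database.

```python
def fits_charset(text, charset="cp1252"):
    """Return True if every character in text can be encoded in charset.

    fits_charset is a hypothetical helper; "cp1252" is an assumed
    stand-in for MySQL's latin1 character set.
    """
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# Alternatively, substitute "?" for anything that cannot be encoded
# instead of rejecting the whole string.
lossy = "naïve 日本".encode("cp1252", errors="replace")
```

Passing errors="ignore" instead of errors="replace" drops the offending characters silently, which may or may not be what you want for user input.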

From Python's docs:

"For Unicode objects, the translate() method does not accept the optional deletechars argument. Instead, it returns a copy of the s where all characters have been mapped through the given translation table which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted. Note, a more flexible approach is to create a custom character mapping codec using the codecs module (see encodings.cp1251 for an example)."
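To illustrate the mapping described in that passage, here is a minimal sketch using Python 3's str.translate(), which takes the same kind of table (Unicode ordinals mapped to ordinals, strings or None). The table contents and the name normalize are made up for the example; you would fill the table with whatever characters cause trouble in your own data.

```python
# Hypothetical translation table: keys are Unicode ordinals, values are
# the replacement string, or None to delete the character.
REPLACEMENTS = {
    ord("\u2192"): "->",  # rightwards arrow, replaced by ASCII
    ord("\u2212"): "-",   # minus sign, replaced by hyphen-minus
    ord("\u200b"): None,  # zero-width space, deleted
}


def normalize(text):
    """Return a copy of text with the mapped characters replaced or removed.

    Characters that are not in the table are left untouched.
    """
    return text.translate(REPLACEMENTS)
```

This only handles the characters you list explicitly, so it works best combined with a final check that the result really can be encoded in the target character set.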

http://docs.python.org/library/stdtypes.html

http://docs.python.org/library/codecs.html

pithyless