ansaurus

Question

Is there a Python/Django function to check whether a string contains only characters included in my database collation?

Answer 1

A:

You can use a regular expression to allow only certain characters. The following allows only letters, digits, and the underscore (_), but you can change it to include whatever you want:

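import re

# Allow only ASCII letters, digits, and the underscore.
exp = r'^[A-Za-z0-9_]+$'
my_string = 'example_123'  # placeholder input for illustration
re.match(exp, my_string)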

If a match object is returned, the string is valid; if the result is None, the string contains at least one disallowed character.
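A minimal sketch of wrapping the pattern above in a helper that returns a boolean. The name is_allowed is hypothetical, and the snippet assumes Python 3.4 or later for re.fullmatch, which is used here because `$` also matches just before a trailing newline.

```python
import re

# Same character set as above: ASCII letters, digits, underscore.
ALLOWED = re.compile(r'[A-Za-z0-9_]+')

def is_allowed(text):
    """Return True if every character of text is in the allowed set."""
    # fullmatch requires the whole string to match, so no anchors are needed.
    return ALLOWED.fullmatch(text) is not None

print(is_allowed('user_42'))    # True
print(is_allowed('user 42'))    # False (space is not allowed)
print(is_allowed('user_42\n'))  # False (trailing newline is rejected)
```

To allow a different set, only the character class inside the brackets needs to change.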

Luiz C. 2010-02-26 21:34:01

Is there no function that implements a regular expression matching the "utf8_general_ci" set? Do I have to do it manually? Thanks anyway.

jul 2010-02-26 22:14:13

Not that I know of.

Luiz C. 2010-02-27 00:10:26

Answer 2

A:

I'd look at Python's unicode.translate() and codecs.encode() functions. Both allow more elegant handling of illegal input characters and, IIRC, translate() has been shown to be faster than a regexp for similar use cases (the benchmarks should be easy to find online).
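As a rough sketch of the encode() idea: encoding the string with the codec that corresponds to the database character set raises UnicodeEncodeError when a character cannot be represented. The helper names fits_charset and fits_three_byte_utf8 are hypothetical, and the snippet is written for Python 3. It checks the character set rather than the collation, and which Python codec corresponds to your database character set is an assumption you would need to verify. The second helper assumes the column uses MySQL's 3-byte utf8 character set, which stores only Basic Multilingual Plane characters.

```python
def fits_charset(text, codec_name):
    """Return True if text can be encoded with the given Python codec."""
    try:
        text.encode(codec_name)
    except UnicodeEncodeError:
        return False
    return True

def fits_three_byte_utf8(text):
    """Return True if every character is in the Basic Multilingual Plane.

    Assumption: the column uses MySQL's 3-byte utf8 character set, which
    cannot store code points above U+FFFF.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)

print(fits_charset('caf\u00e9', 'latin-1'))           # True
print(fits_charset('\u65e5\u672c\u8a9e', 'latin-1'))  # False
print(fits_three_byte_utf8('\u65e5\u672c\u8a9e'))     # True
print(fits_three_byte_utf8('\U0001F600'))             # False
```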

From Python's docs:

"For Unicode objects, the translate() method does not accept the optional deletechars argument. Instead, it returns a copy of the s where all characters have been mapped through the given translation table which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted. Note, a more flexible approach is to create a custom character mapping codec using the codecs module (see encodings.cp1251 for an example)."

http://docs.python.org/library/stdtypes.html

http://docs.python.org/library/codecs.html
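A small illustration of the translation table described in the quoted documentation, written for Python 3, where str.translate() behaves like the unicode.translate() method above. The table contents are made up for the example; a real table would list whatever characters your database cannot store.

```python
# Translation table: keys are Unicode ordinals; values are an ordinal,
# a replacement string, or None (which deletes the character).
table = {
    ord('\u00e9'): 'e',        # e with acute accent -> plain "e"
    ord('\u00df'): 'ss',       # sharp s -> two characters
    ord('\u00a0'): ord(' '),   # ordinal to ordinal: no-break space -> space
    ord('\u200b'): None,       # zero-width space is deleted
}

# Characters that are not in the table are left untouched.
cleaned = 'caf\u00e9\u00a0stra\u00dfe\u200b'.translate(table)
print(cleaned)  # cafe strasse
```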

pithyless 2010-03-01 21:04:03
