Regex is not a good solution here.
Validate if a UTF8 string is an integer:
try:
    int(val)
    is_int = True
except ValueError:
    is_int = False
Validate if a UTF8 string is a float: same as above, but with float().
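As a rough sketch, the two checks above can be wrapped into helpers. The names is_int and is_finite_float are made up for illustration, and rejecting nan/inf is my own assumption about what "valid float" means here; float() itself happily accepts 'nan' and 'inf', and both int() and float() tolerate surrounding whitespace.

```python
import math


def is_int(val):
    """Return True if int() can parse val."""
    try:
        int(val)
        return True
    except ValueError:
        return False


def is_finite_float(val):
    """Return True if float() can parse val and the result is finite."""
    try:
        # float('nan') and float('inf') parse fine, so filter them out here.
        return math.isfinite(float(val))
    except ValueError:
        return False
```

If nan and inf are acceptable in your data, drop the math.isfinite() call and just return True after float(val) succeeds.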
Validate if a UTF8 string is of length(1-255):
is_of_appropriate_length = 1 <= len(val) <= 255
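One thing to watch: len() counts bytes on undecoded UTF-8 data and characters on decoded text, so the result of the check depends on which one you pass in. A small illustration, assuming Python 3 and a made-up sample value:

```python
# Sample value chosen for illustration: 'naïve' with a precomposed i-diaeresis.
raw = 'na\u00efve'.encode('utf-8')   # UTF-8 bytes as they might arrive
text = raw.decode('utf-8')           # decoded text

byte_count = len(raw)    # 6, because U+00EF takes two bytes in UTF-8
char_count = len(text)   # 5 code points

# Apply the limit to whichever count the requirement actually refers to.
is_of_appropriate_length = 1 <= char_count <= 255
```

If the limit comes from a storage constraint (for example a column sized in bytes), check byte_count instead.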
Validate if a UTF8 string is a valid date: this is not trivial, because there are many possible date formats. If you know the expected format, you can use time.strptime() like this:
# Validate that the date is in the YYYY-MM-DD format.
import time
try:
    time.strptime(val, '%Y-%m-%d')
    is_in_valid_format = True
except ValueError:
    is_in_valid_format = False
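The same idea as a reusable function, sketched with datetime.strptime() instead of time.strptime(); the name is_valid_date and the default format are assumptions for the example. Besides checking the layout, strptime also rejects impossible calendar dates such as February 30.

```python
from datetime import datetime


def is_valid_date(val, fmt='%Y-%m-%d'):
    """Return True if val parses as a real calendar date in the given format."""
    try:
        datetime.strptime(val, fmt)
        return True
    except ValueError:
        # Raised for a wrong layout and for dates that do not exist.
        return False
```

Note that strptime is somewhat lenient: leading zeros are optional for %m and %d, so '2023-1-5' is accepted too. If you need exactly ten characters, add a len(val) == 10 check.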
EDIT: Another thing to note. Since you specifically mention UTF-8 strings, it would make sense to decode them into Unicode first. This would be done by:
my_unicode_string = my_utf8_string.decode('utf8')
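Decoding is itself a validation step, since invalid UTF-8 raises UnicodeDecodeError. A minimal sketch, assuming Python 3 (where the undecoded data is a bytes object) and a hypothetical helper name decode_utf8:

```python
def decode_utf8(raw):
    """Return the decoded text, or None if raw is not valid UTF-8."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return None
```

Run the other checks on the returned text, and treat None as a failed validation.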
It is interesting to note that when converting a Unicode string to an integer using int(), for example, you are not limited to the "Western Arabic" numerals used in most of the world. int(u'١٧') and int(u'१७') will both correctly convert to 17, even though they are written in Eastern Arabic and Devanagari numerals respectively.
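If accepting non-ASCII digits is not what you want, you can add an explicit ASCII check on top of int(). This is only a sketch; is_ascii_int is a made-up name, and the escapes below spell the same two example strings.

```python
def is_ascii_int(val):
    """Return True if int() accepts val and val contains only ASCII characters."""
    try:
        int(val)
    except ValueError:
        return False
    # int() accepts any Unicode decimal digit, so restrict the input separately.
    return all(ord(ch) < 128 for ch in val)


eastern_arabic = int('\u0661\u0667')   # the string u'١٧'
devanagari = int('\u0967\u096d')       # the string u'१७'
```

Both conversions give 17, while is_ascii_int() returns False for either of those strings and True for '17'.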