I want to do the following with Python:

  1. Validate if a UTF8 string is an integer.
  2. Validate if a UTF8 string is a float.
  3. Validate if a UTF8 string is of length(1-255).
  4. Validate if a UTF8 string is a valid date.

I'm totally new to Python and I believe this should be done with regular expressions, except maybe for the last one. Your help is appreciated!

+1  A: 
  1. int() and catch the ValueError it raises on bad input
  2. float(), handled the same way, but what exactly do you mean by float?
  3. int() and then check the value using an if
  4. datetime.strptime() with the format you expect
bluszcz
+2  A: 

Why use regex? I'm convinced it would be slower and more cumbersome.

The int() and float() functions work well here. For strings that should contain nothing but digits (no sign, no decimal point), the isdigit() method is even simpler.

>>> a = "03523"
>>> a.isdigit()
True

>>> b = "963spam"
>>> b.isdigit()
False
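
To make the difference between isdigit() and int() concrete, here is a small sketch (Python 3 assumed; the helper name parses_as_int is made up for illustration). It only shows that isdigit() tests the characters, while int() tests whether the whole string parses, so signed or padded input is treated differently by the two.

```python
def parses_as_int(text):
    """Return True if int() accepts the string, False otherwise."""
    try:
        int(text)
        return True
    except ValueError:
        return False

# Compare the two checks on a few sample inputs.
samples = ["03523", "-42", "+7", "3.14", " 12 ", ""]
for s in samples:
    print(repr(s), "isdigit:", s.isdigit(), "int():", parses_as_int(s))
```

So isdigit() is fine for unsigned digit strings, but if negative numbers or surrounding whitespace should count as valid, the try/except approach is the safer choice.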

For question 3, do you mean "Validate if a UTF8 string is a NUMBER of length(1-255)"?

Why not:

def validnumber(n):
  # True only if n parses as an integer between 1 and 255 inclusive.
  try:
    return 1 <= int(n) <= 255
  except ValueError:
    return False
Dominic Bou-Samra
+5  A: 

Regex is not a good solution here.

  1. Validate if a UTF8 string is an integer:

    try:
      int(val)
      is_int = True
    except ValueError:
      is_int = False
    
  2. Validate if a UTF8 string is a float: same as above, but with float().

  3. Validate if a UTF8 string is of length(1-255):

    is_of_appropriate_length = 1 <= len(val) <= 255
    
  4. Validate if a UTF8 string is a valid date: this is not trivial. If you know the right format, you can use time.strptime() like this:

    # Validate that the date is in the YYYY-MM-DD format.
    import time
    try:
      time.strptime(val, '%Y-%m-%d')
      is_in_valid_format = True
    except ValueError:
      is_in_valid_format = False
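
The four checks above could be wrapped into small reusable functions. This is only a sketch under a few assumptions: Python 3, function names invented for illustration, datetime.strptime() used in place of time.strptime(), and "length" taken to mean the number of characters rather than bytes.

```python
from datetime import datetime

def is_int(val):
    """Return True if int() can parse val."""
    try:
        int(val)
        return True
    except ValueError:
        return False

def is_float(val):
    """Return True if float() can parse val (this includes 'nan', 'inf' and '1e5')."""
    try:
        float(val)
        return True
    except ValueError:
        return False

def has_valid_length(val, low=1, high=255):
    """Return True if val has between low and high characters, inclusive."""
    return low <= len(val) <= high

def is_valid_date(val, fmt="%Y-%m-%d"):
    """Return True if val matches fmt and names a real calendar date."""
    try:
        datetime.strptime(val, fmt)
        return True
    except ValueError:
        return False
```

Note that float() is fairly permissive: if strings such as "nan" or "inf" should be rejected, that needs an extra check on top of is_float().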
    

EDIT: Another thing to note. Since you specifically mention UTF-8 strings, it would make sense to decode them into Unicode first. This would be done by:

my_unicode_string = my_utf8_string.decode('utf8')
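
The line above is Python 2 style. In Python 3, text is already Unicode (str) and only bytes objects have a decode() method. A rough sketch of the same step in Python 3 follows; the helper name decode_utf8 is hypothetical. It also shows why decoding matters for the length check: the character count and the byte count can differ.

```python
def decode_utf8(raw):
    """Decode UTF-8 bytes to str, or return None if the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None

raw = b"caf\xc3\xa9"          # UTF-8 encoding of "café"
text = decode_utf8(raw)
print(len(raw), len(text))     # byte length vs. character length
print(decode_utf8(b"\xff"))    # not valid UTF-8
```

Decoding first also acts as a validity check of its own: input that is not well-formed UTF-8 is caught before any of the other validations run.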

It is interesting to note that when converting a Unicode string to an integer with int(), for example, you are not limited to the "Western Arabic" numerals used in most of the world. int(u'١٧') and int(u'१७') will both correctly parse as 17, even though they are written in Eastern Arabic and Devanagari numerals respectively.
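
A quick way to see this behaviour, and to opt out of it when only ASCII digits should be accepted, is sketched below. Python 3 is assumed, and is_ascii_int is a made-up helper for illustration, not a standard function.

```python
import unicodedata

# "17" written with ASCII, Arabic-Indic and Devanagari digits.
samples = ["17", "\u0661\u0667", "\u0967\u096d"]
for s in samples:
    print(int(s), [unicodedata.name(ch) for ch in s])

def is_ascii_int(text):
    """Return True for an optionally signed run of ASCII digits only."""
    stripped = text.strip()
    # Drop one leading sign character, if present.
    digits = stripped[1:] if stripped[:1] in ("+", "-") else stripped
    return digits != "" and all(ch in "0123456789" for ch in digits)
```

Whether to accept digits from other scripts is a design decision: int() alone is the more inclusive check, while something like is_ascii_int() is stricter and may suit IDs or machine-generated data better.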

Max Shawabkeh