ansaurus

Question

Answer 1

+5 A:

Python does support Unicode in regular expressions if you specify the re.UNICODE flag (re.U for short). You can probably use something like this:

r'^[^\W_]+(-[^\W_]+)?$'

Test code:

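# -*- coding: utf-8 -*-
# Python 2 test script: checks each sample name against the pattern.
import re

names = [
    u'Björn',
    u'Anne-Charlotte',
    u'توماس',
    u'毛',
    u'מיק',
    u'-Björn',           # leading hyphen: rejected
    u'Anne--Charlotte',  # double hyphen: rejected
    u'Tom_',             # underscore: rejected
]

# Compile once, outside the loop; re.U makes \W Unicode-aware.
regex = re.compile(r'^[^\W_]+(-[^\W_]+)?$', re.U)

for name in names:
    print u'{0:20} {1}'.format(name, regex.match(name) is not None)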

Result:

Björn                True
Anne-Charlotte       True
توماس                True
毛                    True
מיק                  True
-Björn               False
Anne--Charlotte      False
Tom_                 False

If you also want to disallow digits in names, change [^\W_] to [^\W\d_] in both places.
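
To make the difference between the two character classes concrete, here is a small sketch, assuming Python 3 (where str patterns are Unicode-aware by default, so re.U is implied). The names ALLOW_DIGITS, NO_DIGITS, is_valid and the sample strings are made up for illustration and are not part of the answer above.

```python
import re

# Python 3: str patterns are Unicode-aware by default, so re.U is implied.
ALLOW_DIGITS = re.compile(r'^[^\W_]+(-[^\W_]+)?$')    # pattern from the answer
NO_DIGITS = re.compile(r'^[^\W\d_]+(-[^\W\d_]+)?$')   # digits excluded in both places


def is_valid(pattern, name):
    """Return True if the whole name matches the given compiled pattern."""
    return pattern.match(name) is not None


# Hypothetical sample names, chosen to show where the two patterns disagree.
samples = ['Björn', 'Björn2', 'Anne-Charlotte', 'R2-D2']
results = {
    name: (is_valid(ALLOW_DIGITS, name), is_valid(NO_DIGITS, name))
    for name in samples
}

for name, (with_digits, without_digits) in results.items():
    print('{0:20} {1!s:6} {2}'.format(name, with_digits, without_digits))
```

Only the names containing digits should differ between the two columns.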

Mark Byers 2010-09-28 19:45:39

You might want to add a space to the allowed characters though.

poke 2010-09-28 19:57:31

Modified to `^[^\W0-9_]+([ \-'‧][^\W0-9_]+)*?$` to support most names. It will be tested as extensively as possible. Thanks a lot =)

Pierre 2010-09-28 20:47:15
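
A quick way to exercise the modified pattern is sketched below, assuming Python 3. The pattern string is copied from the comment above; the name PIERRE_PATTERN, the helper is_valid_name and the sample names are hypothetical.

```python
import re

# Pattern copied verbatim from the comment above. The separator class allows
# a space, a hyphen, an apostrophe or a hyphenation point (U+2027).
PIERRE_PATTERN = re.compile(r"^[^\W0-9_]+([ \-'‧][^\W0-9_]+)*?$")


def is_valid_name(name):
    """Return True if the name consists of letter runs joined by single separators."""
    return PIERRE_PATTERN.match(name) is not None


# Hypothetical sample names covering each separator and a few rejections.
checks = {
    name: is_valid_name(name)
    for name in [
        'Anne-Charlotte',
        'Jean Paul',
        "O'Brien",
        'Mary Jane Smith',
        'Anne--Charlotte',   # two separators in a row
        ' Tom',              # leading separator
        'Tom_',              # underscore
    ]
}

for name, ok in checks.items():
    print('{0!r:20} {1}'.format(name, ok))
```

Because the pattern is anchored at both ends, the lazy `*?` should accept and reject the same strings as a plain `*` would.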

@Pierre: Use `\Z`, not `$`, otherwise "Fred\n" will be regarded as valid. Perhaps you are assuming that the input has already been sanitised to the extent of stripping leading and trailing whitespace and replacing all internal runs of whitespace by a single space. `\d` as suggested by Mark is NOT the same as `0-9` ... is your change deliberate?

John Machin 2010-09-29 01:27:36
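
Both points in the comment above can be checked with a short sketch, assuming Python 3. All variable names are made up for illustration, and the Arabic-Indic digit three (U+0663) is just one example of a non-ASCII decimal digit.

```python
import re

# 1. `$` also matches just before a trailing newline; `\Z` matches only at the very end.
DOLLAR = re.compile(r"^[^\W\d_]+$")
STRICT = re.compile(r"^[^\W\d_]+\Z")

trailing_newline = "Fred\n"
dollar_ok = DOLLAR.match(trailing_newline) is not None   # accepted by `$`
strict_ok = STRICT.match(trailing_newline) is not None   # rejected by `\Z`

# 2. For str patterns, `\d` covers Unicode decimal digits; `0-9` is ASCII only.
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
matches_backslash_d = re.match(r"\d", arabic_three) is not None
matches_ascii_range = re.match(r"[0-9]", arabic_three) is not None

# Consequence for the name patterns: excluding only 0-9 still lets other digits through.
name_with_digit = "Tom" + arabic_three
accepted_by_0_9_class = re.match(r"^[^\W0-9_]+\Z", name_with_digit) is not None
accepted_by_d_class = re.match(r"^[^\W\d_]+\Z", name_with_digit) is not None

print(dollar_ok, strict_ok)
print(matches_backslash_d, matches_ascii_range)
print(accepted_by_0_9_class, accepted_by_d_class)
```

So if the intent is to reject every digit, `\d` looks like the safer choice; `0-9` only makes sense if non-ASCII digits are deliberately allowed.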


Validate a name in Python
