views:

89

answers:

6

Python 3 has a string method called str.isidentifier

How can I get similar functionality in Python 2.6, short of rewriting my own regex, etc.?

+1  A: 
re.match(r'[a-z_]\w*$', s, re.I)

should do nicely. As far as I know there isn't any built-in method.

SilentGhost
+1  A: 

In Python < 3.0 this is quite easy, as you can't have unicode characters in identifiers. That should do the work:

import re
import keyword

def isidentifier(s):
    if s in keyword.kwlist:
        return False
    return re.match(r'^[a-z_][a-z0-9_]*$', s, re.I) is not None
Łukasz
`[a-z_]` for first char.
bobince
Good, but identifiers can start with an underscore.
Jason R. Coombs
Yeah, I was thinking about that when I was writing this.
Łukasz
+2  A: 

the tokenize module defines a regexp called Name

import re, tokenize
re.match(tokenize.Name,somestr)
gnibbler
For these purposes, you need '^'+tokenize.Name+'$'.
Jason R. Coombs
Add in check for python reserved words, and I like this.
Douglas S. J. De Couto
A: 

Good answers so far. I'd write it like this.

import keyword
import re

def isidentifier(candidate):
    "Is the candidate string an identifier in Python 2.x"
    is_not_keyword = candidate not in keyword.kwlist
    pattern = re.compile(r'^[a-z_][a-z0-9_]*$', re.I)
    matches_pattern = bool(pattern.match(candidate))
    return is_not_keyword and matches_pattern
Jason R. Coombs
Need to allow for capital letters. Above answers have similar problem.
Douglas S. J. De Couto
@Douglas: that's what `re.I` flag is for.
SilentGhost
A: 

What I am using:

def is_valid_keyword_arg(k):
    """
    Return True if the string k can be used as the name of a valid
    Python keyword argument, otherwise return False.
    """
    # Don't allow python reserved words as arg names
    if k in keyword.kwlist:
        return False
    return re.match('^' + tokenize.Name + '$', k) is not None
Douglas S. J. De Couto
`re.match` matches from the beginning of the line.
SilentGhost
A: 

I've decided to take another crack at this, since there have been several good suggestions. I'll try to consolidate them. The following can be saved as a Python module and run directly from the command-line. If run, it tests the function, so is provably correct (at least to the extent that the documentation demonstrates the capability).

import keyword
import re
import tokenize

def isidentifier(candidate):
    """
    Is the candidate string an identifier in Python 2.x
    Return true if candidate is an identifier.
    Return false if candidate is a string, but not an identifier.
    Raises TypeError when candidate is not a string.

    >>> isidentifier('foo')
    True

    >>> isidentifier('print')
    False

    >>> isidentifier('Print')
    True

    >>> isidentifier(u'Unicode_type_ok')
    True

    # unicode symbols are not allowed, though.
    >>> isidentifier(u'Unicode_content_\u00a9')
    False

    >>> isidentifier('not')
    False

    >>> isidentifier('re')
    True

    >>> isidentifier(object)
    Traceback (most recent call last):
    ...
    TypeError: expected string or buffer
    """
    # test if candidate is a keyword
    is_not_keyword = candidate not in keyword.kwlist
    # create a pattern based on tokenize.Name
    pattern_text = '^{tokenize.Name}$'.format(**globals())
    # compile the pattern
    pattern = re.compile(pattern_text)
    # test whether the pattern matches
    matches_pattern = bool(pattern.match(candidate))
    # return true only if the candidate is not a keyword and the pattern matches
    return is_not_keyword and matches_pattern

def test():
    import unittest
    import doctest
    suite = unittest.TestSuite()
    suite.addTest(doctest.DocTestSuite())
    runner = unittest.TextTestRunner()
    runner.run(suite)

if __name__ == '__main__':
    test()
Jason R. Coombs