views:

140

answers:

4

Python n00b here

I need to convert an arbitrary string to a string that is a valid variable name in python.

Here's a very basic example:

s1 = 'name/with/slashes'
s2 = 'name '

def clean(s):
    s = s.replace('/','')
    s = s.strip()
    return s

print clean(s1)+'_'#the _ is there so I can see the end of the string

That is a very naive approach. I need to check if the string contains invalid variable name characters and replace them with ''

What would be the neat python way to do this ?

+4  A: 

You should build a regex that's a whitelist of permissible characters and replace everything that is not in that character class.

Daenyth
Or better yet, build a whitelist of permissible _strings_ and link them to actual pre-defined objects. Every time questions like this come up, I get nervous and think someone is almost certainly about to write a massive security hole.
Nicholas Knight
:) not me...I'm just parsing a scene graph from cinema 4d and need to re create it in blender. Animators usually don't worry or need to worry about spaces/slashes/etc in object names
George Profenza
+1  A: 

Use the re module, and strip all invalid charecters.

Zonda333
+5  A: 

According to Python, an identifier is a letter or underscore, followed by an unlimited string of letters, numbers, and underscores:

import re

def clean(s):

   # Remove invalid characters
   s = re.sub('[^0-9a-zA-Z_]', '', s)

   # Remove leading characters until we find a letter or underscore
   s = re.sub('^[^a-zA-Z_]+', '', s)

   return s

Use like this:

>>> clean(' 32v2 g #Gmw845h$W b53wi ')
'v2gGmw845hWb53wi'
Triptych
+1 for recognizing that valid characters are different for the first letter vs subsequent letters as well as only allowing valid letters vs stripping invalid letters.
R Samuel Klatchko
But reserved words can't be used as variable names...
Jukka Suomela
@JukkaSuomela has a great point. Since all the reserved keywords are only letters, you can guarantee there won't be a conflict by adding an underscore (if you want to be real good, you can first check if the name is a keyword and only adding an underscore if that's the case).
R Samuel Klatchko
You can use the "keyword" module to make sure the name does not conflict with any Python keywords, "keyword.iskeyword(s)"
flashk
+1  A: 

Well, I'd like to best Triptych's solution with ... a one-liner!

>>> clean = lambda varStr: re.sub('\W|^(?=\d)','_', varStr)

>>> clean('32v2 g #Gmw845h$W b53wi ')
'_32v2_g__Gmw845h_W_b53wi_'

This substitution replaces any non-variable appropriate character with underscore and inserts underscore in front if the string starts with a digit. IMO, 'name/with/slashes' looks better as variable name name_with_slashes than as namewithslashes.

Nas Banov