Sometimes when I get input from a file or the user, I get a string with escape sequences in it. I would like to process the escape sequences in the same way that Python processes escape sequences in string literals.

For example, let's say myString is defined as:

>>> myString = "spam\\neggs"
>>> print(myString)
spam\neggs

I want a function (I'll call it process) that does this:

>>> print(process(myString))
spam
eggs

It's important that the function can process all of the escape sequences in Python (listed in a table in the link above).

Does Python have a function to do this?

A: 

If you trust the source of the data, just slap quotes around it and eval() it?

>>> myString = 'spam\\neggs'
>>> print(eval('"' + myString.replace('"', '') + '"'))
spam
eggs

PS. Added a countermeasure against code execution: all " characters are now stripped from the string before eval-ing it.

Nas Banov
There's a better solution than the general purpose `eval()`, see my answer.
Greg Hewgill
There's a better solution than using the ast module, see my answer.
Jerub
@Greg Hewgill: Out of curiosity, can you think of any risk that remains after stripping the quotes, as in my patched example? Mind that your ast approach also has a problem if the string contains quotes that "match" the ones wrapped around it.
Nas Banov
@Nas Banov: Your example will still throw an error if `myString` ends in a backslash. Not a severe problem, but probably undesired.
Greg Hewgill
@Greg Hewgill: Won't `ast.literal_eval()` do the same? (I don't have Python 2.6 to check.) To me, raising an exception on a malformed string is OK; a "string injection" exploit is what I am concerned about.
Nas Banov
@Nas Banov: Yes, `literal_eval()` has the same problem. The best solution is the `string-escape` decode function, but I think our other answers are still useful for future readers!
Greg Hewgill
+2  A: 

The ast.literal_eval function comes close, but it will expect the string to be properly quoted first.

Of course Python's interpretation of backslash escapes depends on how the string is quoted ("" vs r"" vs u"", triple quotes, etc) so you may want to wrap the user input in suitable quotes and pass to literal_eval. Wrapping it in quotes will also prevent literal_eval from returning a number, tuple, dictionary, etc.

Things still might get tricky if the user types unquoted quotes of the type you intend to wrap around the string.
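As a rough illustration of the wrapping idea, here is a minimal sketch that escapes bare double quotes and raw line breaks before handing the text to `ast.literal_eval`. The helper names `quote_for_literal` and `process_with_literal_eval` are made up for this example, and the quoting rules are an assumption about what "suitable quotes" should mean, not a complete treatment.

```python
import ast


def quote_for_literal(text):
    """Wrap text in double quotes, escaping characters that would end the literal early."""
    out = []
    escaped = False  # True when the previous character was a backslash starting an escape
    for ch in text:
        if escaped:
            out.append(ch)          # part of an existing escape such as \n or \"
            escaped = False
        elif ch == "\\":
            out.append(ch)
            escaped = True
        elif ch == '"':
            out.append('\\"')       # bare quote: escape it so it cannot close the literal
        elif ch == "\n":
            out.append("\\n")       # a raw newline would break a single-line literal
        elif ch == "\r":
            out.append("\\r")
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


def process_with_literal_eval(text):
    """Interpret backslash escapes by parsing the text as a string literal (hypothetical helper)."""
    return ast.literal_eval(quote_for_literal(text))
```

With this sketch, `print(process_with_literal_eval("spam\\neggs"))` prints `spam` and `eggs` on separate lines. Input that ends in a lone backslash still fails, because the backslash escapes the closing quote and the parser reports a `SyntaxError`.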

Greg Hewgill
I see. This seems to be potentially dangerous as you say: `myString = "\"\ndoBadStuff()\n\""`, `print(ast.literal_eval('"' + myString + '"'))` seems to try to run code. How is `ast.literal_eval` any different/safer than `eval`?
dln385
@dln385: `literal_eval` never executes code. From the documentation, "This can be used for safely evaluating strings containing Python expressions from untrusted sources without the need to parse the values oneself."
Greg Hewgill
Does this require Python 2.6+?
Nas Banov
+5  A: 

The correct thing to do is to use the 'string_escape' codec ('unicode_escape' in Python 3) to decode the string.

>>> myString = "spam\\neggs"
>>> # Python 3: encode to bytes first, then decode with the unicode_escape codec
>>> decoded_string = bytes(myString, "utf-8").decode("unicode_escape")
>>> # Python 2 equivalent: decoded_string = myString.decode('string_escape')
>>> print(decoded_string)
spam
eggs

Don't use the AST or eval. Using the string codecs is much safer.
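One caveat with the Python 3 form: `unicode_escape` interprets the bytes as Latin-1, so non-ASCII characters that were encoded as UTF-8 first can come out garbled. A possible workaround, sketched below under that assumption, is to decode only the substrings that look like escape sequences and leave everything else alone. The names `ESCAPE_RE` and the body of `process` are illustrative, not part of any library.

```python
import codecs
import re

# Matches the escape sequences Python recognises in ordinary string literals.
ESCAPE_RE = re.compile(r"""
    \\U[0-9a-fA-F]{8}      # 32-bit hex escape
  | \\u[0-9a-fA-F]{4}      # 16-bit hex escape
  | \\x[0-9a-fA-F]{2}      # 8-bit hex escape
  | \\[0-7]{1,3}           # octal escape
  | \\N\{[^}]+\}           # character by Unicode name
  | \\[\\'"abfnrtv]        # single-character escapes
""", re.VERBOSE)


def process(text):
    """Replace each recognised escape sequence with the character it stands for."""
    def decode_match(match):
        # Each match is pure ASCII, so unicode_escape decodes it without mangling anything.
        return codecs.decode(match.group(0), "unicode_escape")
    return ESCAPE_RE.sub(decode_match, text)
```

Here `print(process("spam\\neggs"))` prints `spam` and `eggs` on separate lines, and text such as `"café"` passes through unchanged. Sequences the pattern does not recognise (for example `\q` or a trailing backslash) are left as they are, while a well-formed but invalid escape such as an unknown `\N{...}` name still raises an exception from the codec.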

Jerub
Hands down, the **best** solution! BTW, according to the docs it should be "string_escape" (with an underscore), but for some reason it accepts anything matching the pattern: 'string escape', 'string@escape', and whatnot... basically `'string\W+escape'`.
Nas Banov
@Nas Banov The documentation does [make a small mention about that](http://docs.python.org/library/codecs.html#standard-encodings): `Notice that spelling alternatives that only differ in case or use a hyphen instead of an underscore are also valid aliases; therefore, e.g. 'utf-8' is a valid alias for the 'utf_8' codec.`
dln385
In Python 3, the command needs to be `print(bytes(myString, "utf-8").decode("unicode_escape"))`
dln385