For example, if I have a Unicode string, I can encode it as an ASCII string like so:

>>> u'\u003cfoo/\u003e'.encode('ascii')
'<foo/>'

However, I have, e.g., this ASCII string, which contains the \u escape sequences as literal characters:

'\u003cfoo/\u003e'

... that I want to turn into the same ASCII string as in my first example above:

'<foo/>'
A: 

It's a little dangerous depending on where the string is coming from, but how about:

>>> s = '\u003cfoo\u003e'
>>> eval('u"'+s.replace('"', r'\"')+'"').encode('ascii')
'<foo>'
Ned Batchelder
Unfortunately our input is coming from users so this would be too dangerous for us.
John
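If you want to keep the "wrap it in quotes and evaluate it" idea, a minimal sketch, assuming Python 3, is to swap eval for ast.literal_eval, which only accepts literals. Input crafted to break out of the quotes (which a plain eval would happily run) then raises an exception instead of executing. The helper name unescape_literal is hypothetical.

```python
import ast


def unescape_literal(s):
    """Hypothetical helper: interpret the backslash escapes in s as a string literal.

    ast.literal_eval only accepts Python literals, so input that tries to close
    the quotes and append an expression raises ValueError/SyntaxError instead
    of running code (a plain eval would execute it).
    """
    # Escape embedded double quotes so they cannot end the literal early.
    quoted = '"' + s.replace('"', '\\"') + '"'
    # In Python 3 a plain string literal is already Unicode text.
    return ast.literal_eval(quoted)


s = '\\u003cfoo/\\u003e'  # literal backslash escapes, not '<' and '>'
print(unescape_literal(s).encode('ascii'))  # b'<foo/>'
```

Malformed input (for example a trailing lone backslash) also just raises, so wrap the call in try/except if the text comes from users.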
A:

It took me a while to figure this one out, but this page had the best answer:

>>> s = '\u003cfoo/\u003e'
>>> s.decode('unicode-escape')
u'<foo/>'
>>> s.decode('unicode-escape').encode('ascii')
'<foo/>'
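The session above is Python 2, where str has a .decode() method. A rough Python 3 equivalent, assuming the input is ASCII-only, goes through codecs.decode (or through bytes explicitly):

```python
import codecs

# The input holds literal escapes: a backslash, 'u', '0', '0', '3', 'c', ...
s = '\\u003cfoo/\\u003e'

# Python 3 str has no .decode(); codecs.decode accepts the str directly.
# Non-escaped characters are read back as Latin-1, so keep the input ASCII-only,
# otherwise non-ASCII characters come out mangled.
text = codecs.decode(s, 'unicode_escape')
print(text)                  # <foo/>
print(text.encode('ascii'))  # b'<foo/>'

# Equivalent spelling that makes the bytes step explicit.
assert s.encode('ascii').decode('unicode_escape') == text
```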

There's also a 'raw-unicode-escape' codec to handle the other way to specify Unicode strings (raw Unicode literals such as ur'...'); check the "Unicode Constructors" section of the linked page for more details (I'm not that Unicode-savvy).
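To see how the two codecs differ, here is a small Python 3 sketch (the sample bytes are made up for illustration): unicode_escape interprets every Python escape, while raw_unicode_escape only interprets \uXXXX and \UXXXXXXXX and leaves other backslashes alone, like a raw literal would.

```python
# Bytes containing a \u escape and a \n escape, both as literal backslash text.
data = b'\\u003cfoo\\n'

# unicode_escape turns both escapes into real characters.
print(repr(data.decode('unicode_escape')))      # '<foo\n'

# raw_unicode_escape decodes only \u/\U escapes; "\n" stays backslash + 'n'.
print(repr(data.decode('raw_unicode_escape')))  # '<foo\\n'
```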

EDIT: See also Python Standard Encodings.

hark
This does exactly what I want. Thanks a bunch!
John
http://www.python.org/doc/2.5.2/lib/standard-encodings.html
Vinko Vrsalovic
A: 

On Python 2.5 the correct encoding is "unicode_escape", not "unicode-escape" (note the underscore).

I'm not sure whether newer versions of Python changed the codec name, but here it only worked with the underscore.

Anyway, this is it.
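A quick check, assuming Python 3 rather than 2.5: codec lookup normalizes the name (case, hyphens and underscores), so both spellings select the same codec there.

```python
import codecs

s = b'\\u003cfoo/\\u003e'

# Each spelling should resolve to the same registered codec.
for name in ('unicode_escape', 'unicode-escape', 'Unicode-Escape'):
    print(name, s.decode(name))

print(codecs.lookup('unicode_escape').name == codecs.lookup('unicode-escape').name)
```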

Kaniabi