ansaurus

Question

Python "string_escape" vs "unicode_escape"

Answer 1

A:

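Within the range 0 ≤ c < 128, yes: the single quote (') is the only character the two encodings treat differently in CPython 2.6. string_escape escapes it as \', while unicode_escape leaves it unchanged.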

>>> set(unichr(c).encode('unicode_escape') for c in range(128)) - set(chr(c).encode('string_escape') for c in range(128))
set(["'"])

Outside this range the two codecs are not interchangeable: string_escape works on str (byte string) objects, while unicode_escape works on unicode objects.

>>> '\x80'.encode('string_escape')
'\\x80'
>>> '\x80'.encode('unicode_escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

>>> u'1'.encode('unicode_escape')
'1'
>>> u'1'.encode('string_escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: escape_encode() argument 1 must be str, not unicode

On Python 3.x, the string_escape encoding no longer exists, since str can only store Unicode.
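
As a rough illustration of the Python 3 situation, the sketch below encodes a made-up sample string with unicode_escape and checks that asking for string_escape fails. The variable names and the sample text are assumptions for illustration only.

```python
# Python 3 sketch: str is always Unicode, so only unicode_escape applies.
sample = "caf\u00e9 'quoted'\n"  # hypothetical sample text

# unicode_escape returns bytes; the single quote is left unescaped.
escaped = sample.encode("unicode_escape")

# Decoding with the same codec recovers the original text.
restored = escaped.decode("unicode_escape")

# string_escape is not a registered codec on Python 3.
try:
    "x".encode("string_escape")
    string_escape_available = True
except LookupError:
    string_escape_available = False

print(escaped)
print(restored == sample)
print(string_escape_available)
```

Note that the result of `encode` is a `bytes` object on Python 3, not a `str` as in the Python 2 sessions above.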

KennyTM 2010-06-03 19:32:13

That is just because '\x80' is not a valid ASCII-encoded string. Try `u'\x80'.encode('unicode-escape')` and you get `'\\x80'`.

Mike Boers 2010-06-03 19:58:28

@Mike: But is your `my_string` a `str` or a `unicode`?

KennyTM 2010-06-03 20:03:07

@KennyTM: unicode

Mike Boers 2010-06-03 21:01:31

Answer 2

A:

According to my reading of the implementation of unicode-escape and the unicode repr in the CPython 2.6.5 source, yes: the only difference between repr(unicode_string) and unicode_string.encode('unicode-escape') is that repr adds the wrapping quotes and escapes whichever quote character it used.

They are both driven by the same function, unicodeescape_string. This function takes a parameter whose sole purpose is to toggle the addition of the wrapping quotes and the escaping of that quote.
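
The relationship described above can be approximated on Python 3, with the caveat that this is an analogy rather than the CPython 2.6.5 code path: it assumes the built-in `ascii()` as a stand-in for the Python 2 unicode repr, and the helper name `escaped_body` and the sample strings are hypothetical.

```python
# Python 3 analogy (assumption): ascii() plays the role of Python 2's
# repr() for unicode objects. This does not call unicodeescape_string.

def escaped_body(text):
    """Return the unicode_escape form of text as a str (hypothetical helper)."""
    return text.encode("unicode_escape").decode("ascii")

# No quotes in the text: stripping the wrapping quotes gives the same result.
plain = "tab\there, caf\u00e9, snowman \u2603"
plain_matches = ascii(plain)[1:-1] == escaped_body(plain)

# Both quote types present: ascii() wraps in single quotes and escapes them,
# while unicode_escape leaves the single quote alone.
mixed = 'it\'s "x"'
mixed_matches = ascii(mixed)[1:-1] == escaped_body(mixed)
mixed_matches_after_quote_escape = (
    ascii(mixed)[1:-1] == escaped_body(mixed).replace("'", "\\'")
)

print(plain_matches, mixed_matches, mixed_matches_after_quote_escape)
```

For these samples the two forms differ only in the wrapping quotes and the escaped quote character, which is the behaviour the answer attributes to the toggle parameter.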

Mike Boers 2010-06-08 23:06:46
