ansaurus

Question

How can you print a string using raw_unicode_escape encoding in Python 3?

Answer 1

+1 A:

I can't reproduce your issue; please see previous revisions of this answer for a log of my attempts (which explains my link in the comments).

However:

It seems like you are trying to force an encoding while writing to a file by doing all the legwork yourself. In Python 3, however, open() accepts an encoding parameter that handles the encoding for you.

badp@delta:~$ python3
Python 3.1.2 (r312:79147, Apr 15 2010, 12:35:07) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> foo = open("look mah, utf-32", "w", encoding="utf-32")
>>> foo.write("bar")
3
>>> foo.close()
>>> foo = open("look mah, utf-32", "rb")
>>> foo.read()
b'\xff\xfe\x00\x00b\x00\x00\x00a\x00\x00\x00r\x00\x00\x00'

If you are looking for a Python 2 equivalent, it seems like you really want to use io.open().
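As a rough sketch of that suggestion: io.open() takes the same encoding parameter, and in Python 3 it is the same function as the built-in open(). The temporary path and file name below are assumptions made for illustration, not part of the original session.

```python
import io
import os
import tempfile

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "utf32_demo.txt")

# Text mode with an explicit encoding: io.open() encodes on write.
with io.open(path, "w", encoding="utf-32") as handle:
    handle.write(u"bar")

# Binary mode shows the bytes that actually reached the disk
# (a 4-byte BOM followed by 4 bytes per character).
with io.open(path, "rb") as handle:
    raw = handle.read()

print(raw)
print(raw.decode("utf-32"))
```

The u"" prefix keeps the write call valid on Python 2.6+, where io.open() in text mode only accepts unicode objects; it is also accepted on Python 3.3 and newer.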

badp 2010-06-14 15:55:07

http://kitenet.net/~joey/blog/entry/unicode_eye_chart/

badp 2010-06-14 15:55:11

Try running this from inside a file; it will not work. Running from the console seems to work, but not from files. Also, I added a new comment regarding usage of buffer.

Sorin Sbarnea 2010-06-14 16:08:10

@Sorin: Still no repro :/

badp 2010-06-14 16:16:28

Answer 2

A:

http://docs.python.org/py3k/library/functions.html#ascii

As repr(), return a string containing a printable representation of an object, but escape the non-ASCII characters in the string returned by repr() using \x, \u or \U escapes. This generates a string similar to that returned by repr() in Python 2.

And the resulting string will indeed be of type str rather than bytes.

Example:

>>> a = '''Ⴊ ⇠ ਐ ῼ இ ╁ ଠ ୭ ⅙ ㈣'''
>>> ascii(a)
"'\\u10aa \\u21e0 \\u0a10 \\u1ffc \\u0b87 \\u2541 \\u0b20 \\u0b6d \\u2159 \\u3223'"
>>> print(ascii(a))
'\u10aa \u21e0 \u0a10 \u1ffc \u0b87 \u2541 \u0b20 \u0b6d \u2159 \u3223'

And if you wanted to trim off the extra quotes, you could just do print(ascii(a)[1:-1]).
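One caveat, sketched here for Python 3 only: ascii() is built on repr(), so besides the \u escapes it also doubles backslashes and writes Latin-1 characters as \xNN, which the raw_unicode_escape codec does not do. The sample strings are made up for illustration.

```python
# Python 3 only: ascii() does not exist in Python 2.
samples = ["\u10aa \u21e0", "caf\u00e9", "a\\b"]

for text in samples:
    # Strip the surrounding quotes that ascii() adds.
    via_ascii = ascii(text)[1:-1]
    # The codec leaves backslashes and Latin-1 characters alone.
    via_codec = text.encode("raw_unicode_escape")
    print(via_ascii, via_codec)
```

For text that contains only ASCII plus characters above U+00FF, the two approaches produce the same escapes; they diverge on the second and third samples.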

EDIT: As Alex states, you'd have to use repr in Python 2.6 instead of ascii. His solution does indeed work for both Python 2 and 3, but if you plan on doing the conversion a lot (and thus would prefer something easier to type multiple times), one possibility is to put a conditional at the start of your program as follows:

import sys
if sys.version_info[0] == 3:
    unic = ascii
else:
    unic = repr

And then you just use unic (or whatever you want to call it) wherever you'd use repr in Python 2 and ascii in Python 3.

...Though I suppose you could use elif sys.version_info[0] == 2: instead of else: if you wanted to be a bit more careful.

JAB 2010-06-14 16:15:55

`ascii` is not in 2.6, though.

Alex Martelli 2010-06-14 18:09:45

@Alex: True, but as shown in the quote in my answer, `repr` is.

JAB 2010-06-14 18:21:33

@JAB, `repr` in Python 3 does not replace non-ascii characters with escapes -- using two different functions depending on language level (`repr` in Python 2, `ascii` in Python 3) is not what the OP requires, "a solution that will work with Python 2.6 or newer, including 3.x".

Alex Martelli 2010-06-14 18:32:54

@Alex: Updated my answer with a simple solution to that.

JAB 2010-06-14 18:55:47

Answer 3

+2 A:

I'd just use:

print(str2.encode('raw_unicode_escape').decode('ascii'))

if you want identical code in Python 3 and Python 2.6 (otherwise you could use repr in 2.6 and ascii in Python 3, but that's not really "identical";-).
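A minimal sketch of that one-liner wrapped in a function; raw_escape is a hypothetical name and the sample strings are assumptions. One thing to keep in mind: raw_unicode_escape leaves characters in the U+0080 to U+00FF range as single Latin-1 bytes, so the final decode('ascii') step can fail on such input.

```python
def raw_escape(text):
    """Hypothetical helper: return text with \\uXXXX escapes as a str."""
    return text.encode("raw_unicode_escape").decode("ascii")

escaped = raw_escape(u"\u10aa \u21e0 \u0a10")
print(escaped)

# Caveat: U+00E9 is encoded as the single byte 0xE9, which is not ASCII.
latin1_bytes = u"caf\u00e9".encode("raw_unicode_escape")
try:
    latin1_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)
```

If the input may contain Latin-1 characters, decoding with 'latin-1' instead of 'ascii' avoids the error, at the cost of passing those characters through unescaped.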

Alex Martelli 2010-06-14 18:11:30

Thanks Alex, I'm currently looking to build a set of function overrides for Python 2.6+/3.x to make it more Unicode friendly. I hope I will succeed. Any idea how to override the file.write function to make it accept bytes? It is related to http://stackoverflow.com/questions/984014/python-3-is-using-sys-stdout-buffer-write-good-style

Sorin Sbarnea 2010-06-14 19:04:40

@Sorin, you typically do type-checks if you want to accept both unicode and byte strings and treat them differently; sometimes you can get away with adaptation (e.g. a method that accepts a unicode string and returns it unchanged, **or** a byte string and returns the unicode string obtained by decoding it) -- but that's a bit far afield of what can easily be discussed in a comment, and really very different from the original question, so you may want to ask another, separate question about this!

Alex Martelli 2010-06-14 19:22:43
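The adaptation idea in the comment above could look roughly like this. It is a sketch only: to_text is a hypothetical name, the UTF-8 default is an assumption, and the type check is written for Python 3 (in Python 2, bytes is an alias of str, so the check would need adjusting).

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical adapter: return a text string for text or bytes input."""
    if isinstance(value, bytes):
        # Byte strings are decoded with the assumed encoding.
        return value.decode(encoding)
    # Text strings are returned unchanged.
    return value

print(to_text(b"caf\xc3\xa9") == to_text(u"caf\u00e9"))
```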
