ansaurus

Question

Stream/string/bytearray transformations in Python 3

Answer 1

+2 A:

It looks as though all these non-codec modules are being handled on a case-by-case basis. Here's what I've found so far:

base64 encoding/decoding is available via the base64 module
bz2 compression/decompression is available via the bz2 module
hex string encoding/decoding can be done with the hexlify and unhexlify functions of the binascii module (a bit of a hidden feature)

I guess that means there's no standard framework for creating such string/bytearray transformation modules, but they're being done on a case-by-case basis in Python 3.
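As a rough illustration of the three modules listed above, each transformation is an ordinary bytes-in, bytes-out function call rather than a codec name. The sample payload is made up for the example.

```python
import base64
import binascii
import bz2

payload = b"hello world"  # sample data, chosen only for illustration

# base64: bytes -> ASCII-safe bytes and back
b64 = base64.b64encode(payload)
b64_back = base64.b64decode(b64)

# bz2: one-shot compression and decompression
packed = bz2.compress(payload)
unpacked = bz2.decompress(packed)

# hex: hexlify returns a bytes object of lowercase hex digits
hexed = binascii.hexlify(payload)
unhexed = binascii.unhexlify(hexed)
```

All three work on bytes objects, so text has to be encoded first (for example with str.encode()).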

Craig McQueen 2009-08-05 12:38:59

Answer 2

+1 A:

What specifically is your need for line ending conversion? If it's just for writing to a file or file object, you can specify the line ending to use through the newline parameter of open(), and \n will automatically be converted to it when you write to the file. Admittedly, this only works with files opened in text mode, not binary mode. (You can also specify the encoding to use when writing text to the file, which can be useful sometimes.)

http://docs.python.org/3.1/library/functions.html#open
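A minimal sketch of the open() approach, assuming a temporary file is acceptable for the demonstration. The helper name write_with_newline is hypothetical; it writes text in text mode with a chosen newline setting and reads the raw bytes back so the conversion is visible.

```python
import os
import tempfile

def write_with_newline(text, newline, encoding="utf-8"):
    """Write text in text mode with the given newline setting; return the raw bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", newline=newline, encoding=encoding) as f:
            f.write(text)
        with open(path, "rb") as f:  # binary mode: no newline translation on read
            return f.read()
    finally:
        os.remove(path)

crlf_bytes = write_with_newline("one\ntwo\n", "\r\n")  # every \n written as \r\n
lf_bytes = write_with_newline("one\ntwo\n", "\n")      # \n written unchanged
```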

To do the conversion on regular strings, you can simply do yourstring = yourstring.replace('\n', '\r\n') to go from Linux-style to Windows-style, and yourstring = yourstring.replace('\r\n', '\n') to go from Windows-style to Linux-style. You probably already know this, though, and it's probably not what you're looking for. (And, in fact, if you're writing to a file opened in text mode on a Windows system, \n is converted to \r\n anyway as long as the newline parameter is left at its default of None.)
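The two str.replace() calls can be wrapped as a pair of small helpers. The function names here are made up for the example, and the sketch assumes the input uses one line-ending style consistently.

```python
def lf_to_crlf(text):
    """Linux-style to Windows-style; assumes text contains no \\r\\n already."""
    return text.replace("\n", "\r\n")

def crlf_to_lf(text):
    """Windows-style to Linux-style."""
    return text.replace("\r\n", "\n")

unix_text = "one\ntwo\n"
windows_text = lf_to_crlf(unix_text)
round_trip = crlf_to_lf(windows_text)
```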

As well, if you want to convert between the various Unicode encodings (assuming you're working with byte sequences, since a Python 3 str is a sequence of Unicode code points with no particular encoding attached), it's just a matter of decoding the byte sequence using bytes.decode() or bytearray.decode() and then encoding the result using str.encode(). For a conversion from UTF-8 to UTF-16:

newstring = yourbytes.decode('utf-8')
yourbytes = newstring.encode('utf-16')

There shouldn't be any problems with newline characters not being converted properly between the two Unicode formats when done this way.
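A short worked example of that decode/encode round trip, using a made-up sample string. It also shows why the newline is safe: the conversion happens on the decoded str, and in UTF-16 the newline is simply encoded as two bytes like every other character in the sample.

```python
text = "caf\u00e9\n"                      # sample text with a non-ASCII character
utf8_bytes = text.encode("utf-8")

decoded = utf8_bytes.decode("utf-8")      # bytes -> str
utf16_bytes = decoded.encode("utf-16")    # str -> bytes, BOM + native byte order
utf16_le = decoded.encode("utf-16-le")    # explicit byte order, no BOM
```

Note that the plain 'utf-16' codec prepends a two-byte byte order mark, while 'utf-16-le' and 'utf-16-be' do not.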

There is also str.translate() and str.maketrans(), though I'm not sure if those will prove useful:

http://docs.python.org/3.1/library/stdtypes.html#str.translate
http://docs.python.org/3.1/library/stdtypes.html#str.maketrans

On a side note, a rot_13 translation table can be built like so:

import string
rot_13 = str.maketrans({
    x: chr((ord(x) - ord('A') + 13) % 26 + ord('A')) if x.isupper()
    else chr((ord(x) - ord('a') + 13) % 26 + ord('a'))
    for x in string.ascii_letters
})

# Using hard-coded values:

rot_13 = str.maketrans('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz', 'NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm')

Either way, using S.translate(rot_13) will cause normal strings to become rot_13 and rot_13 strings to become normal ones.
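For instance, with the hard-coded table, applying the translation twice returns the original text. The sample string is only for illustration.

```python
rot_13 = str.maketrans(
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz',
    'NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm',
)

plain = "Hello, World!"
scrambled = plain.translate(rot_13)      # letters shifted by 13, punctuation untouched
restored = scrambled.translate(rot_13)   # shifting by 13 again undoes it
```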

JAB 2009-08-05 13:01:21

Thanks for your answer. The thing that is missing from these solutions is a framework that allows them to be easily applied as a transformation to a stream, in a similar way to the codec framework. See the stream transformation in http://stackoverflow.com/questions/1169742/bug-with-python-utf-16-output-and-windows-line-endings#answer-1170469 for an example of what I want to do. Does Python 3 have a standard framework for such stream transformations, similar to the codec framework?

Craig McQueen 2009-08-06 01:09:36

Apparently you can do it exactly like it shows there; the only difference is that you use `sys.stdout.buffer` rather than `sys.stdout`. You'll still have that `\n` problem, though; I'll look into that in a bit.
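One possible sketch of a stream-level transformation on a binary stream such as `sys.stdout.buffer`. This is not the code from the linked answer; it is an alternative that assumes `io.TextIOWrapper` is acceptable, since it handles both the encoding and the `\n` translation. The helper name `utf16_crlf_writer` is made up, and an `io.BytesIO` stands in for `sys.stdout.buffer` so the output can be inspected.

```python
import io

def utf16_crlf_writer(binary_stream):
    """Wrap a binary stream so written text is UTF-16 encoded with \\r\\n line endings."""
    return io.TextIOWrapper(binary_stream, encoding="utf-16", newline="\r\n")

sink = io.BytesIO()                # stand-in for sys.stdout.buffer
writer = utf16_crlf_writer(sink)
writer.write("first\nsecond\n")    # \n is translated before encoding
writer.flush()                     # push buffered text through to the sink
encoded = sink.getvalue()
```

Because the newline translation happens on the text before it is encoded, each `\r\n` ends up as proper UTF-16 code units rather than a stray single byte.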

JAB 2009-08-06 15:14:44

(On a side note, if you do end up using that `CRLFWrapper` class from your other question, I'd recommend using `re.sub()` instead of `str.replace()`, with the pattern to match being `(?<!\r)\n` and the replacement string being `\r\n`; this will avoid repeated carriage returns, which may or may not mess things up.)
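A small sketch of that `re.sub()` suggestion, compared against a plain `str.replace()` on input with mixed line endings. The function name `to_crlf` and the sample string are made up for the example.

```python
import re

# Match \n only when it is not already preceded by \r.
_BARE_LF = re.compile(r"(?<!\r)\n")

def to_crlf(text):
    """Convert bare \\n to \\r\\n, leaving existing \\r\\n pairs alone."""
    return _BARE_LF.sub("\r\n", text)

mixed = "a\r\nb\nc"
with_regex = to_crlf(mixed)
with_replace = mixed.replace("\n", "\r\n")   # doubles the \r in the existing \r\n
```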

JAB 2009-08-06 16:12:30
