ansaurus

Question

Answer 1

+4 A:

Do you mean "compress to UTF-8 strings"? I'll assume that, since any generic compressor will compress UTF-8 strings. However, no real-world compressor is going to compress to a UTF-8 string.

You can't store 8-bit data like UTF-8 directly in JSON, because JSON strings are defined as Unicode. You'd have to base64-encode the data before giving it to JSON:

json.dumps({ 'compressedData' : base64.b64encode(zString) })

However, base64 inherently causes a 4/3 encoding overhead. If you're compressing typical string data you'll probably get enough compression for this to still be a win, but it's a significant overhead. You might find an encoding with a little less overhead, but not much.

Note that if you're using this to send data to a browser, you're better off letting HTTP compression do this; it's widely-supported and will be much more robust.

Glenn Maynard 2010-10-14 01:19:11

+1 for HTTP compression. Whilst you can theoretically get somewhat-more-efficient-than-base64 encoding, it's full of pitfalls. Let your web server (mod_deflate etc) handle it.

bobince 2010-10-14 01:37:10

Sadly, it's not going to a browser but this answer was full of wondrous information.

Ralphleon 2010-10-14 01:46:26

ansaurus

tags:

views:

answers:

UTF-8 compatible compression in python

related questions