Suppose I have any data stored in bytes. For example:

0110001100010101100101110101101

How can I store it as printable text? The obvious way would be to convert every 0 to the character '0' and every 1 to the character '1'. In fact this is what I'm currently doing. I'd like to know how I could pack them more tightly, without losing information.

I thought of converting bits in groups of eight to ASCII, but some bit combinations are not accepted in that format. Any other ideas?

+1  A: 

Try the standard array module or the struct module. These support storing bytes in a space-efficient way, but they don't support bits directly.

You can also try http://cobweb.ecn.purdue.edu/~kak/dist/BitVector-1.2.html or http://ilan.schnell-web.net/prog/bitarray/
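
Since neither standard module handles individual bits, here is a minimal sketch of one way to get from a '0'/'1' string to packed bytes using only built-ins. It assumes Python 3. The helper names `pack_bits` and `unpack_bits` are made up for illustration, and the bit count is returned separately because the example in the question is not a whole number of bytes long.

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into bytes, left-aligned.

    Returns (packed_bytes, bit_count); the count is needed to undo the padding.
    """
    n = len(bits)
    padded = bits.ljust((n + 7) // 8 * 8, "0")  # pad on the right to a whole byte
    packed = int(padded, 2).to_bytes(len(padded) // 8, "big") if padded else b""
    return packed, n


def unpack_bits(packed, n):
    """Inverse of pack_bits: recover the original '0'/'1' string."""
    bits = "".join(format(byte, "08b") for byte in packed)
    return bits[:n]  # drop the padding bits added by pack_bits


bits = "0110001100010101100101110101101"
packed, n = pack_bits(bits)
assert unpack_bits(packed, n) == bits
```

The packed bytes are not printable on their own; they still need a text-safe encoding on top if the goal is a text file.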

pts
+7  A: 

What about an encoding that only uses "safe" characters, like base64?
http://en.wikipedia.org/wiki/Base64

EDIT: That is assuming you want to safely store the data in text files and such.

In 2.x, strings should be fine (Python doesn't use null-terminated strings, so don't worry about that).

Otherwise, in 3.x, check out the bytes and bytearray objects: http://docs.python.org/3.0/library/stdtypes.html#bytes-methods

Fire Lancer
Does that encoding support any combination of 8 bits? How could I turn bits into a Base64 char?
Manuel
Check the wiki page. It basically encodes any given binary data using 64 characters which are generally safe to send to/from web pages, in URLs, text files, etc. For this reason, though, it uses one character for every 6 bits (hence the name base64: 2^6 = 64).
Fire Lancer
I see.. I'll check that then... Thanks!
Manuel
Might be simpler to just use `repr()` than base64. Everything's escaped properly and the size doesn't grow quite as much.
S.Lott
Well, I'm assuming that if it had a sensible text representation he would have just used that in the first place rather than a string of 1's and 0's, and that his binary data is a video clip or some other binary thing like that.
Fire Lancer
+2  A: 

Not sure what you're talking about.

>>> sample = "".join( chr(c) for c in range(256) )
>>> len(sample)
256
>>> sample
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\
x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABC
DEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83
\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97
\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab
\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf
\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3
\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7
\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb
\xfc\xfd\xfe\xff'

The string sample contains all 256 distinct byte values. There is no such thing as "bit combinations ... not accepted".

To make it printable, simply use repr(sample): non-printable and non-ASCII characters are escaped, as you can see above.
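
The transcript above is from Python 2. A rough Python 3 equivalent is sketched below, assuming `bytes(range(256))` as the sample and using `ast.literal_eval` to read the escaped text back (that round-trip step is an addition, not part of the original answer). It also prints the escaped length next to the base64 length for the same input, so the two can be compared.

```python
import ast
import base64

sample = bytes(range(256))             # every possible byte value, once each

escaped = repr(sample)                 # printable ASCII text, e.g. b'\x00\x01...'
restored = ast.literal_eval(escaped)   # parse the literal back into bytes
assert restored == sample

b64 = base64.b64encode(sample).decode("ascii")

# original size, escaped size, base64 size
print(len(sample), len(escaped), len(b64))
```

For data that is mostly text, the escaped form stays close to the original size. For arbitrary binary data, most bytes turn into four-character `\xNN` escapes, while base64 always comes out at about 4/3 of the input.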

S.Lott
Although if he's using Python 3.x, which always uses Unicode strings, it's still going to be 4x the size it needs to be (I think Python uses UCS4 internally for Unicode, right?).
Fire Lancer
Unicode has nothing to do with it. Each byte of `sample` has one of the 256 possible byte values. The question says "some bit combinations are not accepted", which is clearly not true.
S.Lott
A: 

For Python 2.x, your best bet is to store them in a string. Once you have that string, you can encode it into safe ASCII values using the base64 module that comes with Python.

import base64
encoded = base64.b64encode(bytestring)

This will be much more condensed than storing "1" and "0": base64 uses one character for every 6 bits instead of one character per bit.

For more information on the base64 module, see the Python docs.
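
A minimal round-trip sketch for Python 3, where `b64encode` takes and returns `bytes`. The value of `bytestring` is a made-up stand-in for the packed data, since the answer does not say how it is built.

```python
import base64

# Hypothetical stand-in: 4 bytes (32 bits) of packed binary data.
bytestring = bytes([0x63, 0x15, 0x97, 0x5A])

encoded = base64.b64encode(bytestring).decode("ascii")  # printable ASCII str
decoded = base64.b64decode(encoded)                     # back to the original bytes

assert decoded == bytestring
print(encoded)
```

Here 32 bits take 8 characters of base64 (including padding), against 32 characters in the '0'/'1' form.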

Tim Swast