views:

300

answers:

3

I am writing a huffman implementation in Python as a learning exercise. I have got to the point of writing out my variable length huffman codes to a buffer (or file). Only to find there does not seem to be a bitstream class implemented by Python! I have had a look at the array and struct modules but they do not seem to do what I need without extra work.

A bit of goggling turned up this bitstream implementation, which is more like what I am wanting. Is there really no comparable bitstream class in the Python standard library?

A: 

Correct. Most of the modules in the stdlib that need bitstreaming are written in C, with the details hidden away.

Ignacio Vazquez-Abrams
A: 

No, as far as I know there's nothing in the standard library that helps you with bit-aligned operations. Python is not designed to fiddle with the small stuff ^^...

But you could easily write your own bitstream-writer with the help of byte arrays:

>>> from array import array
>>> a = array("B")
>>> a.append(1) # 128
>>> a.append(0)
>>> a.append(0)
>>> a.append(0)
>>> a.append(1) # 8
>>> a.append(1) # 4
>>> a.append(1) # 2
>>> a.append(1) # 1
>>> print reduce(lambda m, n: (m << 1) + n, a, 0)
143

You get the idea...

AndiDog
+6  A: 

You're right that there's nothing in the standard library, but have you tried the bitstring module? It's pretty much designed for this kind of application, is stable and well documented, so I think it should suit your needs.

Construction, reading, slicing, etc. are all done bitwise, and it's pure Python. I've seen Huffman coding examples done with it quite successfully in the past.

Another good option is bitarray, which doesn't have as many features but can be considerably faster as it's a C extension. As a bonus it has a Huffman coding example distributed as part of the source package.

Scott Griffiths
'bitstring' is REALLY slow, but it works.
Judge Maygarden
@Judge: It's true that bitstring isn't as fast it could be with C extensions, but I think it's pretty reasonable for pure Python (at least the latest versions are). If you have a good example where it was really slow for you then the author (that's me by the way) would love to see it. It's always helpful to have real-world use cases to see where the time is being spent.
Scott Griffiths
Sorry, that was not the most constructive comment. I needed to write a quick script to extract a large sequence of 12-bit integer triplets. bitstring served that purpose just fine (i.e. the python script was very quick and easy to write), but the end result is slow. It's going to be rewritten in another language and embedded in an application eventually. So, I'm happy with the result of using bitstring for the prototype.
Judge Maygarden
Also, bitarray does not appear to have a function for extracting integer values. So, bitstring was the logical choice when I quickly compared the feature sets.
Judge Maygarden