views:

53

answers:

4

In an effort to get better at programming assembly, and as an academic exercise, I would like to write a non-trivial program in x86 assembly. Since file compression has always been kind of an interest to me, I would like to write something like the zip utility in assembly.

I'm not exactly out of my element here, having written a simple web server using assembly and coded for embedded devices, and I've read some of the material for zlib (and others) and played with its C implementation.

My problem is finding a routine that is simple enough to port to assembly. Many of the utilities I've inspected thus far are full of #define's and other included code. Since this is really just for me to play with, I'm not really interested in super-awesome compression ratios or anything like that. I'm basically just looking for the RC4 of compression algorithms.

Is a Huffman Coding the path I should be looking down or does anyone have another suggestion?

+1  A: 

One option would be to write a decompressor for DEFLATE (the algorithm behind zip and gzip). zlib's implementation is going to be heavily optimized, but the RFC gives pseudocode for a decoder. After you have learned the compressed format, you can move on to writing a compressor based on it.

bdonlan
Thank you for the suggestion. The heavily optimized code was exactly what I was trying to avoid reading, but the RFC was relatively painless to follow.
mrduclaw
+1  A: 

I remember a project from second year computing science that was something similar to this (in C).

Basically, compressing involves replacing a string of xxxxx (5 x's) with @\005x (the at sign, a byte with a value of 5, followed by the repeated byte. This algorithm is very simple. It doesn't work that well for English text, but works surprisingly well for bitmap images.

Edit: what I am describing is run length encoding.

Turtle
Thank you, while I didn't select this method, the background reading was very helpful.
mrduclaw
+1  A: 

And here is a more sophisticated algorithm which should not be too hard to implement: LZ77 (containing assembly examples) or LZ77 (this site contains many different compression algorithms).

Turtle
This was fantastic, and eventually what I ended up doing. Thanks so much!
mrduclaw
+1  A: 

Take a look at UPX executable packer. It contains some low-level decompressing code as part of unpacking procedures...

Dennis Yurichev
Delicious. <3 the UPX source.
mrduclaw