Full description of a compression algorithm

tags:

compression

views:

251

answers:

+2 Q:

Full description of a compression algorithm

I am looking for a compression algorithm (for a programming competition) and I need a full description of how to implement it (all technical details), any loseless and patent-free algorithm will do, but the ease of implementation is a bonus :)

(Although possibly irrelevant) I plan to implement the algorithm in C++...

Thanks in advance.

EDIT: I will be compressing text files only, no other file types...

+3 A:

I'd start here on Wikipedia.

There's a whole lot to choose from, but without knowing more about what you want it's difficult to help more. Are you compressing text, images, video or just random files? Each one has it's own set of techniques and challenges for optimal results.

If ease of implementation is the sole criterion I'd use "filecopy" compression. Guaranteed compression ratio of exactly 1:1, and trivial implementation...

Roddy 2009-07-15 13:17:21

+3 A:

Huffman is good if you're compressing plain text. And all the commenters below assure me it's a joy to implement ;D

pianoman 2009-07-15 13:18:57

I've implemented it, I actually think it's pretty clean and not all that hard once you understand the algorithm.

David Zaslavsky 2009-07-15 14:11:59

It's plenty easy compared to deflate, any kind of arithmetic coding, or the Burrows-Wheeler transform. The only thing that's easier is run length encoding, and that's terrible for text.

Dan Hook 2009-07-24 16:54:30

A straight LZ77 (lookback only, 4 bits of excess-3 length and 12 bits of lookback) is easier than Huffman, decodes much faster. Encoding is slow with a naive approach, however.

Nosredna 2009-07-24 17:58:30

+4 A:

Well, I can't go so far as to complete the competition for you, but please check out this article on wiki: Run Length Encoding. It is by far one of the simplest ways to compress data, albeit not always an efficient one. Compression is also domain specific, even amongst lossless algorithms you will find that what you are compressing determines how best to encode it.

Adam Luter 2009-07-15 13:19:04

Agreed! There is a huge difference between compressing plaintext, for example, versus a bit of machine-language.

pianoman 2009-07-15 14:12:13

+2 A:

RFC 1951 describes inflate/deflate, including a brief description of the compressor's algorithm. Antaeus Feldspar's An Explanation of the Deflate Algorithm provides a bit more background.

Also, the zlib source distribution contains a simplified reference inflater in contrib/puff/puff.c that can be helpful reading to understand exactly how the bits are arranged (but it doesn't contain a deflate, only inflate).

puetzk 2009-07-15 13:28:06

I searched Google before asking the question here on SO, and I found Feldspar's explanation but it lacked the "technical details" needed to know how to implement DEFLAT, but the first link appears to be just what I am looking for, thanks

Lawand 2009-07-15 13:40:25

If you go under "View" in your internet browser, there should be an option to either "Zoom Out" or make the text smaller.

Select one of those and...

BAM!

You just got more text on the same screen! Yay compression!

samoz 2009-07-15 13:32:24

Kind of funny :-)

David Zaslavsky 2009-07-15 14:10:54

Love this. People should freaking **think** before downvoting.

pianoman 2009-07-24 17:27:02

+1 Programmers are notoriously too serious about everything.

spoulson 2009-07-24 17:33:40

The Security Now! podcast recently put out an episode highlighting data compression algorithms. Steve Gibson gives a pretty good explanation of the basics of Huffman and Lempel-Ziv compression techniques. You can listen to the audio podcast or read the transcript for Episode 205.

spoulson 2009-07-24 17:32:50

+1 A:

Ease of implementation: Huffman, as stated before. I believe LZW is no longer under patent, but I don't know for sure. It' a relatively simple algorithm. LZ77 should be available, though. Lastly, the Burrows-Wheeler transform allows for compression, but it's significantly more difficult to implement.

San Jacinto 2009-07-24 17:36:06

+1 A:

I like this introduction to the Burrows-Wheeler Transform.

Ludwig Weinzierl 2009-07-24 17:54:42

Interesting algorithm.

spoulson 2009-07-24 18:08:39

ansaurus

tags:

views:

answers:

Full description of a compression algorithm

related questions