tags:

views:

251

answers:

8

I am looking for a compression algorithm (for a programming competition) and I need a full description of how to implement it (all technical details), any loseless and patent-free algorithm will do, but the ease of implementation is a bonus :)

(Although possibly irrelevant) I plan to implement the algorithm in C++...

Thanks in advance.

EDIT: I will be compressing text files only, no other file types...

+3  A: 

I'd start here on Wikipedia.

There's a whole lot to choose from, but without knowing more about what you want it's difficult to help more. Are you compressing text, images, video or just random files? Each one has it's own set of techniques and challenges for optimal results.

If ease of implementation is the sole criterion I'd use "filecopy" compression. Guaranteed compression ratio of exactly 1:1, and trivial implementation...

Roddy
+3  A: 

Huffman is good if you're compressing plain text. And all the commenters below assure me it's a joy to implement ;D

pianoman
I've implemented it, I actually think it's pretty clean and not all that hard once you understand the algorithm.
David Zaslavsky
It's plenty easy compared to deflate, any kind of arithmetic coding, or the Burrows-Wheeler transform. The only thing that's easier is run length encoding, and that's terrible for text.
Dan Hook
A straight LZ77 (lookback only, 4 bits of excess-3 length and 12 bits of lookback) is easier than Huffman, decodes much faster. Encoding is slow with a naive approach, however.
Nosredna
+4  A: 

Well, I can't go so far as to complete the competition for you, but please check out this article on wiki: Run Length Encoding. It is by far one of the simplest ways to compress data, albeit not always an efficient one. Compression is also domain specific, even amongst lossless algorithms you will find that what you are compressing determines how best to encode it.

Adam Luter
Agreed! There is a huge difference between compressing plaintext, for example, versus a bit of machine-language.
pianoman
+2  A: 

RFC 1951 describes inflate/deflate, including a brief description of the compressor's algorithm. Antaeus Feldspar's An Explanation of the Deflate Algorithm provides a bit more background.

Also, the zlib source distribution contains a simplified reference inflater in contrib/puff/puff.c that can be helpful reading to understand exactly how the bits are arranged (but it doesn't contain a deflate, only inflate).

puetzk
I searched Google before asking the question here on SO, and I found Feldspar's explanation but it lacked the "technical details" needed to know how to implement DEFLAT, but the first link appears to be just what I am looking for, thanks
Lawand
A: 

If you go under "View" in your internet browser, there should be an option to either "Zoom Out" or make the text smaller.

Select one of those and...

BAM!

You just got more text on the same screen! Yay compression!

samoz
Kind of funny :-)
David Zaslavsky
Love this. People should freaking **think** before downvoting.
pianoman
+1 Programmers are notoriously too serious about everything.
spoulson
A: 

The Security Now! podcast recently put out an episode highlighting data compression algorithms. Steve Gibson gives a pretty good explanation of the basics of Huffman and Lempel-Ziv compression techniques. You can listen to the audio podcast or read the transcript for Episode 205.

spoulson
+1  A: 

Ease of implementation: Huffman, as stated before. I believe LZW is no longer under patent, but I don't know for sure. It' a relatively simple algorithm. LZ77 should be available, though. Lastly, the Burrows-Wheeler transform allows for compression, but it's significantly more difficult to implement.

San Jacinto
+1  A: 

I like this introduction to the Burrows-Wheeler Transform.

Ludwig Weinzierl
Interesting algorithm.
spoulson