I want to refine my C skills and have been thinking of trying to write my own zip and unzip program. This seems to touch on a lot of areas: CPU, HDD, and memory.

Where do I start? Is there a flow chart of what to do to compress and uncompress? Is it too complicated for this type of project?

Is there a good book or site that walks through all the steps?

I wonder if anyone has any good resources for this or maybe any additional suggestions.

+1  A: 

I'd start by having a good look at the external links at the following Wikipedia pages (they link to full specifications of the format):

ChristopheD
+1  A: 

Additional suggestion, just in case you are looking for something more difficult.

The program Crinkler specializes in compressing small executables. Here is info about how it works.

neoneye
+2  A: 

You might want to read up on Huffman encoding on Wikipedia. The encoding is pretty simple, and you can achieve some level of compression with it. Implementing it will give you practice with linked lists, memory allocation and deallocation, and choosing the right data structures.
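To give a feel for the core of Huffman coding, here is a minimal sketch that only computes the code length for each byte value from a table of counts. The names `huffman_lengths`, `NSYM` and `NNODE` are made up for this example, and it uses fixed arrays with a linear scan instead of the linked lists or heap a fuller version would probably use. Assigning the actual bit patterns and writing them to a file is left out.

```c
#include <assert.h>
#include <stddef.h>

#define NSYM  256               /* one leaf per possible byte value */
#define NNODE (2 * NSYM - 1)    /* leaves plus internal nodes */

/* Fill len[s] with the Huffman code length of every byte value s that has
   freq[s] > 0 (0 for unused values). Returns the number of used symbols. */
int huffman_lengths(const unsigned long freq[NSYM], int len[NSYM])
{
    unsigned long weight[NNODE];
    int parent[NNODE];
    int alive[NNODE];           /* 1 while a node is still waiting to be merged */
    int next = NSYM;            /* index of the next internal node to create */
    int live = 0;
    int symbols;
    int i;

    assert(freq != NULL && len != NULL);
    for (i = 0; i < NNODE; i++) {
        weight[i] = 0;
        parent[i] = -1;
        alive[i] = 0;
    }
    for (i = 0; i < NSYM; i++) {
        len[i] = 0;
        if (freq[i] > 0) {
            weight[i] = freq[i];
            alive[i] = 1;
            live++;
        }
    }
    symbols = live;

    /* Repeatedly merge the two lightest live nodes under a new parent. */
    while (live > 1) {
        int a = -1, b = -1;     /* a = lightest, b = second lightest */
        for (i = 0; i < next; i++) {
            if (!alive[i])
                continue;
            if (a < 0 || weight[i] < weight[a]) {
                b = a;
                a = i;
            } else if (b < 0 || weight[i] < weight[b]) {
                b = i;
            }
        }
        weight[next] = weight[a] + weight[b];
        parent[a] = parent[b] = next;
        alive[a] = alive[b] = 0;
        alive[next] = 1;
        next++;
        live--;
    }

    /* A symbol's code length is its depth in the tree. */
    for (i = 0; i < NSYM; i++) {
        int p = i;
        if (freq[i] == 0)
            continue;
        while (parent[p] >= 0) {
            len[i]++;
            p = parent[p];
        }
        if (symbols == 1)
            len[i] = 1;         /* a lone symbol still needs one bit */
    }
    return symbols;
}
```

With counts a=5, b=2, c=1, d=1 this gives lengths 1, 2, 3 and 3, so the nine input bytes would need 15 bits instead of 72.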

If you want to implement something extremely simple, just implement Run Length Encoding.
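For reference, Run Length Encoding can be sketched in a few lines. This is just one possible layout, assumed for the example: every run is stored as a count byte (1 to 255) followed by the byte value. The function names `rle_encode` and `rle_decode` are made up here, and both return 0 on error, which means an empty input also reports 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode src[0..n) as (count, byte) pairs into dst.
   Returns the number of bytes written, or 0 if dst is too small.
   The worst case output is 2 * n bytes. */
size_t rle_encode(const unsigned char *src, size_t n,
                  unsigned char *dst, size_t cap)
{
    size_t out = 0;
    size_t i = 0;

    assert(src != NULL && dst != NULL);
    while (i < n) {
        unsigned char b = src[i];
        size_t run = 1;
        /* Extend the run while the byte repeats; a count must fit in one byte. */
        while (i + run < n && src[i + run] == b && run < 255)
            run++;
        if (out + 2 > cap)
            return 0;
        dst[out++] = (unsigned char)run;
        dst[out++] = b;
        i += run;
    }
    return out;
}

/* Decode (count, byte) pairs back into dst.
   Returns the number of bytes written, or 0 on malformed input
   or if dst is too small. */
size_t rle_decode(const unsigned char *src, size_t n,
                  unsigned char *dst, size_t cap)
{
    size_t out = 0;
    size_t i;

    assert(src != NULL && dst != NULL);
    if (n % 2 != 0)
        return 0;
    for (i = 0; i < n; i += 2) {
        size_t run = src[i];
        if (run == 0 || out + run > cap)
            return 0;
        memset(dst + out, src[i + 1], run);
        out += run;
    }
    return out;
}
```

Encoding the 9 bytes "aaabccccc" produces the 6 bytes 3 'a' 1 'b' 5 'c'. Note that input without repeats doubles in size with this layout, which is a good thing to notice and then try to fix as a follow-up exercise.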

Ashwin
+1 For suggesting both Huffman encoding and Run Length Encoding, as they're both nice choices for learning.
Brian
+1  A: 

ZIP is a combination of two things: a file packaging format and a (set of) compression algorithms. The first is a bit prosaic but would hone your bit-twiddling skills; the second is more interesting and more advanced.
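As a taste of the packaging side, here is a rough sketch of picking apart the fixed 30-byte part of a ZIP local file header, which starts with the signature bytes "PK\3\4" and stores every field little-endian. The struct and function names (`ZipLocalHeader`, `zip_parse_local_header`, `rd16`, `rd32`) are invented for this example, and it skips several fields (version, flags, timestamps) as well as the file name and extra data that follow the fixed part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A subset of the fields in a ZIP local file header. */
typedef struct {
    uint16_t method;        /* 0 = stored, 8 = deflate */
    uint32_t crc32;
    uint32_t comp_size;
    uint32_t uncomp_size;
    uint16_t name_len;      /* length of the file name that follows */
    uint16_t extra_len;     /* length of the extra field after the name */
} ZipLocalHeader;

/* Read little-endian integers byte by byte, so the code does not
   depend on the host's endianness or struct padding. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *h if buf starts with a local file header,
   0 if the buffer is too short or the signature does not match. */
int zip_parse_local_header(const unsigned char *buf, size_t n,
                           ZipLocalHeader *h)
{
    assert(buf != NULL && h != NULL);
    if (n < 30 || rd32(buf) != 0x04034b50u)
        return 0;
    h->method      = rd16(buf + 8);
    h->crc32       = rd32(buf + 14);
    h->comp_size   = rd32(buf + 18);
    h->uncomp_size = rd32(buf + 22);
    h->name_len    = rd16(buf + 26);
    h->extra_len   = rd16(buf + 28);
    return 1;
}
```

The compressed data for the entry would then start at offset 30 + name_len + extra_len. A real reader usually starts from the central directory at the end of the archive instead of trusting local headers alone, which this sketch does not attempt.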

I remember having to implement LZW encoding and decoding in C in order to read and write GIF files. This would be a fine project: LZW compression is very clever, one of the few algorithms I've seen that I think deserves a patent, and it is a much more achievable aim than ZIP.
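To show roughly what is involved, here is a bare-bones LZW sketch. It is not the GIF variant: it emits plain `int` codes instead of packed variable-width bits, has no clear or end-of-information codes, and simply stops growing the dictionary at 4096 entries. All names (`LzwEntry`, `lzw_find`, `lzw_encode`, `lzw_decode`, `LZW_MAX_CODES`) are made up for the example, and the dictionary lookup is a slow linear search to keep the code short.

```c
#include <assert.h>
#include <stddef.h>

#define LZW_MAX_CODES 4096      /* assumed limit: 12-bit codes */

/* Dictionary entry: the string for a code is the string of `prefix`
   followed by `byte`. Codes 0..255 are the single bytes themselves. */
typedef struct { int prefix; unsigned char byte; } LzwEntry;

/* Look up the code for (prefix, byte); -1 if it is not in the dictionary. */
static int lzw_find(const LzwEntry *dict, int size, int prefix, unsigned char byte)
{
    for (int i = 256; i < size; i++)
        if (dict[i].prefix == prefix && dict[i].byte == byte)
            return i;
    return -1;
}

/* Returns the number of codes written, or 0 if `codes` is too small. */
size_t lzw_encode(const unsigned char *src, size_t n, int *codes, size_t cap)
{
    LzwEntry dict[LZW_MAX_CODES];
    int size = 256;
    size_t out = 0;
    int cur;

    assert(src != NULL && codes != NULL);
    if (n == 0)
        return 0;
    cur = src[0];
    for (size_t i = 1; i < n; i++) {
        int next = lzw_find(dict, size, cur, src[i]);
        if (next >= 0) {        /* keep extending the current match */
            cur = next;
            continue;
        }
        if (out >= cap)
            return 0;
        codes[out++] = cur;     /* emit the longest match found */
        if (size < LZW_MAX_CODES) {
            dict[size].prefix = cur;
            dict[size].byte = src[i];
            size++;
        }
        cur = src[i];
    }
    if (out >= cap)
        return 0;
    codes[out++] = cur;
    return out;
}

/* Returns the number of bytes written, or 0 on bad input or a full dst. */
size_t lzw_decode(const int *codes, size_t n, unsigned char *dst, size_t cap)
{
    LzwEntry dict[LZW_MAX_CODES];
    unsigned char stack[LZW_MAX_CODES];
    int size = 256;
    int prev = -1;
    unsigned char first = 0;    /* first byte of the previously decoded string */
    size_t out = 0;

    assert(codes != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        int code = codes[i];
        int cur = code;
        int sp = 0;

        if (code < 0 || code >= LZW_MAX_CODES || code > size ||
            (prev < 0 && code >= 256))
            return 0;
        if (code == size) {     /* code not defined yet: previous string + its first byte */
            stack[sp++] = first;
            cur = prev;
        }
        while (cur >= 256) {    /* walk the prefix chain; bytes come out reversed */
            stack[sp++] = dict[cur].byte;
            cur = dict[cur].prefix;
        }
        stack[sp++] = (unsigned char)cur;
        first = (unsigned char)cur;
        if (out + (size_t)sp > cap)
            return 0;
        while (sp > 0)
            dst[out++] = stack[--sp];
        if (prev >= 0 && size < LZW_MAX_CODES) {
            dict[size].prefix = prev;
            dict[size].byte = first;
            size++;
        }
        prev = code;
    }
    return out;
}
```

Encoding "ABABABA" gives the four codes 65, 66, 256, 258. The last one is interesting because the decoder receives code 258 before it has defined it, which is the special case handled by the `code == size` branch.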

Duncan McGregor
A: 

You might want to take a look at open source applications like Zip, Gzip, Bzip, and 7-Zip, each of which implements its own variation on file compression. 7-Zip in particular has its own compression format (7z) that gets files smaller than plain zip does, so there's something to be learned there.

I'm not saying copy their code, but looking at what someone has done before can get the wheels turning and help you think about the problem in a different way, which gives you some forward momentum.

Koby