views:

160

answers:

6

Can anybody give pointers on how I can implement LZW compression/decompression in low-memory conditions (< 2k)? Is that possible?

A: 

My best recommendation is to examine the BusyBox source and see if their LZW implementation is sufficiently small to work in your environment.

R..
Probably not -- BusyBox is intended for systems with far more resources.
tomlogic
Have you checked, rather than just saying "probably not"? Pretty much all of the algorithms in BusyBox are the smallest (in both code size and working space) that any of the developers have been able to find or create, often at the expense of decent performance. If performance is too bad, there's usually a compile-time option to choose between tiny size with horrible performance and moderate size/performance.
R..
@tomlogic: most LZW implementations I have seen in the past had the size of the dictionary (the main memory consumer) as a compile-time define, so it is worth checking. The minimum dictionary size IIRC is 257+1, but that would be equivalent to no compression at all.
Dummy00001
+1  A: 

It has been over 15 years since I last played with the LZW compression algorithm, so take the following with a grain of salt.

Given the memory constraints, this is going to be difficult at best. The dictionary you build is going to consume the vast majority of what you have available. (Assuming that code + memory <= 2k.)

Pick a small fixed size for your dictionary. Say 1024 entries.

Let each dictionary entry take the form of ....

 struct entry {
    intType   prevIdx;   /* index of the dictionary entry this one extends */
    charType  newChar;   /* the byte appended to that entry's string */
 };

This structure makes the dictionary recursive. You need the item at the previous index to be valid in order for it to work properly. Is this workable? I'm not sure. However, let us assume for the moment that it is and find out where it leads us ....
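To see how the recursive layout would be used on the decode side, here is a rough sketch of expanding one code by walking prevIdx back to a literal byte; the concrete types and the 64-byte scratch stack are assumptions for illustration:

    #include <stdint.h>

    struct entry {                 /* concrete stand-ins for intType/charType */
        uint16_t prevIdx;          /* only 10 bits are actually needed */
        uint8_t  newChar;
    };

    /* Expand one code by following prevIdx back to a literal byte.
       Assumes codes 0..255 are literals and table[0] holds code 256.
       Returns the number of bytes written to out[]. */
    static unsigned expand(unsigned code, const struct entry *table,
                           unsigned char *out)
    {
        unsigned char scratch[64];     /* longest phrase we expect */
        unsigned n = 0;

        while (code >= 256) {          /* walk the chain backwards */
            scratch[n++] = table[code - 256].newChar;
            code = table[code - 256].prevIdx;
        }
        scratch[n++] = (unsigned char)code;

        for (unsigned i = 0; i < n; i++)   /* emit oldest byte first */
            out[i] = scratch[n - 1 - i];
        return n;
    }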

If you use the standard types for int and char, you are going to run out of memory fast. You will want to pack things together as tightly as possible. An index into 1024 entries takes 10 bits to store. Your new character will likely take 8 bits. Total = 18 bits per entry.

18 bits * 1024 entries = 18432 bits or 2304 bytes.

At first glance this appears too large. What do we do? Take advantage of the fact that the first 256 entries are already known: your typical extended ASCII set, or what have you. This means we really only need to store 768 entries.

768 * 18 bits = 13824 bits or 1728 bytes.

This leaves you with about 320 bytes to play with for code. Naturally, you can play around with the dictionary size and see what's good for you, but you will not end up with very much space for your code. Since you are looking at so little code space, I would expect that you would end up coding in assembly.
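If you do pack entries down to 18 bits, storing them in a flat byte array along these lines keeps you inside the 1728-byte budget. This is a rough, untested sketch and all names are made up for illustration:

    #include <stdint.h>

    #define DICT_ENTRIES 768u                            /* codes 256..1023 */
    static uint8_t dict[(DICT_ENTRIES * 18 + 7) / 8];    /* 1728 bytes */

    /* Store entry i as an 18-bit field: 10 bits of prevIdx, 8 of newChar,
       most significant bit first, starting at bit 18*i of dict[]. */
    static void put_entry(unsigned i, unsigned prevIdx, unsigned newChar)
    {
        uint32_t val = ((uint32_t)prevIdx << 8) | newChar;
        uint32_t bit = (uint32_t)i * 18u;

        for (unsigned b = 0; b < 18; b++, bit++) {
            uint8_t mask = (uint8_t)(1u << (7u - (bit & 7u)));
            if (val & (1uL << (17u - b)))
                dict[bit >> 3] |= mask;
            else
                dict[bit >> 3] &= (uint8_t)~mask;
        }
    }

The matching read routine walks the same 18 bits and reassembles prevIdx and newChar.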

I hope this helps.

Sparky
Is it possible to discard the dictionary and re-parse to obtain just the needed entry from the source on every lookup? Of course this will be incredibly slow but it may be the only way.
R..
@R..: I do not think so. The dictionary is the state of the encoder/decoder; the bits in the encoded stream are indexes into it.
Dummy00001
+3  A: 

The zlib library that everyone uses is, among other problems, bloated for embedded use. I am pretty sure it won't work for your case. I had a little more memory, maybe 16K, and couldn't get it to fit. It allocates and zeroes large chunks of memory, keeps copies of stuff, etc. The algorithm can maybe do it, but finding existing code is the challenge.

I went with http://lzfx.googlecode.com. The decompression loop is tiny; it is the older LZ-type compression that relies on the prior results, so you need access to the already-uncompressed output: the next byte is a 0x5, the next byte is a 0x23, the next 15 bytes are a copy of the 15 bytes that appeared 200 bytes ago, the next 6 bytes are a copy of the bytes 127 back, and so on. The newer LZ algorithm (LZW) is variable-width and table based, and the table can be big or grow depending on how it is implemented.
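For a feel of how small that kind of loop is, here is a toy sketch of a literal-run / back-reference decoder; it is not the actual lzfx token layout, just the same general idea:

    /* Toy literal-run / back-reference decoder (NOT the real lzfx format).
       Control byte < 32: copy (ctrl + 1) literal bytes from the input.
       Otherwise: top 3 bits give a copy length of 3..9, the low 5 bits
       plus the next byte give a distance of 1..8192 back into the output.
       Input is assumed well formed; there is no bounds checking. */
    static unsigned decompress(const unsigned char *in, unsigned in_len,
                               unsigned char *out)
    {
        unsigned ip = 0, op = 0;

        while (ip < in_len) {
            unsigned ctrl = in[ip++];

            if (ctrl < 32) {                          /* literal run */
                for (unsigned n = ctrl + 1; n; n--)
                    out[op++] = in[ip++];
            } else {                                  /* back reference */
                unsigned len  = (ctrl >> 5) + 2;
                unsigned dist = (((ctrl & 0x1fu) << 8) | in[ip++]) + 1;

                while (len--) {                       /* copy earlier output */
                    out[op] = out[op - dist];
                    op++;
                }
            }
        }
        return op;                                    /* bytes produced */
    }

The important property is that it needs no table at all: every back reference reads from output that has already been produced.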

I was dealing with repetitive data and trying to squeeze a few K down into a few hundred bytes; I think the compression was about 50%. Not great, but it did the job, and the decompression routine was tiny. The lzfx package above is small; unlike zlib, it is basically two main functions with the code right there, not dozens of files. You could likely change the depth of the buffer, and perhaps improve the compression algorithm if you so desire. I did have to modify the decompression code (maybe 20 or 30 lines); it was pointer heavy and I switched it to arrays, because in my embedded environment the pointers were in the wrong place. That burns maybe an extra register or not, depending on how you implement it and on your compiler. I also did it so I could abstract the fetches and the stores of the bytes, as I had them packed into memory that wasn't byte addressable.
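If your memory is not byte addressable, the abstraction can be as simple as a pair of accessors. A hypothetical sketch, assuming bytes packed two to a 16-bit word (names and sizes made up):

    #include <stdint.h>

    #define BUF_WORDS 1024u            /* hypothetical 2 KB output region */
    static uint16_t buf[BUF_WORDS];    /* only 16-bit word addressable    */

    /* Fetch the idx-th byte from the word-packed buffer. */
    static unsigned char get_byte(unsigned idx)
    {
        uint16_t w = buf[idx >> 1];
        return (unsigned char)((idx & 1u) ? (w >> 8) : (w & 0xffu));
    }

    /* Store the idx-th byte into the word-packed buffer. */
    static void put_byte(unsigned idx, unsigned char c)
    {
        uint16_t w = buf[idx >> 1];
        if (idx & 1u)
            w = (uint16_t)((w & 0x00ffu) | ((uint16_t)c << 8));
        else
            w = (uint16_t)((w & 0xff00u) | c);
        buf[idx >> 1] = w;
    }

With those in place, the decoder's out[op++] = c stores become put_byte(op++, c) and the back-reference reads become get_byte() calls.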

If you find something better, please post it here or ping me through Stack Overflow; I am also very interested in other embedded solutions. I searched quite a bit, and the above was the only useful one I found; I was lucky that my data was such that it compressed well enough using that algorithm...for now.

dwelch
There's a compression program called pucrunch (http://www.cs.tut.fi/~albert/Dev/pucrunch/) that you may find interesting.
ninjalj
Well, I found this: http://www.embedded.com/design/opensource/217800397. It says "This compression technique is capable of achieving respectable compression ratios, typically on the order of 50–60%, while consuming about 2K of RAM"... I need to try this out.
Manas
A: 

Can anybody give pointers on how I can implement LZW compression/decompression in low-memory conditions (< 2k)? Is that possible?

Why LZW? LZW needs lots of memory. It is based on a hash/dictionary, and the compression ratio is proportional to the hash/dictionary size. More memory means better compression; with less memory, the output can even be larger than the input.

I haven't touched encoding for a very long time, but IIRC Huffman coding is a little bit better when it comes to memory consumption.

But it all depends on the type of information you want to compress.

Dummy00001
+1  A: 

If the choice of compression algorithm isn't set in stone, you might try gzip/LZ77 instead. Here's a very simple implementation I used and adapted once:

ftp://quatramaran.ens.fr/pub/madore/misc/myunzip.c

You'll need to clean up the way it reads input, the error handling, etc., but it's a good start. It's probably also way too big if your data AND code need to fit in 2k, but at least the data size is small already.

A big plus is that it's public domain, so you can use it however you like!

R..
A: 

I have used LZSS. I used code from Haruhiko Okumura as a base. It uses the last portion of the uncompressed data (2K) as the dictionary. The code I linked can be modified to use almost no memory if you have all the uncompressed data available in memory. With a bit of googling you will find a lot of different implementations.
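For reference, the core of such a decoder is only a couple of dozen lines. Below is a rough sketch in the same spirit; it is not Okumura's actual code, and the flag-bit and two-byte reference layout here are assumptions:

    /* LZSS decode sketch in the spirit of Okumura's code, not his exact
       format: a flag byte announces 8 items; a set bit means one literal
       byte, a clear bit means a two-byte back reference with an 11-bit
       distance (1..2048, i.e. the "last 2K" window) and a 5-bit length
       (3..34).  Input is assumed well formed; no bounds checking. */
    static unsigned lzss_decode(const unsigned char *in, unsigned in_len,
                                unsigned char *out)
    {
        unsigned ip = 0, op = 0;

        while (ip < in_len) {
            unsigned flags = in[ip++];

            for (unsigned bit = 0; bit < 8 && ip < in_len; bit++) {
                if (flags & (1u << bit)) {            /* literal byte */
                    out[op++] = in[ip++];
                } else {                              /* back reference */
                    unsigned b0   = in[ip++];
                    unsigned b1   = in[ip++];
                    unsigned dist = (b0 | ((b1 & 0xe0u) << 3)) + 1;
                    unsigned len  = (b1 & 0x1fu) + 3;

                    while (len--) {           /* dictionary = recent output */
                        out[op] = out[op - dist];
                        op++;
                    }
                }
            }
        }
        return op;
    }

Because the back references read directly from output already produced, the only real state is the output buffer itself, which is what makes the "almost no memory" variant possible when the whole uncompressed result stays in RAM.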

Gerhard