What is the best compression algorithm with the following features:

  • should take less time to decompress (compression can reasonably take more time)
  • should be able to compress sorted data (a list of approximately 3,000,000 strings/integers ...)

Please suggest options along with metrics: compression ratio and, if possible, algorithmic complexity for compression and decompression.

+1  A: 

Well, if you just want speed, then standard ZIP compression is fine, and it's most likely already integrated into your language/framework (e.g. .NET has it, Java has it). Sometimes the most universal solution is the best: ZIP is a very mature format, and any ZIP library or application will work with any other.
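For instance, here's a minimal sketch using Java's built-in java.util.zip classes (Deflater/Inflater implement the DEFLATE algorithm that ZIP uses); the input string is just a stand-in for your real data:

    import java.nio.charset.StandardCharsets;
    import java.util.zip.Deflater;
    import java.util.zip.Inflater;

    public class DeflateDemo {
        public static void main(String[] args) throws Exception {
            // Stand-in data; replace with your sorted strings/integers.
            byte[] input = "sorted line 1\nsorted line 2\nsorted line 3\n"
                    .getBytes(StandardCharsets.UTF_8);

            // Compress at the highest level: slower compression for a
            // better ratio, which matches the asker's constraints.
            Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
            deflater.setInput(input);
            deflater.finish();
            byte[] buf = new byte[input.length * 2 + 64]; // ample for this demo
            int compressedLen = deflater.deflate(buf);
            deflater.end();

            // Decompress; its speed is largely independent of the level above.
            Inflater inflater = new Inflater();
            inflater.setInput(buf, 0, compressedLen);
            byte[] restored = new byte[input.length];
            int n = inflater.inflate(restored);
            inflater.end();

            System.out.println("original=" + input.length
                    + " compressed=" + compressedLen + " restored=" + n);
        }
    }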

But if you want better compression, I'd suggest 7-Zip: the author is very smart, the tool is easy to get hold of, and he encourages people to use the format.

Providing you with compression times is impossible, as they depend directly on your hardware. If you want a benchmark, you have to run it yourself.
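If you do want to measure it yourself, a rough sketch like the following works (Java again; the synthetic, repetitive buffer is only a placeholder, so substitute your own data and treat the timings as hardware-specific):

    import java.util.zip.Deflater;

    public class LevelBenchmark {
        public static void main(String[] args) {
            // Placeholder data; replace with your real 3,000,000 sorted entries.
            byte[] data = new byte[4 << 20];
            for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 97);

            byte[] out = new byte[data.length + 1024];
            for (int level = 1; level <= 9; level++) {
                Deflater d = new Deflater(level);
                d.setInput(data);
                d.finish();
                long t0 = System.nanoTime();
                int len = d.deflate(out); // one call suffices: out is large enough
                long ms = (System.nanoTime() - t0) / 1_000_000;
                d.end();
                System.out.printf("level %d: %d -> %d bytes in %d ms%n",
                        level, data.length, len, ms);
            }
        }
    }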

TravisO
+5  A: 

There's an entire site devoted to compression benchmarking here.

Dave Swersky
Wow, good link; that site is so obsessive, it's both awesome and scary.
TravisO
good link. thanks for the info.
FL4SOF
first link seems to be broken...
Lawand
First link is broken
John JJ Curtis
+1  A: 

You don't have to worry about decompression time. The extra time spent at a higher compression level goes mostly into finding the longest matching pattern.

Decompression either:

1) writes the literal byte, or
2) for a (backward distance, length) = (m, n) pair,
   goes back m bytes in the output buffer,
   reads n bytes, and
   writes those n bytes at the end of the buffer.

So the decompression time is independent of the compression level. And, from my experience with the Universal Decompression Virtual Machine (RFC 3320), I'd guess the same holds for any decompression algorithm.
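To make that concrete, here's a minimal decode loop in Java over a hypothetical token layout ({0, literal} or {1, m, n}, invented here purely for illustration; real formats like DEFLATE pack this far more tightly):

    import java.util.Arrays;

    public class Lz77Decode {
        static byte[] decode(int[] tokens, int maxOut) {
            byte[] out = new byte[maxOut];
            int outLen = 0;
            int i = 0;
            while (i < tokens.length) {
                if (tokens[i] == 0) {          // case 1: write the literal byte
                    out[outLen++] = (byte) tokens[i + 1];
                    i += 2;
                } else {                       // case 2: (m, n) back-reference:
                    int m = tokens[i + 1];     //   go back m bytes,
                    int n = tokens[i + 2];     //   copy n bytes to the end.
                    int src = outLen - m;
                    for (int k = 0; k < n; k++) {
                        // Byte-by-byte copy lets n exceed m (overlapping match).
                        out[outLen++] = out[src + k];
                    }
                    i += 3;
                }
            }
            return Arrays.copyOf(out, outLen);
        }

        public static void main(String[] args) {
            // "abc" as literals, then (m=3, n=6) repeats it: "abcabcabc".
            int[] tokens = {0,'a', 0,'b', 0,'c', 1,3,6};
            System.out.println(new String(decode(tokens, 16)));
        }
    }

Note there's no searching at decode time; that's why the work per output byte is constant regardless of how hard the compressor searched.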

yogman