Is there a very fast compression library for Java? The standard gzip library is slower than I would like. I'm looking for something similar to http://www.oberhumer.com/opensource/lzo/ that's native Java code that provides fast compression and decompression. Thanks!

A few other fast compression libraries for future reference:

QuickLZ - C/C#/Java - GPL or commercial http://www.quicklz.com/

libLZF - C - BSD style license http://oldhome.schmorp.de/marc/liblzf.html

FastLZ - C - MIT style license http://fastlz.org/

LZO - C - GPL or commercial http://www.oberhumer.com/opensource/lzo/

zlib - C / Java (GZIP and deflate) - Commercial friendly license http://zlib.net/

Hadoop-LZO integration (JNI): http://github.com/kevinweil/hadoop-lzo

Benchmarks from the QuickLZ folks: http://www.quicklz.com/bench.html

+5  A: 

You could use the DeflaterOutputStream and InflaterInputStream classes in java.util.zip. Both use DEFLATE compression (LZ77 plus Huffman coding, not LZW), so you could simply use what the JDK already provides.
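As a rough illustration of that suggestion, here is a minimal round-trip sketch. The class name DeflateRoundTrip and its helper methods are made up for this example, and it assumes Java 7 or later (try-with-resources); treat it as a starting point rather than a tuned implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, not part of the JDK.
final class DeflateRoundTrip {

    /** Compresses data with DEFLATE at the BEST_SPEED level. */
    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(sink, deflater)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // close() does not end a caller-supplied Deflater, so release it here.
            deflater.end();
        }
        return sink.toByteArray();
    }

    /** Reverses compress(). */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4 * 1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "id,name\n1,a\n2,a\n3,a\n".getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        byte[] restored = decompress(packed);
        System.out.println(original.length + " bytes in, " + packed.length
                + " bytes compressed, " + restored.length + " bytes restored");
    }
}
```

Passing a Deflater explicitly is what lets you pick the compression level, which GZIPOutputStream does not offer through its constructors.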

EDIT: Real-time performance is usually measured in terms of latency, but you quote numbers in terms of throughput. Could you clarify what you mean by real-time?

For latency, using the DEFAULT_STRATEGY, each call took 220 ns + 11 ns/byte on average.
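To see what that linear model implies for a given block size, the arithmetic can be sketched as below. The class and method names are hypothetical, the 5 KB block size is the one used elsewhere in this thread, and the conversion uses the identity 1 byte/ns = 1000 MB/s.

```java
// Hypothetical helper that evaluates the "220 ns + 11 ns/byte" model quoted above.
final class LatencyModel {
    static final double FIXED_NS = 220.0;    // per-call overhead from the measurement
    static final double PER_BYTE_NS = 11.0;  // per-byte cost from the measurement

    /** Predicted latency of one call for a block of the given size. */
    static double latencyNs(int blockBytes) {
        return FIXED_NS + PER_BYTE_NS * blockBytes;
    }

    /** Implied throughput in MB/s (1 byte per ns = 1000 MB/s). */
    static double throughputMBps(int blockBytes) {
        return blockBytes * 1000.0 / latencyNs(blockBytes);
    }

    public static void main(String[] args) {
        int block = 5 * 1024;
        System.out.println("Block size: " + block + " bytes");
        System.out.println("Predicted latency: " + latencyNs(block) + " ns");
        System.out.println("Implied throughput: " + throughputMBps(block) + " MB/s");
    }
}
```

For a 5 KB block the model gives 220 + 11 × 5120 = 56540 ns per call, or roughly 90 MB/s, so the fixed overhead is negligible at that size.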

Note: in low-latency situations you often see many times the latency you might expect from measurements taken while the CPU is running "hot". You have to perform the timing in a realistic situation.

EDIT: These are the compression rates I got with Java 6 update 21:

Raw OutputStream.write() - 2485 MB/s

Deflater.NO_COMPRESSION - 99 MB/s

Deflater.DEFAULT_STRATEGY - 95 MB/s

Deflater.BEST_SPEED - 85 MB/s

Deflater.FILTERED - 77 MB/s

Deflater.HUFFMAN_ONLY - 79 MB/s

Deflater.DEFAULT_COMPRESSION - 30 MB/s

Deflater.BEST_COMPRESSION - 14 MB/s

Note: I am not sure why the default setting is faster than the "best speed" setting. I can only assume the former has been optimised.
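One possible explanation, offered as an observation rather than something confirmed in this thread: the single-int Deflater constructor takes a compression level, while DEFAULT_STRATEGY, FILTERED and HUFFMAN_ONLY are strategy constants meant for setStrategy(). DEFAULT_STRATEGY has the same numeric value as NO_COMPRESSION (0) and FILTERED the same value as BEST_SPEED (1), so passing a strategy constant to that constructor selects a level. The sketch below sets the two independently; the class and method names are hypothetical.

```java
import java.util.zip.Deflater;

// Hypothetical helper showing level and strategy as two separate settings.
final class DeflaterSettings {

    /** Builds a Deflater with an explicit level and an explicit strategy. */
    static Deflater create(int level, int strategy) {
        Deflater deflater = new Deflater(level); // the int argument is a level
        deflater.setStrategy(strategy);          // the strategy is set separately
        return deflater;
    }

    /** Returns the compressed size of input for the given settings. */
    static int compressedSize(byte[] input, int level, int strategy) {
        Deflater deflater = create(level, strategy);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[4 * 1024];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // release native resources
        }
    }

    public static void main(String[] args) {
        byte[] input = new byte[10000]; // all zeros, highly compressible
        System.out.println("NO_COMPRESSION: "
                + compressedSize(input, Deflater.NO_COMPRESSION, Deflater.DEFAULT_STRATEGY));
        System.out.println("BEST_SPEED: "
                + compressedSize(input, Deflater.BEST_SPEED, Deflater.DEFAULT_STRATEGY));
        System.out.println("BEST_SPEED + HUFFMAN_ONLY: "
                + compressedSize(input, Deflater.BEST_SPEED, Deflater.HUFFMAN_ONLY));
    }
}
```

If this reading is right, the DEFAULT_STRATEGY, FILTERED and HUFFMAN_ONLY rows in the list above measure levels 0, 1 and 2 rather than those strategies.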

The output buffer size was 4 KB; you might find that a different size works better for you.

EDIT: The following code prints this output for a large CSV file. The latency is for a 5 KB block.

Average latency 48532 ns. Bandwidth 105.0 MB/s.
Average latency 52560 ns. Bandwidth 97.0 MB/s.
Average latency 47602 ns. Bandwidth 107.0 MB/s.
Average latency 51099 ns. Bandwidth 100.0 MB/s.
Average latency 47695 ns. Bandwidth 107.0 MB/s.


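import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class Main {
    public static void main(String... args) throws IOException {
        final String filename = args[0];
        final File file = new File(filename);
        byte[] bytes = new byte[(int) file.length()];
        // Read the whole file into memory so disk I/O is not part of the timing.
        DataInputStream dis = new DataInputStream(new FileInputStream(file));
        try {
            dis.readFully(bytes);
        } finally {
            dis.close();
        }
        // First run is a warm-up and is not printed.
        test(bytes, false);
        for (int i = 0; i < 5; i++)
            test(bytes, true);
    }

    private static void test(byte[] bytes, boolean print) throws IOException {
        // ByteArrayOutputStream stands in for the JDK-internal ByteOutputStream.
        OutputStream out = new ByteArrayOutputStream(bytes.length);
        // Note: this constructor's int argument is a compression level, not a strategy.
        Deflater def = new Deflater(Deflater.DEFAULT_STRATEGY);
        DeflaterOutputStream dos = new DeflaterOutputStream(out, def, 4 * 1024);
        long start = System.nanoTime();
        int count = 0;
        int size = 5 * 1024;
        // Write the input in 5 KB blocks, flushing after each one.
        for (int i = 0; i < bytes.length - size; i += size, count++) {
            dos.write(bytes, i, size);
            dos.flush();
        }
        dos.close();
        long time = System.nanoTime() - start;
        def.end(); // release native resources held by the Deflater
        long latency = time / count;
        // 1 byte per ns = 1000 MB/s.
        long bandwidth = (count * size * 1000L) / time;
        if (print)
            System.out.println("Average latency " + latency + " ns. Bandwidth " + bandwidth + " MB/s.");
    }
}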
Peter Lawrey
DeflaterOutputStream is the basis for GZIPOutputStream, which compresses at about the same speed: 10 MB/s.
Joshua Martell
The LZO compression claims 5+ MB/sec. If you are saying this isn't fast enough perhaps you should indicate what your requirement is. Given you haven't given much detail, you cannot expect a more specific answer.
Peter Lawrey
Perhaps you could also answer the question about what type of data you are trying to compress. What medium are you writing to? e.g. network or disk. What is the underlying speed?
Peter Lawrey
If you are only getting 10 MB/s, what hardware are you using? Is this a mobile device?
Peter Lawrey
I didn't know about the Deflater compression settings, and I wish they were exposed in the GZIP classes! I'm much more interested in throughput than latency. I'm compressing 5k chunks, which is slowing down my throughput considerably. However, I'm unable to come close to your throughput numbers on my i7 under Linux with the same JVM. Would you post the code you used to generate these throughput numbers? BTW, LZO's 5 MB/s is on a Pentium 133, so I would expect much higher rates on a modern CPU.
Joshua Martell
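On the wish that the GZIP classes exposed the compression settings: one workaround, sketched here under the assumption that subclassing is acceptable, relies on the protected `def` field that GZIPOutputStream inherits from DeflaterOutputStream. The class name LevelGzipOutputStream and the gzip/gunzip helpers are hypothetical, and the sketch assumes Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical subclass that lets the caller choose the compression level.
final class LevelGzipOutputStream extends GZIPOutputStream {

    LevelGzipOutputStream(OutputStream out, int level) throws IOException {
        super(out);
        // 'def' is the protected Deflater inherited from DeflaterOutputStream.
        def.setLevel(level);
    }

    /** Compresses data in GZIP format at the given level. */
    static byte[] gzip(byte[] data, int level) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new LevelGzipOutputStream(sink, level)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Decompresses GZIP data with the standard GZIPInputStream. */
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4 * 1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = new byte[5 * 1024]; // one 5 KB chunk of zeros
        byte[] fast = gzip(data, Deflater.BEST_SPEED);
        System.out.println(data.length + " bytes in, " + fast.length + " bytes out");
    }
}
```

The output is still ordinary GZIP, so any standard reader can decompress it; only the level used while writing changes.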