views: 1197
answers: 7

I need to calculate a SHA-256 hash of a large file (or a portion of it). My implementation works fine, but it's much slower than the C++ CryptoPP calculation (25 min vs. 10 min for a ~30 GB file). What I need is a similar execution time in C++ and Java, so that the hashes are ready at almost the same time. I also tried the Bouncy Castle implementation, but it gave me the same result. Here is how I calculate the hash:

int buff = 16384;
try {
    RandomAccessFile file = new RandomAccessFile("T:\\someLargeFile.m2v", "r");

    long startTime = System.nanoTime();
    MessageDigest hashSum = MessageDigest.getInstance("SHA-256");

    byte[] buffer = new byte[buff];

    long read = 0;

    // calculate the hash of the whole file for the test
    long offset = file.length();
    int unitsize;
    while (read < offset) {
        unitsize = (int) (((offset - read) >= buff) ? buff : (offset - read));
        // use the value returned by read(); it may be less than unitsize
        int bytesRead = file.read(buffer, 0, unitsize);
        if (bytesRead == -1) {
            break;
        }
        hashSum.update(buffer, 0, bytesRead);
        read += bytesRead;
    }

    file.close();
    byte[] partialHash = hashSum.digest();

    long endTime = System.nanoTime();

    System.out.println(endTime - startTime);

} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} catch (NoSuchAlgorithmException e) {
    e.printStackTrace();
}
A: 

It used to be that Java ran about 10x slower than the same C++ code. Nowadays it's closer to 2x slower. I think what you're running into is just a fundamental part of Java. JVMs will get faster, especially as new JIT techniques are discovered, but you'll have a hard time outperforming C.

Have you tried alternative JVMs and/or compilers? I used to get better performance with JRockit, but less stability. Ditto for using Jikes over javac.

brianegge
I prefer to keep the Sun JVM, since most people already have it installed, but I might try both compilers you mentioned. Thanks for the tip!
stefita
+3  A: 

Perhaps the first thing to do is work out where you are spending the most time. Can you run it through a profiler and see where the most time is being spent?

Possible improvements:

  1. Use NIO to read the file in the fastest possible way
  2. Update the hash in a separate thread. This is actually rather hard to do and isn't for the faint-hearted, as it involves safe publishing between threads. But if your profiling shows a significant amount of time being spent in the hash algorithm, it may make better use of the disk.
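As a rough sketch of the first suggestion, reading the file through an NIO `FileChannel` into a direct `ByteBuffer` and feeding the buffer straight to the digest might look like this (the class name, buffer size, and command-line file argument are placeholders, not anything from the original code):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class NioHash {
    public static byte[] sha256(String path) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (FileChannel channel = new FileInputStream(path).getChannel()) {
            // A direct buffer lets the OS fill it without an extra copy into the heap
            ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024);
            while (channel.read(buffer) != -1) {
                buffer.flip();          // switch the buffer from writing to reading
                digest.update(buffer);  // MessageDigest accepts a ByteBuffer directly
                buffer.clear();         // make the buffer ready for the next read
            }
        }
        return digest.digest();
    }

    public static void main(String[] args) throws Exception {
        // print the hash as 64 lowercase hex digits
        System.out.printf("%064x%n", new java.math.BigInteger(1, sha256(args[0])));
    }
}
```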
Gareth Davis
+1  A: 

I suggest you use a profiler like JProfiler or the one integrated in NetBeans (free) to find out where the time is actually spent, and concentrate on that part.

Just a wild guess - not sure if it will help - but have you tried the Server VM? Try starting the app with `java -server` and see if that helps. The server VM is more aggressive about compiling Java code to native code than the default client VM.

Daniel Schneller
Good suggestion re: `-server`, but might be a moot point if the O.P. is using a 1.6 JVM on a 64-bit platform, where `-server` is the default.
Max A.
+1  A: 

Since you apparently have a working C++ implementation which is fast, you could build a JNI bridge and use the actual C++ implementation. Alternatively, you could try not reinventing the wheel, especially since it's a big one, and use a premade library such as BouncyCastle, which was made to solve all the cryptographic needs of your program.

Esko
As I wrote above, I also tried using BouncyCastle's SHA256Digest() implementation, which turned out to be almost exactly as fast.
stefita
Hm, somehow missed that, sorry.
Esko
np. But you may be right about using a wrapped C++ function, since all improvements on the Java side seem to shave off only about 1/5 of the time.
stefita
+1  A: 

I think this difference in performance might just be platform related. Try changing the buffer size and see if there are any improvements. If not, I would go with JNI (Java Native Interface) and just call the C++ implementation from Java.

bruno conde
+13  A: 

My explanation may not solve your problem since it depends a lot on your actual runtime environment, but when I run your code on my system, the throughput is limited by disk I/O and not the hash calculation. The problem is not solved by switching to NIO, but is simply caused by the fact that you're reading the file in very small pieces (16kB). Increasing the buffer size (buff) on my system to 1MB instead of 16kB more than doubles the throughput, but with >50MB/s, I am still limited by disk speed and not able to fully load a single CPU core.

BTW: You can simplify your implementation a lot by wrapping a DigestInputStream around a FileInputStream, read through the file and get the calculated hash from the DigestInputStream instead of manually shuffling the data from a RandomAccessFile to the MessageDigest as in your code.
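A minimal sketch of that simplification, assuming a hypothetical `HashFile` class that takes the file name from the command line:

```java
import java.io.FileInputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;

public class HashFile {
    public static void main(String[] args) throws Exception {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        // Every read() through the DigestInputStream also feeds the digest
        try (DigestInputStream in = new DigestInputStream(
                new FileInputStream(args[0]), sha256)) {
            byte[] buffer = new byte[1024 * 1024]; // 1MB buffer, per the advice above
            while (in.read(buffer) != -1) {
                // nothing to do: hashing happens as a side effect of reading
            }
        }
        System.out.printf("%064x%n", new java.math.BigInteger(1, sha256.digest()));
    }
}
```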


I did a few performance tests with older Java versions and there seems to be a relevant difference between Java 5 and Java 6 here. I'm not sure though if the SHA implementation is optimized or if the VM is executing the code much faster. The throughputs I get with the different Java versions (1MB buffer) are:

  • Sun JDK 1.5.0_15 (client): 28MB/s, limited by CPU
  • Sun JDK 1.5.0_15 (server): 45MB/s, limited by CPU
  • Sun JDK 1.6.0_16 (client): 42MB/s, limited by CPU
  • Sun JDK 1.6.0_16 (server): 52MB/s, limited by disk I/O (85-90% CPU load)


I was a little bit curious about the impact of the assembler part in the CryptoPP SHA implementation, as the benchmark results indicate that the SHA-256 algorithm only requires 15.8 CPU cycles/byte on an Opteron. I was unfortunately not able to build CryptoPP with gcc on Cygwin (the build succeeded, but the generated exe failed immediately), but by building a performance benchmark with VS2005 (default release configuration) with and without assembler support in CryptoPP and comparing it to the Java SHA implementation on an in-memory buffer, leaving out any disk I/O, I get the following results on a 2.5GHz Phenom:

  • Sun JDK1.6.0_13 (server): 26.2 cycles/byte
  • CryptoPP (C++ only): 21.8 cycles/byte
  • CryptoPP (assembler): 13.3 cycles/byte

Both benchmarks compute the SHA hash of a 4GB empty byte array, iterating over it in chunks of 1MB, which are passed to MessageDigest#update (Java) or CryptoPP's SHA256.Update function (C++).
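The Java side of such a benchmark can be sketched roughly as below; this is only an illustrative harness (the class name, chunk size, and iteration count are arbitrary choices, not the exact benchmark used for the numbers above):

```java
import java.security.MessageDigest;

public class ShaBenchmark {
    public static void main(String[] args) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] chunk = new byte[1024 * 1024]; // 1MB chunk of zero bytes
        int iterations = 256;                 // 256MB total; raise for a longer run

        // Warm up the JIT before taking any measurements
        for (int i = 0; i < 16; i++) {
            md.update(chunk);
        }
        md.reset();

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            md.update(chunk);
        }
        md.digest();
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%.1f MB/s%n", iterations / seconds);
    }
}
```

Converting MB/s to cycles/byte then just requires dividing the CPU clock rate by the measured byte throughput.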

I was able to build and benchmark CryptoPP with gcc 4.4.1 (-O3) in a virtual machine running Linux and got only approx. half the throughput compared to the results from the VS exe. I am not sure how much of the difference is attributable to the virtual machine and how much is caused by VS usually producing better code than gcc, but I have no way to get more exact results from gcc right now.

jarnbjo
Thanks for the results! I tried profiling it on my machine and it showed significant time spent calculating the hash, but increasing the buffer was a good idea and the execution time dropped by about 50 seconds for a 4GB file.
stefita
I must admit that the Java performance is still not quite the throughput achieved by the CryptoPP library, but looking into the CryptoPP sources, the SHA core is actually implemented in assembler. You are unlikely to achieve the same performance with a Java program, but if you increase the buffer size, use the server VM and/or upgrade to Java 6, the performance is perhaps good enough?
jarnbjo
A: 

The MAIN reason why your code is so slow is that you use a RandomAccessFile, which has always been quite slow performance-wise. I suggest using a BufferedInputStream so that you may benefit from all the power of OS-level caching for disk I/O.

The code should look something like:

    public static byte[] hash(MessageDigest digest, BufferedInputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        int sizeRead;
        while ((sizeRead = in.read(buffer)) != -1) {
            digest.update(buffer, 0, sizeRead);
        }
        in.close();
        return digest.digest();
    }
Shadow_x99