I've become more comfortable programming in Java than in C++ or C. I am hoping to get a sense of the performance hit incurred by running on a JVM interpreter, as opposed to executing the same "project" natively. I realize that there is some level of subjectivity here; the quality of the program will depend heavily on a good implementation. I'm interested in the following aspects in a general sense:

  • There must be some baseline for overhead when using an interpreter. Is there some general rule of thumb to remember? 10% 15%? (I pulled these numbers out of thin air) I have read the occasional blog stating that Java code is nearly as fast as native code, but I think that may have been biased.

  • Does the JVM garbage collector add significant overhead to runtime performance? I know Cocoa applications have begun to use a garbage collection model, and I agree that it makes programming a lot simpler, but at what cost?

  • What is the overhead of making system calls from Java? For example creating a Socket object as opposed to the C socket API.

  • Finally, I recall reading somewhere that the JVM implementation is single threaded. If this is true (which I am skeptical about), does that mean that Java threads really aren't true threads? Does a Java thread, in general, correspond to an underlying kernel-provided thread? Does a Java application benefit in the same way a native application would from multiple cores / multiple CPUs?

Any advice from a developer who understands the intricacies of the JVM and Java program performance would be much appreciated. Thanks.

+1  A: 

http://www.w3sys.com/pages.meta/benchmarks.html

http://www.freewebs.com/godaves/javabench%5Frevisited/

http://en.wikipedia.org/wiki/Comparison%5Fof%5FJava%5Fand%5FC%2B%2B#Performance

http://blog.dhananjaynene.com/2008/07/performance-comparison-c-java-python-ruby-jython-jruby-groovy/

http://www.irrlicht3d.org/pivot/entry.php?id=446

And so on. The fact is - it doesn't matter. Bottlenecks and slow software are created by the developers, not by the language (at least nowadays).

Bozho
"it doesn't matter" - except when it does.
igouy
+1  A: 

Actually, a VM can do a lot of optimizations at runtime, based on information that's only available at runtime, that a C/C++ compiler cannot do. So, in most circumstances, the JVM will be at least as fast as a native program.

Brian Goetz answers most, if not all of your questions in his talk Towards a universal VM.

jqno
Most circumstances? Hardly. In many circumstances the JVM will be fast enough for practical purposes, but when raw CPU performance is critical, a JVM is no match for a well-crafted C/C++ program. The point here is that there are very many cases where raw CPU performance is not critical.
JesperE
That was precisely my point. In most circumstances, raw CPU performance is not critical.
jqno
"Most circumstances" isn't true. The JVM has the *potential* to be faster because of the JIT'er, but in practice, it usually isn't. Not least because while the JIT compiler has a lot of useful runtime information at its disposal that enables better optimizations, it usually doesn't have the time to *use* it. A C++ compiler can take an hour to optimize a program during build. A JIT compiler has to complete within a fraction of a second.
jalf
"that a C/C++ compiler cannot do" Google profile based optimization c++
igouy
+7  A: 

http://shootout.alioth.debian.org/u64q/java.php - A detailed comparison.

Adeel Ansari
+1  A: 

Neither Java nor C# (nor Objective-C) is nearly as fast as native code can be. But that only matters if you have a problem where you are not limited by engineering time, because with a high-level language you'll have the time to devise a better algorithm.

So basically, if you're developing a device that you're going to build a million of per year, or that is battery powered, you don't use Java or C# to build its core functionality. You might add a Lisp interpreter to make customisation easy, though. Microsoft is not going to use C# for, say, the core of SQL Server, where performance really matters. Visual Studio, on the other hand, where MS can expect users to have high-end hardware, can be used as a showcase for slow but high-productivity technology.

Please note that I currently do most of my programming in Pharo Smalltalk, which is a lot slower than java, c# or objective-c, and is not even one of the fastest Smalltalks. Productivity trumps performance.

Stephan Eggermont
There are plenty of places where the JIT can actually compile the Java code into faster native code than native code precompiled by a normal C or C++ compiler. Typically the reason is that it has exact information on how the code is used, can do escape analysis etc. Whether your overall performance will be better or worse depends very much on the kind of application.
Fredrik
@Stephan Interesting, do you have a source for this?
Kristopher Ives
Are you saying that SQL Server can't be expected to run on "high-end hardware"?
jalf
In C++ you have many more possibilities to fine-tune the code. The end result (with sufficient engineering time) will be faster in C++. JVM optimisation is getting better but is not there yet.
Stephan Eggermont
No, SQL server is performance critical, while Visual Studio is not
Stephan Eggermont
@Stephan "not nearly as fast as" is that 2x more cpu, 5x more cpu, 10x more cpu?
igouy
Visual Studio certainly is performance critical, at least the compiler and linker parts.
JesperE
VS is certainly not performance critical. Nobody buys a different compiler because they have to wait a bit longer for the build. Otherwise we'd be all using Delphi. They might buy a slower one because it optimises better, though.
Stephan Eggermont
Real-life performance depends much more on the libraries and frameworks you use, and there you mostly don't have a language choice.
Stephan Eggermont
+1  A: 

To address each of your points:

  • The overhead of interpreting code is much higher than 10-15% (I'd guess around 3x-5x or higher). In order to get down to 10-15% you have to use some form of machine-code compilation step (i.e. JIT). (Try running a JVM with JIT switched off, and you'll see the performance drop like a rock.)
  • Garbage collection does have a performance impact, but I'd say that everyone agrees that it is worth it. If you can afford the byte-code compilation/interpretation overhead, you can afford the gc overhead as well.
  • Socket programming is much easier in Java than in C/C++, if that's what you're asking. And performancewise, the socket I/O overhead dominates over the Java execution overhead.
  • Most modern JVMs have true threads, i.e. each Java thread is executed by a kernel thread, allowing Java threads to utilize modern multi-core CPUs.
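
One rough way to try the "JIT switched off" experiment from the first point is sketched below. The class and method names are hypothetical, and a loop this small is a micro-benchmark, so it only gives a feel for the gap rather than a number to generalise from.

```java
// Hypothetical micro-benchmark; InterpreterVsJit and sumOfSquares are illustrative names.
final class InterpreterVsJit {

    // Simple, deterministic integer work so the result can be checked.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Warm-up rounds so that, in the default mode, the JIT gets a chance
        // to compile the loop before timing starts.
        for (int i = 0; i < 20; i++) {
            sumOfSquares(1_000_000);
        }
        long start = System.nanoTime();
        long result = 0;
        for (int i = 0; i < 50; i++) {
            result += sumOfSquares(1_000_000);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result=" + result + " elapsed=" + elapsedMs + " ms");
    }
}
```

On a HotSpot JVM, running it once as `java InterpreterVsJit` and once as `java -Xint InterpreterVsJit` (interpreter only) lets you compare the two elapsed times on your own machine.
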
JesperE
Seriously, where are you people getting these numbers?
Kristopher Ives
Experience? Educated guess based on several years implementing hardware simulators? I haven't personally made any benchmarks on interpreted JVM-code for a couple of years, though. If you have any hard data, feel free to come forward.
JesperE
Garbage collection usually has a *positive* performance impact. The amortized cost of allocation and garbage collecting an object is far lower than it'd be with manual memory management.
jalf
Did you really think the question was about "interpreting" rather than the typical Java JIT?
igouy
Interpretation is not the same as JIT. They are two distinct operational modes of the JVM, and mixing them up benefits no-one.
JesperE
IOW, if someone asks about "overhead using an interpreter", I assume that he refers to an interpreter and not a JIT.
JesperE
In this case I think your assumption is simply wrong - the question is really just the usual JVM vs native question.
igouy
That may be. But I try to avoid making assumptions about what the OP means.
JesperE
Didn't you already say - "I assume that he..." ;-)
igouy
Well, yes. Avoiding assumptions is impossible, given the imperfect medium of human language.
JesperE
What was your point then? If we're going to make assumptions anyway, why are yours more valid than those of igouy? ;) As I read the OP's question, he's interested in the performance cost of Java compared to something such as C++ which compiles directly to machine code. He doesn't care if Java is JIT'ed or interpreted, he simply wants to know how much it "costs" to use Java. Assuming a sensible JVM running with sensible settings.
jalf
My point? I've already conceded that my assumption may be incorrect, and that the OP uses "interpretation" to refer to any technique used to execute JVM byte code. Not sure why you keep grinding the issue.
JesperE
A: 

There isn't an easy answer to this. Writing C style C++ is possible (even a good idea) but once you try to do inheritance in C things get ugly. So ignore C and go with Java -vs- C++ since they are closer to one another.

To get a real sense of it you would need to write two relatively large applications in a similar manner in both languages. If you do that, then do you use the STL and the Java collection classes, or do you write your own and port them between languages? If you use the native ones then it depends on which implementation is faster, whereas if you use your own you are not testing the real speed of the application.

I'd say you would need to write the application as similar as possible but use the language specific libraries/idioms where it makes sense. C++ and Java code, while similar, have different ways of doing things - something that is easy in Java may be terribly hard in C++ and vice versa.

A modern GC implementation doesn't add that much overhead, and you can switch to a GC in C++ to do the comparison if you like :-)

There are some things that the Java runtime can do that are not generally done in C++ compilers, such as the ability to inline virtual methods.
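
To make the virtual-method point concrete, here is a minimal sketch of the kind of call site involved. `Shape`, `Square` and `totalArea` are hypothetical names, and whether a given JVM actually inlines the call depends on the JVM and on what it observes at run time.

```java
// Illustrative only: Shape, Square and VirtualCallDemo are hypothetical names.
interface Shape {
    double area();
}

final class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double area() {
        return side * side;
    }
}

final class VirtualCallDemo {

    // shape.area() is a virtual (interface) call in the bytecode. If only
    // Square instances are ever seen here at run time, a JIT compiler may
    // speculatively inline Square.area() and fall back if that changes.
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape shape : shapes) {
            total += shape.area();
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[1000];
        for (int i = 0; i < shapes.length; i++) {
            shapes[i] = new Square(2.0);
        }
        System.out.println(totalArea(shapes));
    }
}
```

An ahead-of-time compiler that cannot see every possible `Shape` implementation generally has to keep the indirect call, unless it uses whole-program or profile-guided optimisation.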

For system type things Java typically resorts to making calls into C so there is overhead there (though JNI is faster than it used to be).

Threading depends on the implementation. Sun used to use "green threads" for Solaris, but that is long gone. As far as I know most (all?) modern VMs use native threads.

In short I don't think there is a good metric on the % overhead for Java -vs- C++, and any that you find are likely to be micro benchmarks that do not represent the real world (unfortunately).

TofuBeer
+17  A: 

Java isn't an interpreted language, and hasn't been for several versions. The Java bytecode is JIT'ed on the fly. (Technically it still interprets some of the code, but anything that matters performance-wise gets JIT'ed)

As for performance, what on Earth gives you the crazy idea that "there is a baseline for overhead"? There isn't. There never was and never will be. Not between C++ and Java, and not between Python and Javascript, or any other two languages. There are things that your specific version of the JVM will do faster than your specific C++ compiler, and things that your specific C++ compiler will do better than your specific JVM.

So the "overhead" of your choice of language depends entirely on 1) what you want your code to do, and 2) how you write your code.

If you take a Java program and translate it to C++, the result will almost certainly run slower.

If you take a C++ program and translate it to Java, that too will run slower.

Not because one language is "faster" than the other, but because the original program was written for one language, and was tailored to work well in that language. And any attempt to translate it to another language will lose this advantage. You end up with a C++-style Java program, which won't run efficiently on the JVM, or a Java-style C++ program, which will run terribly as well.

Neither language specification contains a clause that "and the result must be at least x% slower than language y". Both your C++ compiler and the JVM do their very best to make things go fast.

And the performance characteristics you're seeing today may change tomorrow. Languages don't have a speed.

But to answer your specific questions:

There must be some baseline for overhead when using an interpreter. Is there some general rule of thumb to remember? 10% 15%? I have read the occasional blog stating that Java code is nearly as fast as native code, but I think that may have been biased.

As said above, it depends. For many common tasks, you typically won't see more than a few percent difference either way. For some use cases, you'll see a larger difference (going either way). Both languages have advantages when it comes to performance. There is some overhead associated with the JVM, but there are also huge optimization opportunities, not least the garbage collector.

Does the JVM garbage collector add significant overhead to runtime performance? I know Cocoa applications have begun to use a garbage collection model, and I agree that it makes programming a lot simpler, but at what cost?

Basically none. On average, a garbage collector is far faster than manual memory management, for many reasons:

  • on a managed heap, dynamic allocations can be done much faster
  • shared ownership can be handled with negligible amortized cost, where in a native language you'd have to use reference counting which is awfully expensive
  • in some cases, object destruction is vastly simplified as well (Most Java objects can be reclaimed just by GC'ing the memory block. In C++ destructors must always be executed, and nearly every object has one)
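
If you want to see how much collecting actually happens for a burst of short-lived allocations, something like the following sketch can help. `AllocationDemo` and `churn` are hypothetical names; the management beans are part of the standard `java.lang.management` API. Note that a JIT may remove some of these allocations entirely via escape analysis, so the numbers are only indicative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical example class; not a rigorous benchmark.
final class AllocationDemo {

    // Allocates many small, short-lived arrays and returns a checksum so the
    // work is not trivially discarded.
    static long churn(int count) {
        long checksum = 0;
        for (int i = 0; i < count; i++) {
            int[] tmp = new int[] { i, i + 1 };  // unreachable after this iteration
            checksum += tmp[0] + tmp[1];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long checksum = churn(10_000_000);
        System.out.println("checksum=" + checksum);
        // Report what each collector did during the run
        // (a value of -1 means the figure is undefined for that collector).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```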

The main problem with a GC is that while on average a garbage collector performs better, you lose some control over when to take the performance cost. Manual memory management ensures your thread won't ever be halted while waiting for memory to be cleaned up. A garbage collector can, at almost any time, decide to pause the process and clean up memory. In almost all cases, this is fast enough to be no problem, but for vital real-time stuff, it is a problem.

(An additional problem is that you lose a bit of expressiveness. In C++, RAII is used to manage all sorts of resources. In Java, you can't use RAII. Instead the GC handles memory for you, and for all other resources, you're screwed, and have to do it yourself with lots of try/finally blocks. There is no reason why RAII couldn't be implemented in a GC'ed language, but it's not available in either Java or C#)
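
For readers who haven't seen the pattern, this is roughly what the try/finally cleanup looks like. `Handle` is a hypothetical resource used only for illustration. Java 7 and later also offer try-with-resources, shown second, which covers the common scoped-cleanup case but is narrower than C++ RAII.

```java
// Hypothetical example; CleanupDemo and Handle are illustrative names.
final class CleanupDemo {

    // A stand-in for a non-memory resource (file, socket, lock, ...).
    static final class Handle implements AutoCloseable {
        boolean closed = false;

        int read() {
            return 42;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Manual cleanup: the finally block runs whether read() returns or throws.
    static int withTryFinally(Handle handle) {
        try {
            return handle.read();
        } finally {
            handle.close();
        }
    }

    // Java 7+ alternative: close() is called automatically at the end of the block.
    static int withTryWithResources(Handle handle) {
        try (Handle resource = handle) {
            return resource.read();
        }
    }
}
```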

What is the overhead of making system calls from Java? For example creating a Socket object as opposed to the C socket API.

Roughly the same. Why would it be different? Of course, Java has to invoke the relevant OS services and APIs, so there is a tiny bit of overhead, but it is really nothing you're likely to notice.
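
As a small illustration of the Socket case, the sketch below opens a listening socket on the loopback interface, connects to it, and passes one byte across. The class and method names are hypothetical and it assumes Java 8 or later. Each of these calls ends up in the same OS socket calls a C program would make, with a thin Java layer on top.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical example class, for illustration only.
final class LoopbackSocketDemo {

    // Sends the low 8 bits of value over a loopback TCP connection and
    // returns the byte received on the accepting side.
    static int sendOneByte(int value) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; backlog 1 is enough here.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            OutputStream out = client.getOutputStream();
            out.write(value);
            out.flush();
            InputStream in = accepted.getInputStream();
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendOneByte(65));
    }
}
```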

Finally, I recall reading somewhere that the JVM implementation is single threaded. If this is true (which I am skeptical about), does that mean that Java threads really aren't true threads? Does a Java thread, in general, correspond to an underlying kernel-provided thread? Does a Java application benefit in the same way a native application would from multiple cores / multiple CPUs?

Java can use multiple threads, yes. The JVM itself might be singlethreaded (in the sense that all the JVM services run on the same thread), I don't know about that. But your Java application can use as many threads as it likes, and they are mapped to OS threads and will use multiple cores.
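
A minimal sketch of spreading work over several Java threads is shown below. `ParallelSumDemo` and `parallelSum` are hypothetical names. On a typical desktop or server JVM each `Thread` is backed by an OS thread, so the slices can run on different cores.

```java
// Hypothetical example class, for illustration only.
final class ParallelSumDemo {

    // Sums 1..n by splitting the range across threadCount Java threads.
    static long parallelSum(int n, int threadCount) {
        long[] partial = new long[threadCount];
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int index = t;
            final long from = (long) n * t / threadCount + 1;
            final long to = (long) n * (t + 1) / threadCount;
            workers[t] = new Thread(() -> {
                long sum = 0;
                for (long i = from; i <= to; i++) {
                    sum += i;
                }
                partial[index] = sum;  // each thread writes only its own slot
            });
            workers[t].start();
        }
        long total = 0;
        try {
            for (int t = 0; t < threadCount; t++) {
                workers[t].join();     // after join(), the worker's write is visible here
                total += partial[t];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return total;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores + " sum=" + parallelSum(1_000_000, cores));
    }
}
```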

jalf
"If you take a Java program and translate it to C++, the result will almost certainly run slower." What I was thinking here, in terms of a comparison, was to create some simple reference program that used the most basic features of each language (e.g. maybe a simple loop doing calculations on primitives). I assumed, due to the interpretation step, that some constant overhead must be present while executing similar mathematical expressions inside a JVM and not. Thank you for your very informative response.
darren
In such simple cases, every compiled or JIT'ed language will perform the same, pretty much. To get an accurate picture you need to use a more representative program. And then you run into the problem that it is tailored for one language, and any attempt to translate it to other languages will either put it at a disadvantage there, or completely transform the program so much that comparisons are no longer meaningful.
jalf
A point to note is "which JVM implementation" we are talking of here.
Jaywalker
True. The above applies to Sun's JVM, but I'd expect much the same from any decent quality JVM.
jalf
I suppose I could add that for interpreted languages, the overhead is usually around a factor 5x-10x. It is pretty hefty. Which is why Java uses JIT'ing.
jalf
"Java isn't an interpreted language" except when it is -Xint. C isn't an interpreted language except when it is http://root.cern.ch/drupal/content/cint
igouy
"On average, a garbage collector is far faster than manual memory management" is misleading. It is faster than naive memory allocation, and far slower than optimised manual memory management. The problem is that nobody has the time to do the optimisations, and programming in the optimised model is not much fun
Stephan Eggermont
@igouy: I think you know what I mean. "By default", when using the most common JVM without explicitly specifying non-default options, your Java program is JIT'ed. And likewise, when we talk about C++ programs and compilers, we *usually* assume our program is compiled to native code. Of course, the language doesn't specify that this should happen.
jalf
@Stephan: Misleading how? How does "optimized manual memory management" outperform a good GC? No matter how optimized, you still have to delete every allocated object. A GC does not do that. It only has to trace the graph of *live* objects every once in a while. It doesn't have to do *anything* when an individual object is freed. And the time taken to do a collection does not depend on the number of objects being collected. Before you accuse others of being "misleading", perhaps you should read up on how GC's actually work. They *are* ridiculously fast in terms of amortized cost per allocation
jalf
+1  A: 

A lot of people underestimate the performance of java. I was once curious about this as well and wrote a simple program in java and then an equivalent in c (not much more than doing some operation with a for loop and a massive array). I don't recall exact figures, but I do know that java beat out c when the c program was not compiled with any optimization flags (under gcc). As expected, c pulled ahead when I finally compiled it with aggressive optimization. To be honest, it wasn't a scientific experiment by any means, but it did give me a baseline of knowing just where java stood.

Of course, java probably falls further behind when you start doing things that require system calls. Though, I have seen 100MB/s read performance with disks and network with java programs running on modest hardware. Not sure what that says exactly, but it does indicate to me that it's good enough for pretty much anything I'll need it for.

As for threads, if your java program creates 2 threads, then you have 2 real threads.

exabytes18
Yeah, I was thinking that I could write a simple comparative program, but then I thought that somebody on SO would have done something much more thorough, which led me to the question. I have no performance complaints with Java; it is well suited to the type of projects I am interested in. I was just curious to get a relative performance baseline for it.
darren
"To be honest, it wasn't a scientific experiment by any means, but it did give me a baseline of knowing just where java stood." No actually it didn't, it just gave you an excuse to generalize.
igouy
Umm, sure. I wasn't trying to say that my little trial was the be-all and end-all answer which definitively compared performance of every aspect of java to c. I was just giving my own personal experience with the issue (as I continue on about with the rest of the post).
exabytes18
+1  A: 

An important thing to note is:

Java byte code is JIT compiled to much more optimized code specific to the particular hardware

vs

C code is compiled and optimized for general hardware, so it cannot take advantage of features provided by specific hardware

Xinus
That's totally wrong. You can easily optimize C code for a particular bit of hardware. Worst case, you can have multiple versions of a function, and select one at runtime. Secondly, the typical JIT tries to generate native code quickly, rather than optimizing at any great depth. You could implement a JIT that compiles very slowly to optimal machine code, but in practice, it isn't done.
Mark Bessey
@Mark Bessey: Yes, you can optimize C for N hardware targets to produce N distributables, whereas Java has only one distributable. As for the speed of compilation (to produce highly optimized code), that is a drawback of JIT. More information here http://en.wikipedia.org/wiki/Just-in-time_compilation
Xinus
Sorry about opening with "That's totally wrong" - I was in a bad mood. But in any case, I have seen / worked on enough VMs at this point that I feel comfortable saying that the "compile quickly" side of things is always weighted much more heavily than attempting to produce optimal code. As someone else pointed out earlier, you can use the same sorts of profile-driven optimizations that Java is theoretically capable of in C++ if you have a modern compiler. In the case of Java specifically, the JVM bytecode is very non-optimal for translation to machine code.
Mark Bessey
A: 

As your objective is very modest ("I am hoping to get a sense of the performance hit..."), you should be able to fulfill most of it by examining the programs and measurements shown in the Computer Language Benchmarks Game.

As you know both Java and C++

But you do have to think about whether measurements of tiny programs can plausibly indicate the likely performance of your application.

igouy