views:

1276

answers:

7

What exactly makes the JVM (in particular, Sun's implementation) slow to get running compared to other runtimes like CPython? My impression was that it mainly has to do with a boatload of libraries getting loaded whether they're needed or not, but that seems like something that shouldn't take 10 years to fix.

Come to think of it, how does the JVM start time compare to the CLR on Windows? How about Mono's CLR?

UPDATE: I'm particularly concerned with the use case of small utilities chained together as is common in Unix. Is Java now suitable for this style? Whatever startup overhead Java incurs, does it add up for every Java process, or does the overhead only really manifest for the first process?

+1  A: 

There are a number of reasons:

  • lots of jars to load
  • verification (making sure code doesn't do evil things)
  • JIT (just in time compilation) overhead

I'm not sure about the CLR, but I think it is often faster because it caches a native version of assemblies for next time (so it doesn't need to JIT). CPython starts faster because it is an interpreter, and IIRC, doesn't do JIT.

Zifre
Classes compiled with -target 1.6 are much faster to verify (the Harmony verifier is also much faster).
Tom Hawtin - tackline
CPython compiles Python files into bytecode before executing them, like Java.
Bastien Léonard
@Bastien: there is a big difference between bytecode and machine code. Generating made-up bytecode is *much* easier than generating machine code.
Zifre
+9  A: 

Here is what Wikipedia has to say on the issue (with some references).

It appears that most of the time is taken just loading data (classes) from disk (i.e. startup time is I/O bound).

Naaff
Thanks, the Sun article you link has some good detail.
Jegschemesch
One clarification: it's not just (or perhaps not so much) disk I/O, but finding classes within jars (wars, ears), which are zip archives. Extraction takes CPU time, especially if the entries are compressed.
StaxMan
CPU time to decompress is minimal. Indeed, Stack Overflow itself uses compression as a performance optimisation, and compression is massively more expensive than decompression.
Tom Hawtin - tackline
Also rt.jar isn't important for the client HotSpot, which defaults to using the half-shared classes.jsa file.
Tom Hawtin - tackline
@Tom: CPU time to decompress a 2 KB class file: minimal. CPU time to decompress a 30 MB archive to read a 2 KB class file: significant.
Seun Osewa
@Seun: a jar file uses zip compression, so there is no need to decompress the whole archive for one file; each entry (file) is compressed independently of the other entries.
josefx
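To illustrate josefx's point, here is a minimal sketch that builds a throwaway two-entry archive and then reads only one entry back through `java.util.zip.ZipFile`. The class name, entry names, and contents are made up for the example; it says nothing about how the JVM's own class loader is implemented.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class SingleEntryRead {
    // Build a small temporary archive with two entries (stand-ins for class files).
    static Path buildArchive() throws IOException {
        Path jar = Files.createTempFile("demo", ".jar");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new ZipEntry("a/Small.class"));
            out.write("small".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("b/Other.class"));
            out.write("other entry, never read below".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return jar;
    }

    // Look up one entry by name and inflate only that entry's data.
    static String readEntry(Path jar, String name) throws IOException {
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            ZipEntry entry = zip.getEntry(name);
            if (entry == null) {
                throw new IOException("no such entry: " + name);
            }
            byte[] data = zip.getInputStream(entry).readAllBytes(); // Java 9+
            return new String(data, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path jar = buildArchive();
        try {
            System.out.println(readEntry(jar, "a/Small.class"));
        } finally {
            Files.delete(jar);
        }
    }
}
```

The other entry is never inflated here, which is why a large jar does not by itself imply a large decompression cost per class.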
I don't get why they don't keep a persistent (i.e. saved across reboots) cache of commonly-used classes, and just check the timestamps of the source jars when a JVM launches. It seems like they could load that at startup and catch 80% of the classes that end up being used by most apps. Well, I guess they know what they're doing.
intuited
+4  A: 

Running a trivial Java app with the 1.6 (Java 6) client JVM seems instantaneous on my machine. Sun has attempted to tune the client JVM for faster startup (and the client JVM is the default), so if you don't need lots of extra jar files, then startup should be speedy.

jdigital
Whatever the conditions, the JVM is still the VM with the slowest startup. You may find it fast enough on a certain grade of machine and for a certain size of application, but its startup time is still inarguably slow by comparison.
Sake
+2  A: 

If you are using Sun's HotSpot for x86_64 (64-bit compiled), note that the current implementation only works in server mode; that is, it precompiles every class it loads with full optimization. The 32-bit version also supports client mode, which generally postpones optimization and optimizes only the most CPU-intensive parts, but has faster start-up times.

See for instance:

That being said, at least on my machine (Linux x86_64 with a 64-bit kernel), the 32-bit HotSpot version supports both client and server mode (via the -client and -server flags) but defaults to server mode, while the 64-bit version only supports server mode.

Paggas
Are you sure your info is correct? I haven't heard of a (modern) Sun JVM that would do full JIT'ing for all classes, and server mode definitely does NOT do this on most systems. The difference between client and server modes has more to do with different config settings, like the threshold used for JIT inlining.
StaxMan
Server HotSpot doesn't do compilation until 10,000 iterations. Client does it at 1,500. However, the server is implemented somewhat differently (two instead of one intermediate representations, IIRC).
Tom Hawtin - tackline
A: 

In addition to things already mentioned (loading classes, esp. from compressed JARs; running in interpreted mode before HotSpot compiles commonly used bytecode; and HotSpot compilation overhead), there is also quite a bit of one-time initialization done by the JDK classes themselves. Many optimizations are done in favor of longer-running systems where startup speed is less of a concern.

And as to Unix-style pipelining: you certainly do NOT want to start and restart the JVM multiple times. That is not going to be efficient. Rather, the chaining of tools should happen within the JVM. This cannot be easily intermixed with non-Java Unix tools, except by starting such tools from within the JVM.

StaxMan
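As a rough sketch of what chaining tools within one JVM could look like, the example below models each "tool" as a function from a stream of lines to a stream of lines and composes them, much like `grep a | sort | uniq` in a shell. The class and method names are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InProcessPipeline {
    // Keep only lines that contain the given substring.
    static Function<Stream<String>, Stream<String>> grep(String needle) {
        return lines -> lines.filter(line -> line.contains(needle));
    }

    // Sort lines in natural (lexicographic) order.
    static Function<Stream<String>, Stream<String>> sort() {
        return lines -> lines.sorted();
    }

    // Drop duplicate lines (equivalent to uniq when the input is already sorted).
    static Function<Stream<String>, Stream<String>> uniq() {
        return lines -> lines.distinct();
    }

    // Compose the stages once and run them all inside a single JVM process.
    static List<String> run(List<String> input) {
        Function<Stream<String>, Stream<String>> pipeline =
                grep("a").andThen(sort()).andThen(uniq());
        return pipeline.apply(input.stream()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "apple", "fig", "apple", "banana");
        System.out.println(run(input)); // prints [apple, banana, pear]
    }
}
```

With this shape the JVM startup cost is paid once for the whole chain instead of once per stage.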
A: 

It really depends on what you are doing during startup. A Hello World application takes 0.15 seconds to run on my machine.

However, Java is better suited to running as a client or a server/service, which means the startup time isn't as important as the connection time (about 0.025 ms) or the round-trip response time (<< 0.001 ms).

Peter Lawrey
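For anyone who wants a rough number from their own machine, the following sketch reports how long after the JVM's recorded start time `main` is reached, using the standard `RuntimeMXBean`. The class name is made up, and the figure excludes whatever the OS spends launching the process before the JVM records its start time, so treat it as a lower bound.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

class StartupTimer {
    // Milliseconds elapsed between the JVM's recorded start time and now.
    static long millisSinceJvmStart() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        return System.currentTimeMillis() - runtime.getStartTime();
    }

    public static void main(String[] args) {
        System.out.println("Hello World");
        System.out.println("ms from JVM start to here: " + millisSinceJvmStart());
    }
}
```

Timing the whole process from the shell (for example `time java StartupTimer`) gives the end-to-end wall-clock figure, including JVM shutdown.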
A: 

No VM with a rich type system, such as the JVM or the CLR, will start instantaneously when compared to less rich systems such as those found in C or C++. This is largely because a lot is happening in the VM: a lot of classes get initialized and are required by a running system. Snapshots of an initialized system do help, but it still costs time to load that image back into memory, etc.

A simple Hello World style one-liner class with a main method still requires a lot to be loaded and initialized. Verifying the class requires a lot of dependency checking and validation, all of which costs time and many CPU instructions. A C program, on the other hand, does none of this; it amounts to a few instructions followed by a call to the print function.

mP
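One way to see how much gets loaded for a trivial program is to ask the JVM itself. The sketch below prints the number of classes currently loaded by the time a Hello World `main` runs, via the standard `ClassLoadingMXBean`. The class name is made up, and the count varies by JVM version and settings, so no particular number is claimed here.

```java
import java.lang.management.ManagementFactory;

class LoadedClassCount {
    // Number of classes currently loaded in this JVM.
    static int loadedNow() {
        return ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println("Hello World");
        System.out.println("classes currently loaded: " + loadedNow());
    }
}
```

Running any program with the `-verbose:class` option additionally lists each class as it is loaded.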