views:

202

answers:

3

Suppose I have 500 jar files linked to my program totaling over 500 MB (size of all the jars, not each one) and my program makes a call to a class located in one of them. How does Java search through jars for a class, and what is the efficiency of this? O(n)? O(log(n))?

+5  A: 

By default it used to be linear; however, since JDK 1.3 a JAR index can be embedded in the first JAR file of an application.

This means that if the index is embedded into the JAR file the class loader can efficiently find all the classes distributed over multiple JAR files belonging to the application.

link to SUN Resource on JAR Indexing. Note: non-class resources don't seem to be covered.

Hassan Syed
Do you believe that the use of indexes is common? I think still linear behaviour is default, you need to add an index, and many folks don't.
djna
All zips have an index at the end of the file anyway, so it doesn't make much difference.
Tom Hawtin - tackline
hmm, It seems ANT defaults http://ant.apache.org/manual/CoreTasks/jar.html to false for index generation in JAR files. I wonder what IDE's do ?
Hassan Syed
@Hassan, @djna, Out of 342 jars in my maven repo, only 2 had INDEX.LIST! It's not common to generate those unfortunately.
notnoop
Generating an index is application specific as you can't predict which JAR will be on the classpath. Using index is absolutely not common and is absolutely not the norm. According to the link your provided yourself, the idea was initially to help applets but, who does use them nowadays?
Pascal Thivent
Applications which have a huge common source base -- picture an extensible engine -- such as an IDE or a web-server; such applications have # a huge stable set of classes in stable JAR files which will be pulled in every time the application is started. These JAR files will therefore benefit from indexing. On the other hand the plugin's might not.
Hassan Syed
+6  A: 

Java looks in the internal directory structure of the jar for an exact match on the fully qualified name. It looks; it does not search. If you have 500 jar files on the classpath, Java will look in them one by one in the specified order until it finds a match. If the jar containing a given class is the last one, Java will look in the 500 jar files. So I guess it's O(n).

UPDATE: The behavior described above is the default behavior. However, as Hassan pointed out, this can be optimized by providing an JarIndex in the root jar file allowing the classloader to find the proper jar file with a simple lookup on a package name.

Pascal Thivent
In Java 7, they are trying to redeem that by supporting a module system, similar to OSGi
notnoop
(-1) that information is outdated.
Hassan Syed
@Hassan. Indexes are optional, default is indeed linear: The new class loading scheme is totally backward compatible with applications developed on top of the current extension mechanism. When the class loader loads the first jar file and an INDEX.LIST file is found in the META-INF directory, it would construct the index hash table and use the new loading scheme for the extension. Otherwise, the class loader will simply use the original linear search algorithm.
djna
@djna allready updated my comment =D.
Hassan Syed
@Hassan This information is **not** outdated, this is the default behavior and I've almost never seen JAR index. Maybe reconsider your vote and your understanding of this.
Pascal Thivent
@Pascal, your answer reads like a totality -- you preclude the fact that indexing might exist. You should have posted the entire picture -- since the OP is worried about efficiency .
Hassan Syed
@Hassan Sorry but I'm describing the **default** behavior which seems to be what the OP is asking for. What you are describing is the exception.
Pascal Thivent
congratulations on all the upvotes .... :D
Hassan Syed
A: 

Tom mentions in a comment that jar files have an index at the end. This is worth noting. You mention that the total size of your hypothetical jar files is 500MB, but actually the size is irrelevant. The classloader never has to look through a whole jar file byte by byte. Instead, it seeks to the end of the file, then reads back through the central directory. This is a series of records containing the names of the files in the zip, various metadata, and the offset of that file in the zip file. Here's a brief summary of the PKZIP file format.

So, to search 500 jar files, the classloader would open each one (until a match), and look at the end of it to see if the file is present. It may or may not cache this information, to speed up future searches (things can get really slow if it doesn't, and there are a lot of entries in the path). This whole process is proportional to the total number of files in all the jars, rather than the total size of all the jars. Parsing a JAR index is still proportional to the total number of files in all the jars, but it's a smaller constant since there's less filesystem activity.

Steve Jessop