How does Java efficiently search jar files for classes?

Tom mentions in a comment that jar files have an index at the end. This is worth noting. You mention that the total size of your hypothetical jar files is 500MB, but actually the size is irrelevant. The classloader never has to look through a whole jar file byte by byte. Instead, it seeks to the end of the file, then reads back through the central directory. This is a series of records containing the names of the files in the zip, various metadata, and the offset of that file in the zip file. Here's a brief summary of the PKZIP file format.

So, to search 500 jar files, the classloader would open each one (until a match), and look at the end of it to see if the file is present. It may or may not cache this information, to speed up future searches (things can get really slow if it doesn't, and there are a lot of entries in the path). This whole process is proportional to the total number of files in all the jars, rather than the total size of all the jars. Parsing a JAR index is still proportional to the total number of files in all the jars, but it's a smaller constant since there's less filesystem activity.

Do you believe that the use of indexes is common? I think still linear behaviour is default, you need to add an index, and many folks don't.

djna 2010-01-07 14:26:13

All zips have an index at the end of the file anyway, so it doesn't make much difference.

Tom Hawtin - tackline 2010-01-07 14:28:04

hmm, It seems ANT defaults http://ant.apache.org/manual/CoreTasks/jar.html to false for index generation in JAR files. I wonder what IDE's do ?

Hassan Syed 2010-01-07 14:29:41

@Hassan, @djna, Out of 342 jars in my maven repo, only 2 had INDEX.LIST! It's not common to generate those unfortunately.

notnoop 2010-01-07 14:29:59

Generating an index is application specific as you can't predict which JAR will be on the classpath. Using index is absolutely not common and is absolutely not the norm. According to the link your provided yourself, the idea was initially to help applets but, who does use them nowadays?

Pascal Thivent 2010-01-07 14:32:19

Applications which have a huge common source base -- picture an extensible engine -- such as an IDE or a web-server; such applications have # a huge stable set of classes in stable JAR files which will be pulled in every time the application is started. These JAR files will therefore benefit from indexing. On the other hand the plugin's might not.

Hassan Syed 2010-01-07 14:44:00

In Java 7, they are trying to redeem that by supporting a module system, similar to OSGi

notnoop 2010-01-07 14:17:35

(-1) that information is outdated.

Hassan Syed 2010-01-07 14:23:10

@Hassan. Indexes are optional, default is indeed linear: The new class loading scheme is totally backward compatible with applications developed on top of the current extension mechanism. When the class loader loads the first jar file and an INDEX.LIST file is found in the META-INF directory, it would construct the index hash table and use the new loading scheme for the extension. Otherwise, the class loader will simply use the original linear search algorithm.

djna 2010-01-07 14:23:19

@djna allready updated my comment =D.

Hassan Syed 2010-01-07 14:25:22

@Hassan This information is **not** outdated, this is the default behavior and I've almost never seen JAR index. Maybe reconsider your vote and your understanding of this.

Pascal Thivent 2010-01-07 14:28:47

@Pascal, your answer reads like a totality -- you preclude the fact that indexing might exist. You should have posted the entire picture -- since the OP is worried about efficiency .

Hassan Syed 2010-01-07 14:33:57

@Hassan Sorry but I'm describing the **default** behavior which seems to be what the OP is asking for. What you are describing is the exception.

Pascal Thivent 2010-01-07 14:37:09

congratulations on all the upvotes .... :D

Hassan Syed 2010-01-07 15:00:13

ansaurus

tags:

views:

answers:

How does Java efficiently search jar files for classes?

related questions