We have a large (>500,000 LOC) Java system that depends on 40-50 OSS packages. The system is built with Ant, and dependency management is handled manually at present. I'm investigating Ivy and/or Maven to automate dependencies. We looked at Maven as a build automation system last year and rejected it because it would require totally restructuring our system to match Maven's architecture. Now I'm looking to automate just the dependency management tasks.

I've done some experimentation with Ivy but have run into problems. For example, when I specify ActiveMQ as a dependency, and tell Ivy to use the POMs in the Maven repository for dependency specification, Ivy retrieves a bunch of packages (Jetty, Derby and Geronimo for instance) that I know aren't needed to just use ActiveMQ.

If I set usepoms="false" in ivysettings.xml it fetches only activemq.jar, but that seems to defeat the purpose of Ivy and relegates it to a simple jar-fetcher with manually-built dependency specifications.

There's a bigger issue here: what used to be called "DLL Hell" in Windows. In some cases, two direct first-level dependencies will point to different versions of the same transitive dependency (for instance log4j.jar). Only one log4j.jar can be in the classpath, so dependency resolution involves manually determining which version is compatible with all of its clients in our system.

I guess it all boils down to the quality of each package's dependency specification (the POM). In the case of ActiveMQ, there are no scope declarations, so any reference to ActiveMQ will download all of its dependencies unless we manually exclude the ones we know we don't want.

In the case of log4j, automatic dependency resolution would require that all of log4j's clients (other packages that depend on log4j) validate against all prior versions of log4j and provide a range (or list) of compatible log4j versions in the POM. That's probably too much to ask.

Is this the current state of affairs, or am I missing something?

A: 

"Is this the current state of affairs"

Yes.

This is the open-source tradeoff.

A closed-source framework (e.g., .NET) will solve all of this for you.

An open source solution means you have to solve it (and resolve it) all the time.

You might be able to find some pre-built configurations and pay someone to keep those up-to-date. For example, you could elect to use Red Hat Enterprise Linux. If you stick to precisely what they support (and nothing more) then configuration is solved.

Odds are good, however, that no packaged configuration meets your requirements.

S.Lott
I strongly disagree with your claim that closed source is better and that open source means constantly investing effort. Do you have a closed-source alternative for this problem, other than .NET?
Robert Munteanu
I disagree that this is the state of affairs. The dependencies in question are optional and managed correctly in Maven (see my answer for more details). It may be that Ivy can't deal with the optional property, but it still has a means to deal with those dependencies. So what is your basis for making this claim?
Rich Seller
@Robert Munteanu: I did not say that a closed-source solution was better in a vague, blanket way. It provides pre-integrated components. That's its *only* advantage. And most of the time, that's actually a liability because the pre-integrated stuff evolves slowly and lags behind the state of the art.
S.Lott
@Rich Seller: Every week or two, I have to resolve version number and transitive dependency issues among our open source component stack. Read the log4j issue described in this question. That's a standard, common, universal problem with a large collection of open-source components.
S.Lott
I did read that section and responded to it in my answer. It is a common problem because people don't use the tools available to resolve those issues.
Rich Seller
This has nothing to do with "open-source" vs. "closed-source".
Nate
Not sure what this has to do with the question. "DLL Hell" originated in the closed-source Windows environment.
Jim Garrison
@Jim Garrison: Microsoft's incompetence isn't a good example. Oracle, IBM, HP, Sun and numerous other vendors are able to provide closed-source solutions without the DLL-hell problem of Microsoft's stuff.
S.Lott
+2  A: 

I think this is indeed the current state of affairs. OSGi and the proposed new packaging system for Java 1.7 (has discussion on that one come to a conclusion already?) are attempts at fixing at least the issue of depending on different versions of a library, but I don't think they will be able to fix your issue right now.

Simon Groenewolt
do you have any links with more info on "the proposed new packaging system for java 1.7"?
matt b
I was thinking of JSR 277: http://jcp.org/en/jsr/detail?id=277 -- google for lots of opinions ;-) ... I _thought_ this was meant to go into 1.7 but apparently it is not going to happen.
Simon Groenewolt
A: 

"Is this the current state of affairs?"

Not with OSGi. You may want to look at OSGi containers and bundles. A bundle is like a jar file, but it supports metadata detailing its version and the versions of related bundles it requires (by putting attributes in the META-INF/MANIFEST.MF file). So your log4j bundle will indicate its version, and dependent bundles will detail what versions of log4j they require.

Furthermore, a non-hierarchical classloader mechanism is supported, such that you can load multiple versions of log4j, and different bundles can specify, and bind to, those different versions.

Javaworld has a very good introduction here.

Brian Agnew
+4  A: 

That's pretty much it. The Maven dependency system (which Ivy more or less follows) leaves it up to the individual projects to do a good job of adding the necessary metadata for their dependencies. Most don't.

If you go that route, expect to spend time setting up exclusions.

To the posters recommending OSGi: the OP said that he is not willing to re-architect his build system for Maven, so I wouldn't think he would want to re-architect his application to be OSGi compliant. Furthermore, a lot of OSS projects that are OSGi compliant (and there are not as many as you'd hope) have metadata that is as bad as, or worse than, what you find in Maven.

Kevin
I think saying "Most don't" is a gross exaggeration. Can you actually cite any major projects that are poorly configured? The vast majority of Maven projects I have used are well behaved. It is more the case that most users don't understand all the nuances of the system and get tripped up by them. It can still be a problem if users don't RTFM, but that's always the case with complex systems.
Rich Seller
Ivy doesn't "mostly follow" Maven's dependency system, it merely has an adapter system for it. True, POMs imported from Maven suck hard, but if you take the time to manually redo the dependency settings for Ivy, you actually end up with something a lot better than Maven could ever provide.
Esko
@Rich Other than ActiveMQ that the OP noted? I've had issues with CXF not properly declaring dependencies as "optional" or otherwise unneeded.
Kevin
In my mind, 1 does not count as "most"
Rich Seller
+6  A: 

You're absolutely right in saying that

I guess it all boils down to the quality of each package's dependency specification (the POM).

The only thing I would add is to view the POM, or any other form of metadata, as a starting point. It's quite useful that e.g. ActiveMQ provides all the dependencies for you, but it's up to you to decide whether that set actually suits your project.

After all, even taking the log4j version into account, would you rather have an external dependency pick the version, or choose the version you know works for you?


As for how you can choose to tailor dependencies, here's what you can do with Ivy:

Unneeded packages

Ivy retrieves a bunch of packages (Jetty, Derby and Geronimo for instance) that I know aren't needed to just use ActiveMQ.

This usually happens because of poor modularity in the application. Some part of the application needs Jetty for instance, but you end up with this transitive dependency even if you don't use it.

You probably want to look into the ivy exclude mechanism:

<dependency name="A" rev="1.0">
  <exclude module="B"/>
</dependency>
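
Applied to the ActiveMQ case from the question, an exclusion might look like the sketch below. The revision is a placeholder, and the Jetty organisation name is an assumption, so check both against the POM that Ivy actually resolves for you.

```xml
<!-- ivy.xml fragment: depend on ActiveMQ but drop transitive modules we know we don't use -->
<!-- rev is an illustrative placeholder, not a recommended version -->
<dependency org="org.apache.activemq" name="activemq-core" rev="5.2.0">
  <!-- exclude one module by organisation and module name -->
  <exclude org="org.apache.derby" module="derby"/>
  <!-- exclude everything from one organisation (assumed coordinates for Jetty) -->
  <exclude org="org.mortbay.jetty"/>
</dependency>
```

Each exclusion is a deliberate statement that your code never exercises the feature that needs the module, so it is worth keeping a comment next to it explaining why.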

Dependency versions

Only one log4j.jar can be in the classpath, so dependency resolution involves manually determining which version is compatible with all of its clients in our system.

Perhaps I'm misreading this, but there is no manual element in Ivy's conflict resolution. There is a list of default conflict managers:

  • all: this conflict manager resolves conflicts by selecting all revisions. Also called NoConflictManager, it does not evict any module.
  • latest-time: this conflict manager selects only the 'latest' revision, latest being defined as the latest in time. Note that latest in time is costly to compute, so prefer latest-revision if you can.
  • latest-revision: this conflict manager selects only the 'latest' revision, latest being defined by a string comparison of revisions.
  • latest-compatible: this conflict manager selects the latest version in the conflicts which can result in a compatible set of dependencies. This means that in the end this conflict manager does not allow any conflict (like the strict conflict manager), except that it follows a best effort strategy to try to find a set of compatible modules (according to the version constraints).
  • strict: this conflict manager throws an exception (i.e. causes a build failure) whenever a conflict is found.

If needed, you can provide your own conflict manager.
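
As a rough sketch of how one of the managers listed above is selected, assuming Ivy 2.x syntax: the default can be set for the whole build in ivysettings.xml, and a different manager can be attached to a single module in ivy.xml. The log4j coordinates, the ActiveMQ revision, and the choice of managers are illustrative only.

```xml
<!-- ivysettings.xml fragment: manager used when nothing more specific applies -->
<settings defaultConflictManager="latest-revision"/>

<!-- ivy.xml fragment: fail the build if two revisions of log4j are requested,
     instead of silently picking one -->
<dependencies>
  <dependency org="org.apache.activemq" name="activemq-core" rev="5.2.0"/>
  <conflict org="log4j" module="log4j" manager="strict"/>
</dependencies>
```

Using strict for a library as widely shared as log4j turns a silent version pick into a visible build failure, which may suit a large system better than trusting a string comparison of revisions.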

Robert Munteanu
+3  A: 

Of the dependencies you list, the following are defined as optional in the activemq-core pom (also see the relevant section from the Maven book).

  • org.apache.derby:derby
  • org.apache.geronimo.specs:geronimo-jta_1.0.1B_spec

I didn't see a direct dependency on Jetty, so it may be transitively included from one of the optional dependencies.

In Maven optional dependencies are handled automatically. Essentially any dependency that is declared optional must be redeclared in your pom for it to be used. From the documentation linked above:

Optional dependencies are used when it's not really possible (for whatever reason) to split a project up into sub-modules. The idea is that some of the dependencies are only used for certain features in the project, and will not be needed if that feature isn't used. Ideally, such a feature would be split into a sub-module that depended on the core functionality project...this new subproject would have only non-optional dependencies, since you'd need them all if you decided to use the subproject's functionality.

However, since the project cannot be split up (again, for whatever reason), these dependencies are declared optional. If a user wants to use functionality related to an optional dependency, they will have to redeclare that optional dependency in their own project. This is not the most clear way to handle this situation, but then again both optional dependencies and dependency exclusions are stop-gap solutions.
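
For illustration, redeclaring an optional dependency in your own POM would look roughly like this. Both version numbers are placeholders, not recommendations.

```xml
<!-- pom.xml fragment: activemq-core marks Derby as optional, so Maven only
     pulls it in if this project declares it explicitly -->
<dependencies>
  <dependency>
    <groupId>org.apache.activemq</groupId>
    <artifactId>activemq-core</artifactId>
    <version>5.2.0</version> <!-- placeholder version -->
  </dependency>
  <dependency>
    <groupId>org.apache.derby</groupId>
    <artifactId>derby</artifactId>
    <version>10.4.2.0</version> <!-- placeholder version -->
  </dependency>
</dependencies>
```

Leaving out the second declaration is all it takes to keep Derby off the classpath.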

I'm not sure if you can configure Ivy to ignore the optional dependencies, but you can configure it to exclude dependencies. For example:

<dependency name="A" rev="1.0">
  <exclude module="B"/>
</dependency>

This isn't entirely satisfactory, I know. It may be that Ivy does support optional dependencies (I will have a further look and update if I find anything), but the exclusions mechanism at least allows you to manage them.
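
One approach that may be worth trying is to request only the dependency's default configuration instead of all of them. This assumes that Ivy translates Maven scopes and the optional flag into configurations such as default, compile, runtime and optional when it reads a POM. Treat it as a sketch to verify against your Ivy version, not a confirmed fix for the ActiveMQ case; the revision is a placeholder.

```xml
<!-- ivy.xml fragment: map our "default" configuration onto the dependency's
     "default" configuration only, rather than every configuration it defines -->
<dependency org="org.apache.activemq" name="activemq-core" rev="5.2.0"
            conf="default->default"/>
```

If your ivy.xml declares its own configurations, the left-hand side of the mapping would need to name one of those instead.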


Regarding the last part of your question: Maven will resolve the dependency versions for log4j, and if the versions are compatible it will automatically select the 'nearest' of the listed versions.

From the Introduction to the Dependency Mechanism:

  • Dependency mediation - this determines what version of a dependency will be used when multiple versions of an artifact are encountered. Currently, Maven 2.0 only supports using the "nearest definition" which means that it will use the version of the closest dependency to your project in the tree of dependencies. You can always guarantee a version by declaring it explicitly in your project's POM. Note that if two dependency versions are at the same depth in the dependency tree, until Maven 2.0.8 it was not defined which one would win, but since Maven 2.0.9 it's the order in the declaration that counts: the first declaration wins.

    • "nearest definition" means that the version used will be the closest one to your project in the tree of dependencies, eg. if dependencies for A, B, and C are defined as A -> B -> C -> D 2.0 and A -> E -> D 1.0, then D 1.0 will be used when building A because the path from A to D through E is shorter. You could explicitly add a dependency to D 2.0 in A to force the use of D 2.0
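
For the log4j case, guaranteeing a version by declaring it explicitly could be sketched as follows. The version number is a placeholder for whichever release you have verified against your own code.

```xml
<!-- pom.xml fragment: a direct declaration sits at depth 1 in the tree, so
     under "nearest definition" it wins over any transitive log4j version -->
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>1.2.15</version> <!-- placeholder version -->
</dependency>
```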

Where the versions are not compatible you have a bigger problem than dependency resolution. I believe Ivy operates on a similar model but I am no expert.

Rich Seller
+2  A: 

I'm currently using Ivy to manage more than 120 OSS and proprietary libraries for several projects (some standalone, some dependent). Back in 2005 (when Ivy was still from Jayasoft) I decided (or rather had) to write the ivy.xml files for each integrated package.

The biggest advantage is that I have full control over the various configurations. This may sound like overkill to some, but our build system has been working reliably for over 4 years now and adding a new library is typically a five-minute job.
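
A hand-written descriptor of this kind might look like the sketch below. The module, its configuration names, and its dependencies are all hypothetical and only show the shape of such a file; they are not taken from any real repository.

```xml
<!-- ivy.xml written by hand for a hypothetical third-party library -->
<ivy-module version="2.0">
  <info organisation="com.example" module="messaging-lib" revision="1.0"/>
  <configurations>
    <!-- what every client of the library needs -->
    <conf name="core" description="client API only"/>
    <!-- an optional feature, requested only by clients that use it -->
    <conf name="jdbc-store" extends="core" description="adds the embedded database"/>
  </configurations>
  <publications>
    <artifact name="messaging-lib" type="jar" conf="core"/>
  </publications>
  <dependencies>
    <dependency org="log4j" name="log4j" rev="1.2.15" conf="core->default"/>
    <dependency org="org.apache.derby" name="derby" rev="10.4.2.0" conf="jdbc-store->default"/>
  </dependencies>
</ivy-module>
```

A client that asks only for the core configuration never sees Derby, which is the kind of control a generated descriptor rarely gives you.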

Vladimir
A: 

There is the whole idea of dependency injection, which would invariably lead to the program needing to be restructured. I have been hearing some noise about Guice being good in this regard. From a deployment perspective, I have had reasonable success with deploying only the .jar we built, with dependency .jars being fetched from the original projects via JNLP. The build system behind this involved manually tracking new versions of dependencies and updating them in the build system.

whatnick