What is a good strategy for dealing with generated code? One of our projects uses Apache CXF's wsdl2java tool to generate JAX-WS stubs for a set of wsdls. How should this code be dealt with?

There are two options that I can see:

  1. Generate the stubs once and store them in version control. With this, you don't have to deal with IDE classpath issues, since the source files are right there in your tree (or nearby). However, you get a lot of extra clutter in version control, plus the temptation for someone to monkey with the generated code.

  2. Generate the stubs every time at build time. This reverses the pros and cons of #1: nothing generated is stored, but the developer now has to run the build script and add the resulting jars to his/her classpath.

We went with #2 because the annoyance of classpath related issues seemed to outweigh the problems detailed in #1.

What are other people doing? Does anyone have any recommendations to streamline this process?

A: 

I think you'll find that the J2EE/EJB 2.x crowd had to deal with a similar issue with XDoclet. In my experience, I've seen it done both ways -- people storing generated code in version control and people who generate the code during the build.

As long as you have a good system for testing, I think #1 is preferable. If you have really good tooling that can handle style #2 (like Eclipse's XDoclet functionality), then go with that. Watch out though -- #2 can often fill the permanent generation of a JVM if you build and rebuild for a long time, and restarting your IDE/JVM that often is a pain.

Martin
How did XDoclet solve the issue?
Kevin
Tooling that would run every time you made a change to the input files.
Martin
(or you could run it by hand :)
Martin
+8  A: 

My attitude is that generated code should practically never be stored in version control. There has to be a compelling reason to do it. I typically create an ant task "build-for-eclipse" that builds all generated code. I run that, refresh the directory into which generated code is created, and voilà, I am fit to go.

The goal is that you have a "one-button" trivial task that any developer can do so they'll have all the source -- generated and not -- in their IDE, but that no output of anything is stored in source control. If it's the output of a generator, then by definition it's not source. :-)

This should safely meet everyone's needs.

Eddie
+1 Storing generated code in source control feels similar to storing binaries next to the source (I have a coworker that advocates this). This is of course assuming the code is generated from something that is meant to change, and isn't a one off process.
Mark Roddy
How do you deal with 'permgen full' errors that come from regenerating new classes every time you rebuild inside your tooling? Or do you not use any IDE or tools that stay running across builds?
Martin
@Martin: CruiseControl has no problem creating generated classes for each build while running permanently. I don't keep my IDE up for days and days. I normally close down my IDE at the end of the day, since it remembers where I left off.
Eddie
+2  A: 

I've tried it both ways, and settled on not storing generated code as a general rule. It can be a problem when there are slight, trivial differences: it looks like there's a change in revision control when there really is nothing of importance there. Also, with #1, you can end up putting a lot of junk in your repositories, particularly in situations where the generated code is binary. Most repos don't store diffs of binary files, but complete copies.

Even in situations like yours, where the generated stuff is text, I tend not to store it, unless I absolutely must make a code change to it to get it to work.

Don Branson
+2  A: 

I prefer option #3, which has the pros of 1 & 2 but not the cons: never commit generated files into source control, but do create a fully-automated and portable build process (one shell command runs it all on every workstation).

There is lots of discussion elsewhere on SO (and the 'Net) about both aspects. Suffice it to say that: source control is for SOURCE, not generated code or binaries, and that such source includes the scripts that automate a repeatable build.

Your real problem is depending on an IDE-based build process, which will inevitably hurt you. Let the developers configure their IDE for builds, but don't bet the farm on it and don't let it into your source control system.

Best wishes.

Rob Williams
We don't depend on the IDE for building. My comments about the IDE were related to the developer needing to have the generated jars on his classpath
Kevin
Excellent, Kevin--then you just need to set the IDE classpath to reference the generated JARs and have the build update them when appropriate. It can be helpful to update the JARs only when the generated code has to change, using the relevant Ant task (uptodate?).
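
A minimal sketch of that kind of up-to-date check, written in plain Java rather than as an Ant target. The class name, the marker file, and the idea of touching a marker after each successful generation are all assumptions made for illustration; none of this is part of CXF or Ant. It only compares timestamps, in the spirit of Ant's uptodate task:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.stream.Stream;

// Hypothetical helper: decides whether the generated stubs are stale.
public class StubFreshness {

    /**
     * Returns true when the stubs should be regenerated: either the marker
     * file (assumed to be touched after each successful generation) is
     * missing, or at least one .wsdl file under wsdlDir is newer than it.
     */
    public static boolean needsRegeneration(Path wsdlDir, Path marker) throws IOException {
        if (!Files.exists(marker)) {
            return true; // stubs were never generated
        }
        FileTime generatedAt = Files.getLastModifiedTime(marker);
        try (Stream<Path> files = Files.walk(wsdlDir)) {
            return files
                    .filter(p -> p.toString().endsWith(".wsdl"))
                    .anyMatch(p -> isNewer(p, generatedAt));
        }
    }

    private static boolean isNewer(Path file, FileTime reference) {
        try {
            return Files.getLastModifiedTime(file).compareTo(reference) > 0;
        } catch (IOException e) {
            return true; // when the timestamp cannot be read, err on the side of regenerating
        }
    }
}
```

The build would call needsRegeneration with the WSDL directory and the marker path, run the generator and rebuild the JARs only when it returns true, and then touch the marker. A timestamp comparison like this will not notice a change of generator version, so a generator upgrade still calls for a forced regeneration.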
Rob Williams
A: 

Number 2 is better any day. With #2 you also pick up improvements made to the code generator (Apache CXF in your case) automatically: you don't have to regenerate and check in every time you start using a new CXF version. And yes, have a single-click build system which does everything :-)

A: 

We use Eclipse as our main IDE/tool. We define a new Java project for each kind of generated code. For example, if I'm working on a web project with Hibernate and Axis web services, we have this structure in our workspace:

projectWeb: This is main project, generally a dynamic web project. All coders are working here :-)

projectORM: Code generated with Hibernate Tools.

projectWS: Code generated with Java2WSDL.

projectWSClient: Code generated with WSDL2Java.

Each project is under revision control (SVN). We use Maven 2 as our dependency/build tool, and the binaries built from the generated code are saved as jars in our Maven 2 repository. Before that, one or more people on the team are responsible for dealing with the generated code and testing it after every generation (for example, when the model changes).

Regards

SourceRebels
A: 

Most of the time, I would go for option #2. The reasons are pretty obvious and I see already enough support for this choice from others.

There is an exception in my book. If most/all of the following apply:
(as in, "mostly", you know -- these are criteria to think about, not something written in stone...)

  1. The stub code takes a long time to regenerate (*)
  2. There is no effective way to determine in advance whether the source has changed (à la make), so you have to rebuild the stubs every time.
  3. The stub code is expected to change very rarely (because the source from which the stubs are generated changes very rarely)
  4. Any changes in the stub code typically require manual changes on the rest of the program anyway

In that case I will usually script the generation step (and keep those scripts in source control as well), but run the proxy generation only by hand rather than on every build.

Typical examples of the above are database ORM classes and Web Services proxy classes.

(*) How much is "too long"? It depends; in highly interactive environments, one minute could be "too long". As I said, these are criteria for you to think about. In real programming, as in real life, you have to pick your evils.

Euro Micelli