views:

1059

answers:

6

We've trying to separate a big code base into logical modules. I would like some recommendations for tools as well as whatever experiences you might have had with this sort of thing.

The application consists of a server WAR and several rich-clients distributed in JARs. The trouble is that it's all in one big, hairy code base, one source tree of > 2k files war. Each JAR has a dedicated class with a main method, but the tangle of dependencies ensnares quickly. It's not all that bad, good practices were followed consistently and there are components with specific tasks. It just needs some improvement to help our team scale as it grows.

The modules will each be in a maven project, built by a parent POM. The process has already started on moving each JAR/WAR into it's own project, but it's obvious that this will only scratch the surface: a few classes in each app JAR and a mammoth "legacy" project with everything else. Also, there are already some unit and integration tests.

Anyway, I'm interesting in tools, techniques, and general advice to breaking up an overly large and entangled code base into something more manageable. Free/open source is preferred.

+3  A: 

Two advices: The first thing you need is Test suites. The second advice is to work in small steps.

If you have a strong test suite already then you're in a good position. Otherwise, I would some good high level tests (aka: system tests).

The main advantage of high level tests is that a relatively small amount of tests can get you great coverage. They will not help you in pin-pointing a bug, but you won't really need that: if you work in small steps and you make sure to run the tests after each change you'll be able to quickly detect (accidentally introduced) bugs: the root of the bug is in the small portion of the code has changed since the last time you ran the tests.

Itay
+1  A: 

I would start with the various tasks that you need to accomplish.

I was faced with a similar task recently, given a 15 year old code base that had been made by a series of developers who didn't have any communication with one another (one worked on the project, left, then another got hired, etc, with no crosstalk). The result is a total mishmash of very different styles and quality.

To make it work, we've had to isolate the necessary functionality, distinct from the decorative fluff to make it all work. For instance, there's a lot of different string classes in there, and one person spent what must have been a great deal of time making a 2k line conversion between COleDateTime to const char* and back again; that was fluff, code to solve a task ancillary to the main goal (getting things into and out of a database).

What we ended up having to do was identify a large goal that this code accomplished, and then writing the base logic for that. When there was a task we needed to accomplish that we know had been done before, we found it and wrapped it in library calls, so that it could exist on its own. One code chunk, for instance, activates a USB device driver to create an image; that code is untouched by this current project, but called when necessary via library calls. Another code chunk works the security dongle, and still another queries remote servers for data. That's all necessary code that can be encapsulated. The drawing code, though, was built over 15 years and such an edifice to insanity that a rewrite in OpenGL over the course of a month was a better use of time than to try to figure out what someone else had done and then how to add to it.

I'm being a bit handwavy here, because our project was MFC C++ to .NET C#, but the basic principles apply:

  1. find the major goal
  2. identify all the little goals that make the major goal possible
  3. Isolate the already encapsulated portions of code, if any, to be used as library calls
  4. figure out the logic to piece it all together.

I hope that helps...

mmr
+5  A: 

Have a look a Structure 101. It is awesome for visualizing dependencies, and showing the dependencies to break on your way to a cleaner structure.

Jens Schauder
+1  A: 

We recently have accomplished a similar task, i.e. a project that consisted of > 1k source files with two main classes that had to be split up. We ended up with four separate projects, one for the base utility classes, one for the client database stuff, one for the server (the project is a rmi-server-client application), and one for the client gui stuff. Our project had to be separated because other applications were using the client as a command line only and if you used any of the gui classes by accident you were experiencing headless exceptions which only occurred when starting on the headless deployment server.

Some things to keep in mind from our experience:

  • Use an entire sprint for separating the projects (don't let other tasks interfere with the split up for you will need the the whole time of a sprint)
  • Use version control
  • Write unit tests before you move any functionality somewhere else
  • Use a continuous integration system (doesn't matter if home grown or out of the box)
  • Minimize the number of files in the current changeset (you will save yourself a lot of work when you have to undo some changes)
  • Use a dependency analysis tool all the way before moving classes (we have made good experiences with DependencyFinder)
  • Take the time to restructure the packages into reasonable per project package sets
  • Don't fear to change interfaces but have all dependent projects in the workspace so that you get all the compilation errors
dhiller
+1  A: 

To continue Itay's answer, I suggest reading Michael Feathers' "Working Effectively With Legacy Code"(pdf). He also recommends every step to be backed by tests. There is also A book-length version.

Yuval F
A: 

Maven allows you to setup small projects as children of a larger one. If you want to extract a portion of your project as a separate library for other projects, then maven lets you do that as well.

Having said that, you definitely need to document your tasks, what each smaller project will accomplish, and then (as has been stated here multiple times) test, test, test. You need tests which work through the whole project, then have tests that work with the individual portions of the project which will wind up as child projects.

When you start to pull out functionality, you need additional tests to make sure that your functionality is consistent, and that you can mock input into your various child projects.

Spencer K