I'm going to refactor certain parts of a huge code base (18,000+ Java classes). The goal is to be able to extract the lower layers as independent libraries that can be reused in other projects, which currently use duplicates of this code base. One part in particular should be refactored into a framework that is independent of business logic. Ultimately I would like the code to have a clean architectural layering.

I've looked at the code with a tool called Structure 101 for Java and found lots (!) of architectural layering issues where lower layers reference upper layers.

I don't want to simply start messing with the code; I'd rather come up with a reasonable strategy for tackling this problem. What things should I keep in mind?

I'm thinking about at least taking small steps. I'm also thinking about having unit tests in place, but that requires creating them, since there are none.

Any thoughts on this?

+2  A: 

Off the top of my head:

  • identify functional domains, which will facilitate the definition process of applications within that huge code base.
  • in turn, identify dependencies between those applications: those at the bottom (which are used by all the others) are typically technical frameworks or libraries.

  • create scenario testing (and not unit-testing, too much "localized" at this stage) to identify significant runtime processes and their outputs. Scenario-testing is more concerned with integration and can be used for non-regression testing as well.

  • prepare a current production environment and qualify the current bugs, because parallel runs will be needed when you begin refactoring (to make sure you are still keeping the same features working), and you do not want your parallel runs to be 100% compatible (because that would mean you have successfully reproduced the bugs!)

  • make sure to create an appropriate merge workflow to manage the different branches representing different (and potentially parallel) refactoring efforts.
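
The parallel-run point above could be sketched roughly as follows. This is only an illustration under assumptions: `ParallelRunCheck`, the scenario names, and the `knownBugs` set are hypothetical, and it assumes the output of each scenario can be captured as a string from both the legacy and the refactored system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: compares scenario outputs of the legacy system
// with those of the refactored system, scenario by scenario.
final class ParallelRunCheck {

    /**
     * Returns one message per scenario that behaves unexpectedly.
     * Scenarios listed in knownBugs are expected to differ (the legacy
     * output is wrong there), so identical output is flagged for them.
     */
    static List<String> unexpectedResults(Map<String, String> legacyOutputs,
                                          Map<String, String> refactoredOutputs,
                                          Set<String> knownBugs) {
        List<String> problems = new ArrayList<>();
        // TreeMap gives a stable, sorted report order.
        for (Map.Entry<String, String> e : new TreeMap<>(legacyOutputs).entrySet()) {
            String scenario = e.getKey();
            boolean same = Objects.equals(e.getValue(), refactoredOutputs.get(scenario));
            if (knownBugs.contains(scenario)) {
                if (same) {
                    problems.add(scenario + ": known bug was reproduced");
                }
            } else if (!same) {
                problems.add(scenario + ": output changed");
            }
        }
        return problems;
    }
}
```

An empty result means the refactored code matches the legacy code everywhere except on the scenarios already qualified as bugs.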

VonC
+1  A: 

If you're going to be extracting groups of classes and turning them into independent libraries, decide on the members of a group and start turning them into a cohesive whole, limiting their interaction with the outside world. Reduce dependencies as much as possible. When you're done, pull out that group, turn it into a library, plug the library back in, and start on a new group. The more junk you clean out, the easier it is to understand what's left.
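
To see how far a candidate group is from being self-contained, one option is to list every reference that leaves the group. A minimal sketch, assuming the class-to-class dependencies have already been exported from some analysis tool into a map; `GroupBoundary` and the class names in the usage note are made up for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: finds the dependencies that leave a candidate group.
final class GroupBoundary {

    /**
     * @param dependencies class name -> names of the classes it references
     * @param group        the classes intended to become one library
     * @return sorted "from -> to" edges where a group member references
     *         a class outside the group
     */
    static Set<String> outboundEdges(Map<String, List<String>> dependencies,
                                     Set<String> group) {
        Set<String> edges = new TreeSet<>();
        for (String member : group) {
            for (String target : dependencies.getOrDefault(member, List.of())) {
                if (!group.contains(target)) {
                    edges.add(member + " -> " + target);
                }
            }
        }
        return edges;
    }
}
```

If `util.Dates` references `billing.Invoice` and the group is `{util.Strings, util.Dates}`, the result contains `util.Dates -> billing.Invoice`. Each such edge has to be removed or inverted before the group can be pulled out as a library.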

Amanda S
+2  A: 

First thing: good luck, you're going to need it. This is potentially a HUGE job you've come upon. It sounds very familiar to me; I've worked on similar things in the past.

One thing to think about: before you start refactoring at all, I'd really strongly consider putting in place an extensive testing framework. The reason is this: with good unit tests and regression tests, you can begin making changes without worrying TOO much about breaking existing functionality. (That said, there's always a concern, but...)

That said: I'd look at slicing off distinct "vertical" slices of functionality, and see if you can write distinct unit and integration tests for them; once that is done, I'd jump in and start work on the refactor. While it may be very small at first, just the process of isolating the vertical slice of functionality and then writing integration and unit test code for it will get you a lot of experience with the existing code base. And if you manage to make that one little bit better initially, then you're ahead by that much.

After you've done that, start looking at potentially larger blocks of functionality to refactor. If it isn't possible to get clean blocks of functionality to refactor, I'd start looking at small chunks; if you can find a small (sometimes VERY small) chunk of code to then extract, unit test, and refactor, you're moving forward. This may seem like very very very slow progress at times, and it will, if you have a really large project, but you WILL be making a dent.

But in general, think of putting in place tests first to confirm expected functionality. Once those tests are in place, you can refactor with confidence (not perfect confidence, but better than nothing) that you aren't breaking things. Start small, and build on the techniques that reveal themselves out of the existing codebase. It's a long slog, but you'll get there eventually, and the codebase will be better for it.
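
As a rough illustration of "tests first to confirm expected functionality", here is what pinning down current behaviour might look like. `LegacyPriceFormatter` is a hypothetical stand-in for some undocumented legacy class, and the checks are written in plain Java to keep the sketch self-contained; in a real project they would normally live in a test framework such as JUnit.

```java
// Hypothetical stand-in for legacy code whose behaviour is not documented.
final class LegacyPriceFormatter {
    static String format(int cents) {
        // Quirk: negative amounts are wrapped in parentheses, not signed.
        int abs = Math.abs(cents);
        String text = (abs / 100) + "." + String.format("%02d", abs % 100);
        return cents < 0 ? "(" + text + ")" : text;
    }
}

// Records what the code does today, not what it "should" do.
final class LegacyPriceFormatterCharacterization {

    static void check(String expected, String actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError(
                "expected <" + expected + "> but was <" + actual + ">");
        }
    }

    public static void main(String[] args) {
        check("12.05", LegacyPriceFormatter.format(1205));
        check("0.00", LegacyPriceFormatter.format(0));
        // The odd-looking case is pinned too, so a refactoring cannot change it silently.
        check("(3.50)", LegacyPriceFormatter.format(-350));
        System.out.println("current behaviour pinned");
    }
}
```

The expected values are simply whatever the existing code produces today. Whether the parentheses are a bug or a feature can be decided later, once the refactoring is safe.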

McWafflestix
A: 

Just a few thoughts:

  • Look for common design patterns - try to see what classes are being used for core work, which ones are factories, which ones are facades or adapters.
  • Split the code into groups of classes that are dependent on or share an application state.
  • Identify which classes have persistent objects, and those that are serialized in/out of a database (which should be the easiest to isolate, provide the cleanest transactional interface, and are then portable between projects)
Joel
+1  A: 

Try to make your dependency tree as flat as possible.

One good way to do this is to invert dependencies: other code can depend on an interface/service, but not on the provider of that service. This has helped us a lot.
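
A small sketch of what that can look like; every name here (`AuditSink`, `FileImporter`, `InMemoryAudit`) is invented for illustration. The lower layer owns the interface, and the upper layer supplies the implementation, so the lower layer never references upper-layer classes.

```java
import java.util.ArrayList;
import java.util.List;

// Lower layer (would live in the extracted library). It owns the interface.
interface AuditSink {
    void record(String event);
}

// Lower layer: depends only on the interface, never on a concrete provider.
final class FileImporter {
    private final AuditSink audit;

    FileImporter(AuditSink audit) {
        this.audit = audit;
    }

    /** Counts the non-blank lines and reports the result through the sink. */
    int importLines(List<String> lines) {
        int imported = 0;
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                imported++;
            }
        }
        audit.record("imported " + imported + " lines");
        return imported;
    }
}

// Upper layer (the business application) provides the implementation.
final class InMemoryAudit implements AuditSink {
    final List<String> events = new ArrayList<>();

    @Override
    public void record(String event) {
        events.add(event);
    }
}
```

With this shape, `FileImporter` and `AuditSink` can move into a library jar unchanged, and each project that reuses the library plugs in its own `AuditSink`.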

GreenKiwi
+2  A: 

18,000 classes is really heading towards the "enormous" end of things. This is going to give you distinct problems, including build/compile time and having smoke come out of the computer when you fire up the IDE.

My first assumption is that with that many classes, there's a lot of duplication of common functionality and possibly unused classes or possibly even subsystems. I'd expect this because when something gets that large it becomes more and more likely that a developer doesn't know the whole system, or doesn't know where those Util functions are, and finds it just easier to write a new one. Looking for redundancies to remove will help simplify.

Another possible source of redundancy is uselessly deep class hierarchies, or piles of pointless interfaces (an example - Where I work there's a directory of about 50 or so classes, most > 1000 lines (not mine, not mine!). Each of these implements an interface, which is nothing more than its own method skeleton. There are no other implementations of those interfaces. All 50 could be deleted without issue). There are also those developers who've just discovered OO and are really keen on it - you know the ones, the single concrete implementation that extends a chain of 5 abstract classes and 3 interfaces.

Along with that I'd try to take a subsection of code (a few hundred classes at the absolute most) and move them to a subproject, which I'd then link in to the main as a jar. You could then work on that in a bit of peace with a reasonable hope of being able to understand the whole thing - there's a psychological aspect to this as well - there's less of an incentive to do good work if you feel like you're working on something that's an enormous, incomprehensible mess, than if you're working on your own clean subproject that you understand completely.

Steve B.
+6  A: 

You should also take a look at Working Effectively with Legacy Code by Michael Feathers:

http://www.amazon.com/Working-Effectively-Legacy-Robert-Martin/dp/0131177052/ref=sr_1_1?ie=UTF8&s=books&qid=1242430219&sr=8-1

I think one of the most important things you can put in place to facilitate this is a set of tests that ensure everything still works after refactoring or pulling code out into separate modules. Add to this a continuous integration system that runs your tests whenever you check something in.

Jon
The most important part is the CI system, because it lets you ensure that all projects using the code you are working on STILL build after each change you commit. Building tests is hard, but it helps you clarify where the new layer separations should go. If you can't write a test for a piece of code, then you can't call it cleanly from elsewhere either.
Thorbjørn Ravn Andersen
Thanks for the pointer to the book, I'm going to look into that.
nojevive
A: 

My idea is that, after setting up the testing infrastructure, you could write code generation tools for test cases, provided the common features of your testing code can be abstracted. Static code analysis tools might also be useful add-ons alongside the visualization tools. Sorry, it's just an idea. I can't even name the tools.

A: 

I am in a similar position with the code base I am working on: very tight integration between the Swing UI and the business logic. Refactoring is a delicate and time-consuming project.

I would highly recommend Martin Fowler's Refactoring. It is the single most important tool I have found for improving my approach to working with a crappy code base. He outlines a logical and straightforward process for refactoring any code. It helps to read it from someone who has done this many times.

Nemi