views:

267

answers:

10

Hi All,

I've been tasked with maintaining and refactoring a legacy Java system. I currently do C# and .NET, although I'm familiar with Java.

The legacy system uses RMI, a client/server architecture and was designed for the 1.4 JVM. It's using for the UI (as far as I can tell), Swing and AWT.

My question is this: What is the best way to come to terms with the codebase that I've just been handed? I'm thinking flowcharts of screens, defining boundaries between the RMI calls, and writing unit tests (for the bits that are testable).

What do you do in the situation when you're handed an unfamiliar codebase?

Thanks!

-Jarrod

+4  A: 

The first thing I do with any new code I'm handed is look at the existing unit tests. Writing new tests is usually the second thing.

Bill the Lizard
Not sure if that is intended but I find it funny that the next thing is usually writing new tests (does'nt say much about the existing unit tests , does it ? :)
Learning
I don't mean to disparage my colleagues, but I've yet to see a 100% complete set of unit tests (even my own can be improved). :) Also, I find that writing tests is just a better way for me to learn than reading them.
Bill the Lizard
ROTFL, good one :)))))))
01
A: 

Java 1.4 with RMI isn't legacy, that's practically new by some standards!

Do whatever you can do to familiarize yourself with where things are -- unit tests, tracing code/calls, working up some UML diagrams, etc.

Try starting with a smaller section of it and tracing code paths to get familiar with where things are or assign yourself some small fixes/improvements that require looking through the code.

Andrew Coleson
Java 1.4 is pretty old by Java standards (7 years since release). Sun declared "End of Service Life" (EOSL) for 1.4 in October after 2 years in end-of-life transition. Java 5 reaches EOSL later this year (http://java.sun.com/products/archive/eol.policy.html).
Dan Dyer
In any event, the term legacy here simply meant that his company was no longer using the system or no longer employed the original developers, so the code is "old" regardless of the community's feelings on Java 1.4
Karl
Right, "practically new" was mostly tongue-in-cheek, but Java 5 nearing its EOL doesn't mean that 1.4 isn't still common. Regardless, Karl's right on the real meaning of the term "legacy."
Andrew Coleson
+4  A: 

It very much depends on the quality of the codebase. Actually, you could define it even more narrowly - it depends on how clear the code is, and how well commented it is (and I write this as someone whose repeatedly been in the position of inheriting poorly documented legacy codebases).

The worse the codebase, the more you're going to need to infer its functioning from the outside in - if you can't figure out what a function is doing , you may at least be able to figure out what the GUI seems to imply that it's doing. Having an RMI interface may be a blessing in that it gives you a simplified picture of what the GUI needs to see - abstracting from the remote interface is a very good idea.

If the codebase is really poor unit tests aren't likely to help you, as you may be testing something that is wrong even if properly implemented. An example from the last thing I inherited: (names and descriptions removed intentionally - they were not helpful in the original in any case)

 public static int[] a (List<int[]> input){  

   ... lots of opaque rubbish
   return b(input);
 }

 public static List<int[]> b{ List<int[]> input)
 {
   ... more crap....
   return some horribly mangled list of int[];
 }

As it turned out, what a accomplished was to sort the arrays by the value of their second element. Testing b to behave properly would be useless in this case.

Suggestions to help you retain your sanity:

  • remove code that's clearly not in use.
  • Try to factor out any "magic numbers"
  • find any hardcode file, path, or resource references and factor them out into properties, or any other central, generic way to deal with them.
  • try to ascertain whether or not the app is really performing the way it's supposed to - it's not unusual for some really complicated function that you can't make heads or tail of actually represent something that the users felt "never really worked right".
  • Find any original documentation or specs from when the thing was first designed.
  • Since you mention 1.4, genericizing maps can clarify what kinds of objects are getting passed. if you have an object backed by a HashMap, and how it's getting populated is a mystery, it can really simplify your life to determine that it's actually a Map of Strings to Integers - that's often a lot easier to figure out than what it's actually supposed to be doing.

Of course most of this is unecessary if the codebase is well written, but then your job will be much easier in any case. And good luck.

Steve B.
+1  A: 

The first thing I do when receiving new code is "try to make it work"! By this I mean:

  • first install it (as per the install notes if any)

  • then familiarize myself with it (by reading the user manual and running some important use cases)

  • then some reverse engineering to find out the main layers. classes and dependencies

  • then I can think about writing new test cases (at acceptance level first, that will be useful for regression testing later on)

  • then I give myself some "challenges" about adding this feature (even if it is not required) or improving the performance of that existing feature: that's a good way to dig into the existing codebase

  • of course, if there are some known pending bugs/RFE, I'll work on these first

In your specific case, I would also try to document the RMI calls, what call or sequence of calls to do what, and relate it to use-level features when possible.

In addition (and that should come first actually), one important thing to know is the main objectives of this system (why does your customer have this system for?). Knowing the goals and keeping them in mind will avoid you parting away from these goals while maintaining the code.

jfpoilpret
A: 

Build and run it would be my first thing. Start getting a sense for the problem it was trying to address.

Maybe you can import it into a UML tool like JUDE and get a picture of how the classes interact.

Writing JUnit tests is an excellent suggestion. I'd want to see how layered the app was. If it's hard to test, perhaps it's too coupled. Unit tests will tell you that. They'll also give you a safety net if you decide some refactoring is in order.

JDK 1.4? It's past the end of its support life. I'd also want to see if the code would build and run under JDK 5 at a minimum, preferably JDK 6. Maybe you could slap a few of those JUnit tests into JMeter and do a quick poor man's load test with 5 simultaneous users.

If you've got a data model, pull that into ERWin and start seeing how tables, objects, and screens flow together.

duffymo
+2  A: 

One thing that helps me work with code that is new to me -- this is much less necessary for well-written code -- is to massively refactor it for a day or two and then to discard all of my changes. This process helps me understand what the code does; working with code helps me understand it. It also begins to teach me what parts of the code are fragile.

If you have the opportunity of migrating to a newer release of Java, then genericizing all collections will help understand what data types are passed around.

Of course, I do this after installing the software in a test lab and playing with it a little bit to understand what it does.

Edit: Thinking on my answer, also useful is to enable all diagnostic tracing and logging, to use the system, and then to study the logs. If communication protocol tracing exists, then looking at this trace will help understand the communication protocol used by the code, perhaps with a Wireshark trace of the same testing.

Another useful migration is to migrate from the old Concurrency library to the new Java 5 (and 6) concurrency library. This will help you understand where the threads are and when they are started and when they will shut down.

Of course, with any code changes on an unfamiliar code base, I assume that proper testing is done to ensure nothing is broken! However, to balance this, I've learned that after refactoring badly-written code, the bugs newly introduced are often far easier to find than the bugs that existed prior to the refactoring.

Eddie
A: 

Of course you could try step through. Its slow but a day spent walking through code might save a week of bug searching later...

You could also try the scavenger hunt approach, make a list of every feature and go hunt the code for it...

A: 

First skim through the code to understand its structure. Then, run the app in debug mode and run through it a couple of times. This will show you the flow and the structure.

Use Netbeans to reverse engineer class diagrams.

Joshua
+1  A: 

There was a talk I attended by Mike Hill in which he talked about a process he uses for dealing with bad legacy code. He called it "microtesting" and it is basically isolating each thing you want to test (in its own function or small object), and writing tests around it. These tests do not mean to assert things in the business sense - they are just meant to capture the state the system is left in after the lines under test are executed. Typically, he would assert that variables had the values they had in the debugger. Naturally, these tests would pass initially, but after he had written enough of them he would effectively have a snapshot of the system.

Once those tests were in place, he could start refactoring that bit of code to try to make sense of what it was trying to do. He could do so safely because his "microtests" would fail if he changed the way the function (or object) behaved even slightly. While that meant he could not make big design changes, he could eliminate a lot of noise and he could get a picture as to what the system was doing. Once he had a clear idea as to what the code was achieving, he could start improving it, finding bugs, etc.

There aren't many free resources on this, which is why I'm not providing links. Hopefully I've managed to describe enough to give you some ideas.

Lyudmil
+1  A: 

Experience has shown me that there are 3 major goals you have when learning a legacy system.

  • Learn what the code is supposed to do
  • Learn how it does them
  • (crucially) Learn why it does them the way it does

All three of those parts are very important, and there's a few tricks to help you get started.

First, resist the temptation to just ctrl-click (or whatever your IDE uses) your way around the code to understand everything. You probably won't be able to keep everything in perspective in your mind this way, especially when each line forces you to look at multiple other classes in order to understand what it is.

Read documentation where possible; it usually helps you quickly gain a mental framework upon which to build everything that follows.

Run test cases where possible.

Don't be afraid to ask someone who knows if you have a question. Granted, you shouldn't waste other employees' time with inane queries, but if there's something that you simply don't understand (this is especially true with more conceptual questions like, "Wouldn't it make much more sense to implement this as a ___" or something), it's probably worth finding out the answer before you mess something up and don't know why.

When you do finally get down to reading the code, start at a logical "main" place and go from there. Don't just read the code top to bottom, or in alphabetical order, or anything (this is probably obvious).

mandaleeka