views:

354

answers:

10

I'm working with legacy Java code that has no unit tests. Many classes need to be refactored in order to work with the project.

Many refactorings can be done with Eclipse, and I am doing some by hand. After some refactoring I review the diff against CVS HEAD, but I can't really feel certain that everything is 100% correct.

The Question: How can I validate that a refactoring is mathematically identical to the previous version? I wish there were a tool, but I also accept "basic human algorithms" as solutions.

I know, "run your JUnit tests" is the best answer, but sadly, there aren't any in my project.

Thank you!

+4  A: 

I'm afraid there is no such thing as an algorithm that can validate, in general, that a program is semantically identical to another program. It is essentially the halting problem in disguise, and that is proven to be unsolvable.

Another, slightly more automated method would be to compare the outputs of both programs. However, that could be hard for any large program, since there probably isn't a well-defined input range...

Perhaps it's time you wrote the unit tests that you find so lacking?

Edit: given that you'd accept human algorithms, this is what I usually do. I study the code to be refactored and understand its semantics. Then I write at least one unit test, or some other sort of automated test, for that part of the code base. I perform the refactoring, then see if the test still passes. If it does, there is a good chance the refactoring(*) didn't break anything.

(*) Here, by refactoring I mean changing the implementation/algorithm etc., not just simple renaming, shuffling code around, or pulling common lines of code into methods/base classes. Those you can almost eyeball, provided that you have a good understanding of the code base.
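
As a rough illustration of the "write a test first" step, here is a minimal sketch in plain Java (no JUnit needed). `LegacyPricing.price` is a made-up stand-in for whatever code you are about to refactor, and the expected values are assumed to have been observed by running the current version, not taken from any spec.

```java
// Hypothetical stand-in for a legacy method; not from any real code base.
class LegacyPricing {
    static int price(int quantity) {
        int total = 0;
        for (int i = 0; i < quantity; i++) {
            total += 10;
        }
        if (quantity > 5) {
            total = total - total / 10; // bulk discount, integer arithmetic
        }
        return total;
    }
}

class LegacyPricingCharacterization {
    // Fails loudly if the observed behavior no longer matches the pinned value.
    static void check(int quantity, int expected) {
        int actual = LegacyPricing.price(quantity);
        if (actual != expected) {
            throw new AssertionError(
                "price(" + quantity + "): expected " + expected + " but got " + actual);
        }
    }

    // Values pinned from the current behavior, including edge cases.
    static void runAll() {
        check(0, 0);
        check(1, 10);
        check(5, 50);   // last quantity without discount
        check(6, 54);   // first quantity with discount
        check(10, 90);
        check(-2, 0);   // negative input: the loop never runs
    }

    public static void main(String[] args) {
        runAll();
        System.out.println("characterization checks passed");
    }
}
```

Run it before the refactoring, then again after. The boundary cases (5 vs. 6, negative input) are the ones an eyeball review tends to miss.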

Chii
+2  A: 

I would still say "run your unit tests" ;). The only way to be really sure that the result is the same is a unit test. You can add them yourself before you refactor the specific code. It takes longer in the beginning, but will save you a lot of time in the long run.

DarthCoder
What if your unit tests are no good? For example, they don't cover some edge case? Or some other programmer has come along in the meantime and added a feature without a unit test? How do you make sure you don't break that kind of functionality either?
flamingpenguin
+8  A: 

You're going down a slippery slope if you think you can detect that by eyeballing the program. And as one of the other responders already said, the question of whether two programs are equal is undecidable (by a Turing machine).

If you don't have unit tests, I suggest you at least set up a regression test harness. Take a snapshot of some input that version 1 of the program takes and the output it produces, run the same input through version 2, and make sure the results are the same.
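
A minimal sketch of such a harness, kept in memory for brevity (a real one would more likely write the snapshot to files). All names here are assumptions for illustration: `normalizeV1` plays the role of version 1, `normalizeV2` a behavior-preserving rewrite, and `normalizeBroken` a rewrite that slipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

class SnapshotHarness {
    // Run version 1 over the inputs and remember what it produced.
    static <I, O> Map<I, O> record(List<I> inputs, Function<I, O> versionOne) {
        Map<I, O> snapshot = new LinkedHashMap<>();
        for (I input : inputs) {
            snapshot.put(input, versionOne.apply(input));
        }
        return snapshot;
    }

    // Run version 2 over the same inputs and report every difference.
    static <I, O> List<String> replay(Map<I, O> snapshot, Function<I, O> versionTwo) {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<I, O> entry : snapshot.entrySet()) {
            O actual = versionTwo.apply(entry.getKey());
            if (!Objects.equals(entry.getValue(), actual)) {
                mismatches.add("input <" + entry.getKey() + ">: expected <"
                    + entry.getValue() + "> but got <" + actual + ">");
            }
        }
        return mismatches;
    }

    // Hypothetical "version 1" and two candidate rewrites.
    static String normalizeV1(String s) { return s.trim().toLowerCase(Locale.ROOT); }
    static String normalizeV2(String s) { return s.toLowerCase(Locale.ROOT).trim(); }
    static String normalizeBroken(String s) { return s.toLowerCase(Locale.ROOT).replace(" ", ""); }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("  Foo ", "BAR", "Hello World", "");
        Map<String, String> snapshot = record(inputs, SnapshotHarness::normalizeV1);
        System.out.println(replay(snapshot, SnapshotHarness::normalizeV2));     // no mismatches
        System.out.println(replay(snapshot, SnapshotHarness::normalizeBroken)); // one mismatch
    }
}
```

An empty mismatch list only says the two versions agree on the recorded inputs, so the snapshot is worth as much as the inputs you picked.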

If it's a GUI, I hope it has MVC separation so you can test the model separately, otherwise you may be stuck.

xcut
+32  A: 

In "TDD By Example" there is a specific section that talks about this. The problem is that you need unit tests to refactor, but complicated code is usually not testable. Thus, you want to refactor to make it testable. It's a cycle.

Therefore the best strategy is as follows:

Do tiny refactoring steps. When the steps are small it is easier for a human to make sure the overall behavior is intact. Choose only refactorings that increase testability. This is your immediate goal. Don't think about supporting future functionality (or anything fancy like that). Just think about "how can I make it possible for a unit test to test this method/class".

As soon as a method/class becomes testable, write unit tests for it.

Repeating this process will gradually get you to a position where you have tests and thus you can refactor more aggressively. Usually, this process is shorter than one would expect.
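
To make "a refactoring that increases testability" concrete, here is one tiny step sketched in Java. The `Greeter` class is invented for illustration. The only change is an extract-method step that moves the decision logic out from behind the system clock, so a test can pass in any hour it likes.

```java
import java.time.LocalTime;

class Greeter {
    // Before the step (kept as a comment): the result depends on the wall
    // clock, so a test cannot control it.
    //
    //   static String greeting() {
    //       int hour = LocalTime.now().getHour();
    //       return hour < 12 ? "Good morning" : "Good afternoon";
    //   }

    // After the step: same public method, logic extracted unchanged.
    static String greeting() {
        return greetingAt(LocalTime.now().getHour());
    }

    // Now testable for any hour without touching the clock.
    static String greetingAt(int hour) {
        return hour < 12 ? "Good morning" : "Good afternoon";
    }

    public static void main(String[] args) {
        System.out.println(greetingAt(9));
        System.out.println(greetingAt(12));
    }
}
```

The step is small enough to verify by reading the diff, and once it is in, the unit test for `greetingAt` can be written right away.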

Itay
Cool advice. +1.
Adeel Ansari
+1 Great advice. Lots and lots of small commits. Get it to a testable state, and then write unit tests and continue.
Tim Drisdelle
+1 aka Refactor Low-Hanging Fruit: http://c2.com/cgi/wiki?RefactorLowHangingFruit
Carl Manaster
+1 Every coder should read TDD by Example ! http://www.codelord.net/2010/01/12/every-coder-should-read-tdd-by-example/
abyx
+1 Precisely how I look at the situation.
Grant Palin
+5  A: 

The Question: How can I validate that a refactoring is mathematically identical to the previous version? I wish there were a tool, but I also accept "basic human algorithms" as solutions.

Strictly speaking, refactorings are a set of well-known transformations which have been proved to preserve the semantics of the code. See Refactoring Object-Oriented Frameworks. Everything else should be called re-engineering, but I agree that both terms are used interchangeably.

Reasoning about semantics preservation is hard, and is an open research topic. See for instance Formalising Behaviour Preserving Program Transformations.

Until we have an efficient tool to check semantic preservation, the best option is indeed to rely on testing. Another approach to get more confidence in your change would be to add assertions and contracts. It will force you to do a review and think more in depth about what could have changed, what the invariants are, and what could be broken.
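
A small sketch of what adding assertions and contracts could look like in Java. The `Account` class is purely illustrative. It uses an explicit `check` helper so the checks always run; the built-in `assert` statement is an alternative, but it is disabled unless the JVM is started with `-ea`.

```java
// Illustrative class, not from the question's code base.
class Account {
    private long balanceCents;

    Account(long initialCents) {
        check(initialCents >= 0, "precondition: initial balance must not be negative");
        this.balanceCents = initialCents;
    }

    void withdraw(long amountCents) {
        // Preconditions: what callers must guarantee.
        check(amountCents > 0, "precondition: amount must be positive");
        check(amountCents <= balanceCents, "precondition: insufficient funds");

        long before = balanceCents;
        balanceCents -= amountCents;

        // Postcondition and invariant: what must hold however the body is rewritten.
        check(balanceCents == before - amountCents, "postcondition: balance reduced by amount");
        check(balanceCents >= 0, "invariant: balance never negative");
    }

    long balance() {
        return balanceCents;
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new IllegalStateException(message);
        }
    }

    public static void main(String[] args) {
        Account account = new Account(100);
        account.withdraw(30);
        System.out.println(account.balance());
    }
}
```

Writing down the pre- and postconditions before touching the method body is where most of the review value comes from; the runtime checks are a bonus.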

ewernli
+3  A: 

The Question: How can I validate that a refactoring is mathematically identical to the previous version? I wish there were a tool, but I also accept "basic human algorithms" as solutions.

I agree with Chii that this is intrinsically impossible. If you really need to refactor it (instead of writing an adapter for the legacy code, making your new stuff loosely coupled to the old code), you have to take special care with subclasses, overridden methods, etc. Writing unit tests may help, but if you don't actually know what the code should do, how can you write unit tests for it? You can only write unit tests to ensure that the new code does what you assumed the old code did.
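
One way the subclass problem can show up, sketched with invented classes: inlining a method looks harmless when you only read the base class, but a subclass that overrides that method silently stops being consulted.

```java
// All classes here are hypothetical and only illustrate the hazard.
class Report {
    int total(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += weight(v); // overridable hook
        }
        return sum;
    }

    int weight(int value) {
        return value;
    }
}

// "Refactored" base class: weight() was inlined into total().
class RefactoredReport {
    int total(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v; // the hook is no longer called
        }
        return sum;
    }

    int weight(int value) {
        return value;
    }
}

class DoubledReport extends Report {
    @Override
    int weight(int value) {
        return 2 * value;
    }
}

class DoubledRefactoredReport extends RefactoredReport {
    @Override
    int weight(int value) {
        return 2 * value;
    }
}

class OverrideHazardDemo {
    public static void main(String[] args) {
        int[] values = {1, 2, 3};
        // The base classes agree with each other...
        System.out.println(new Report().total(values) == new RefactoredReport().total(values));
        // ...but the subclasses do not.
        System.out.println(new DoubledReport().total(values));
        System.out.println(new DoubledRefactoredReport().total(values));
    }
}
```

A test that only exercises the base class would pass before and after, which is why the subclasses need their own checks.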

bertolami
+1  A: 

I have done the following on a project with some huge god classes in need of refactoring:

Use AOP to 'dump' your object state and parameters at the start of your method and at the end. Dump the return value too.

Then record a lot of scenarios with the code you need to change and replay them using your refactored code.

This is not mathematical, and it's a bit heavy to set up, but afterwards you get a good set of non-regression tests.

The dumper is easy to set up with a tool like XStream.
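
To give an idea of the record/replay approach without pulling in an AOP framework or XStream, here is a simplified sketch that uses a plain JDK dynamic proxy as a stand-in. It only logs arguments and return values (not the object state), it only works for calls made through an interface, and the `Calculator` interface and both implementations are invented for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CallRecorder {
    // Hypothetical interface of the code under refactoring.
    interface Calculator {
        int add(int a, int b);
    }

    // Wraps target so that every call is logged as "method[args] -> result".
    @SuppressWarnings("unchecked")
    static <T> T recording(Class<T> type, T target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                Object result = method.invoke(target, args);
                log.add(method.getName() + Arrays.toString(args) + " -> " + result);
                return result;
            } catch (InvocationTargetException e) {
                // Log the failure and rethrow the original exception.
                log.add(method.getName() + Arrays.toString(args) + " threw " + e.getCause());
                throw e.getCause();
            }
        };
        return (T) Proxy.newProxyInstance(
            type.getClassLoader(), new Class<?>[] { type }, handler);
    }

    // One recorded scenario: the same calls are made against any implementation.
    static List<String> scenario(Calculator implementation) {
        List<String> log = new ArrayList<>();
        Calculator recorded = recording(Calculator.class, implementation, log);
        recorded.add(2, 3);
        recorded.add(-1, 1);
        return log;
    }

    public static void main(String[] args) {
        List<String> before = scenario((a, b) -> a + b); // "old" code
        List<String> after = scenario(Math::addExact);   // "refactored" code
        System.out.println(before);
        System.out.println(before.equals(after));
    }
}
```

Comparing the two logs is the replay step: any difference in a logged line points at the call where the refactored code diverged.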

Guillaume
+1  A: 

I'd go with Itay's answer, but here is another option.
If you're willing to pay for it, there's a product from Agitar that automatically generates JUnit tests for existing code. Those tests are "characterization" tests, intended to check that the code does what it currently does. Then, once you make changes, you can see that only what you wanted to change breaks.

More information is available here.

abyx
+1  A: 

In theory, you could define a set of safe transformations (refactorings), and make a program that checks that program B is the result of applying a finite sequence of those refactorings to program A. (You avoid the halting problem by setting an upper limit.) But I'm afraid that making such a program is much more difficult than writing the unit tests for program A, which is what you should really do.

ammoQ
+1  A: 

I am a bit surprised that so far no one has mentioned the book Working Effectively with Legacy Code, by Michael Feathers. It deals with this exact situation, with lots of practical advice, in various languages. I recommend it to anyone dealing with legacy projects.

Péter Török