tags:

views:

181

answers:

8

Let's say you got a project that is really poorly written, contains lots of code smells, wtfs, etc. Moreover, its code structure is so complicated that it is extremely hard to add any new functionality to it. On the other hand, the project works as it is supposed to.

You want to refactor the project, perhaps move it to a new framework, how would you approach this problem? Will you try to build a new one from scratch or use some techniques (specify) to convert the working project into a new one?


I would like to clarify the question a little bit as there is a confusion what do I mean when saying "refactoring".

I'll give an example about a car, think of it as a software project. Let's say you have built your own car. Its construction is pretty weird: engine is upside down, thus all the pipes are laid differently, electricity wires are tangled together and nobody has an idea where do they start or end, etc.

However, everything is working fine: you can easily ride it to shop, work, etc. However, its fuel consumption is a bit too high. Also, if you ever wanted to install new headlights to it, it would be a disaster with all the mess in the wires.

You cant afford buying a new one, so you have to refactor the car somehow: change engines position to normal, make the wires tidy, etc. You need to do this, because sooner or later you will need to change engine, headlights, install new stereo etc.. On the other hand, you still need something to drive you to work every morning, so you must make sure that you don't screw everything up.

Now let's go back to the project. How would you refactor the project as complicated as the car above, while not disturbing its primary function and purpose.


I would also like to make this a community wiki. Please edit.

So far the main tendencies are:

Links:

+4  A: 

First, create a suite of automated unit tests, and confirm you have high code coverage (70% or more). During refactoring, you will run those tests very frequently, in order to convince yourself (and Management) that you haven't broken anything.

No unit tests = no refactoring.


Let me change my mind a little. You don't need unit tests - you need high code coverage from tests that will be run frequently, as you change the code. These could be automated unit tests, or automated functional tests.

I do not believe that manual tests are an adequate substitute. They are much less likely to be run frequently during the refactoring process. Refactoring is much less likely to succeed without the constant reassurance that the code has not been broken in the process of fixing it.

John Saunders
@Downvoter: "-2" is meaningless unless you say why. If you want your vote to matter at all, then speak up.
John Saunders
-1 If you think that creating automated tests for a working project is the first step, you can't have ever refactored a project of any significant size. Unit tests are not the first thing you should do with a system of size. Read Michael Feathers' "Working Effectively with Legacy Code".
Mike Burton
Ummm...Wait for the comment before you raise the complaint?
Mike Burton
@John - I agree with the ideal. But you seem to be denying the possibility that the business logic and user interface are too highly coupled for unit tests in the code's current state.
Andrew Shepherd
@Mike: I comment first, to avoid the problem of being identified with the vast majority of downvoters, who do not comment.
John Saunders
@Andrew: if they're too tightly coupled, then you cannot refactor. You can maintain the code, but don't call it "refactoring" unless you have unit tests to prove that your changes have no effect. Otherwise, we'll teach Management to say "no" to "refactoring", as "refactoring" will include "breaking the code in a good cause".
John Saunders
@John - When you say code is in a state that "you cannot refactor", does this mean it can only be abandoned? That it can never be fixed up?
Andrew Shepherd
@Andrew: to me, the term "refactoring" means making a number of small changes which have no effect on the functionality of the code, but which may improve the code in terms of quality, etc. Any change which may break the code cannot, in my opinion, be called "refactoring". If you have inadequate unit tests, you cannot know whether you're breaking the code. Thus, you're not refactoring. Feel free to change the code, but don't use the term "refactoring" and associate that term with the number of bugs you're introducing.
John Saunders
@John: you've fixated on the word refactoring, but that's not really what the question was asking.
Mike Burton
The question uses the term "refactoring", which has a fairly specific meaning. If the OP really means "arbitrarily changing code to improve readability", then he/she should say so.
John Saunders
Or, as an answerer, you should pay attention and correct him rather than assuming the usage is correct when it's obviously not.
Mike Burton
I don't know what you mean. The OP may very well have been thinking of actual refactoring, but without knowing of the dependency on unit tests.
John Saunders
The definition of "refactoring" is to change the code for the purpose of improving its quality without changing it's outward behaviour. A change can be a "refactor" without unit tests. The absence of unit tests does not automatically mean the behaviour will erroneously change. The presence of unit tests does not guarantee the absence of introduced defects. The definition of refactoring exists independentally of unit tests.
Andrew Shepherd
We disagree, that's all. I don't believe refactoring is possible or practical without the ability to pretty near "prove" that the change does not break the code. If you can do that without unit tests, then go ahead. Otherwise, I hope you'll choose a term other than "refactoring" to mean "trying to improve the quality of the code without being able to determine whether you've broken it".
John Saunders
Agreed, we disagree. I'm good now :-)
Andrew Shepherd
+3  A: 

This will give one perspective on rewriting from scratch:

http://www.joelonsoftware.com/articles/fog0000000069.html

Andy West
Joel's point is a business one: you'll lose market share by rewriting. That argument doesn't apply to large swathes of software in the wild.
Mike Burton
Aren't most large swathes of software in the wild from businesses? What I got out of the article is that rewriting software from scratch is risky. Writing bad internal software may not cause you to lose market share, but you could lose your job.
Andy West
@Andy: You could; more typically the risk of losing one's job comes from implementing functionality too slowly so someone goes out and buys a new package. Rewriting an internal application can often be much faster than implementing a major new feature in a painfully bad design.
Mike Burton
@Mike: I actually agree. As a developer, the first thing I want to do when I see nasty code is scrap the whole thing and rewrite it. But I reread that article occasionally to help me restrain myself, especially when I know a rewrite isn't fully justified.
Andy West
+7  A: 

You should read Working Effectively with Legacy Code.

Luke Girvin
Wow, now I want that!
Andy West
This is an excellent book for reference purposes on reworking an existing system.
Mike Burton
+5  A: 

Working is the only kind of project you should be refactoring. If you're fixing bugs, you're changing the behavior, and refactoring is explicitly about not changing behavior. But working has different definitions. The good one - the useful one for refactoring - is well-unit-tested. If you've got good test coverage (automated tests!), you're ready to refactor. If you don't...

Read Michael Feathers' Working Effectively with Legacy Code. Nibble away at the code. Pick a WTF that particularly offends you, and test the hell out of it, with automated unit tests. Then replace the WTF with something reasonable, ensuring that the tests continue to pass. Lather, rinse, repeat.

Refactor the low-hanging fruit.

Carl Manaster
+1 for the distinction. While Michael's book is extremely useful, however, I would note that he never acknowledges that it is possible, useful,occasionally necessary, and even sometimes perfectly safe to break out the sledgehammer and radically restructure an application uncautiously.
Mike Burton
A: 

Unit tests are a good thing. However, the code has to be in a certain state before you can even unit test it. I bet you have data validation that talks to the persistence layer inside the message-handling functions. I'm sure you have thousands of lines of business logic that interrupts the user with confirmation message boxes.

You have to separate the business logic from the user interface layer, and that can be a lot of work.

Now you've got a classic Catch-22 situation - you're not supposed to refactor it without unit test coverage, and you can't write unit tests until it's refactored.

So the only solution is to be very patient. Understand exactly how the application is supposed to work. Try and understand what each piece of messy code is doing. Accept the fact that sometimes you will not be able to make sense of some code, and the reason for this is that the code actually is pointless. But the pointless code will look exactly the same as the fundamentally useful code, and only extreme hard work will give you the insight to tell the difference.

Work towards the goal of having unit test coverage - but the lack of unit tests should not stop you starting. Accept the fact that you may never get the unit test coverage you want, but keep the ideal in mind.

And every day you will say to yourself "This would be so much faster if I just rewrote the whole thing". DO NOT ACT ON THIS FEELING. (See Joel's article, already mentioned).

Andrew Shepherd
-1: I strongly disagree. The code should not be touched until unit tests exist to prove that touching it isn't breaking something. Otherwise, touching it, even with the best intentions, _will_ break something, and cost you the goodwill of Management.
John Saunders
+1 I've stated the case in more detail, but this is what I'm talking about. Unit tests are worthless if you don't actually have units with expected behaviour in any meaningful sense.
Mike Burton
+1  A: 

There's a lot of text in response to this question that follows the literature very closely, and (with no offense to the posters of those answers) it makes me wonder if people simply haven't reworked a system of any significant scale that is really and truly Fucked Up.

I've rewritten three significant applications in my career (Significant: 50kish LOC or larger, data tier, logic tier, integration requirements). Every system has required me to say To Hell With This at some point to good practice. You know what I learned from doing that? At a certain point, it can be really, really safe. Absolutely, you need to consider what it means to throw caution to the wind, but it is significantly more important that you ship than that you follow someone else's idea of good practice.

Let me illustrate with an example of what I'm talking about:

I'm working on a system today that was written over six years that was originally converted to a .NET language from an application that was at the time perhaps a decade old, and was written with a DOS client that held all of the logic. The original code and most of the subsequent updates and conversions were handled by inexperienced programmers. This is a document management engine for all intents and purposes, and there isn't a single abstraction for "document" anywhere in the code.

I was asked to implement a means to transfer files across a WAN such that it would function with the core routines of the system. I started out creating a nice little test client and server with decent practices wrapped around them, tests etc. I went through the core system architecture and looked for the right place to re-stub my code. All I found was big nasty chunk after big nasty chunk of copy and pasted code with tiny modifications that constituted units.

I started changing little bits of code, things started breaking, I reset my changes. I extracted methods and found that variables were scattered throughout the code and relied on changes made throughout the procedure. I tried extracting classes, only to find that I was extracting methods wholesale and that the state data from the old class was, once again, scattered throughout the code, and not amenable to transition.

So I said THWI. Two weeks later we had a nice little zipping client and server and our core code was much more nicely factored for our trouble.

If you know a system, if you pay attention, if you test your code, and if you PAY ATTENTION, there's often nothing wrong with making big changes in an unsafe way.

I'm gonna get downvotes for this, but Agile practices have clouded people's vision too much, and it's really time for quotes from the literature to stop dominating these discussions. If you work with a system every day, you know the system, and no amount of unit testing is going to equal what you can make happen if you take charge and fix the damn system.

Of course, this is why you need to work with a system and learn the system instead of just going from technology to technology. Rewriting to a new language is not nearly as good an answer as rebuilding with an eye to improving the design. If you then find there are limitations in the system that you could overcome with a new technology, it becomes substantially simpler to make that change at that point.

Mike Burton
@Mike: congratulations, but what you are doing is more "reworking" or "rewriting", and less "refactoring". Are you even able to measure the number of bugs you're introducing into this system? BTW, my responses have nothing to do with "agile". Unit tests are the only thing proving that your well-intentioned changed to bad code aren't actually making things worse.
John Saunders
+1 from the "pragmatism vs idealism" faction. This is turning into an interesting debate.
Andrew Shepherd
Note that I'm not saying the code cannot be changed. I'm saying that such changes are not refactoring. They're the same thing we used to do back in the day before Fowler's book popularized the term. In fact, it's possible to change code without unit tests, if your functional test suite is good enough. But how often are you going to run your functional tests? How many changes will have been made before you realize you broke something weeks ago?
John Saunders
@John: Unit testing + refactoring are very much strongholds of agile practitioners. If you're not aware of that, well, I'm not sure what to say about that. The OP, of course, had nothing to do with refactoring specifically, despite the use of the word in the text. "Moving to a new framework" is the essential clue to that fact - you don't refactor to a new framework, that's a fundamental system shift and has no business being considered at the outset. Fowler's usage of "refactoring" doesn't get preserved most of the time; it's something he's picky about precisely because of that effect.
Mike Burton
By the by, unit tests don't verify functionality. Unit tests can be just as much a blind alley as ill-informed restructuring. Unit tests are worse than useless when you're starting with garbage in my experience.
Mike Burton
@Mike: I know they're used by "agile" people, but "unit tests + refactoring" does not imply "agile". "agile" may imply "unit tests + refactoring", but not the other way around.
John Saunders
@John: What precisely do you think of as the difference? Are you doing some other process style and using unit tests and refactoring? I'm hard-pressed to identify a scenario where that makes any sense.
Mike Burton
A: 

I'd opt for refactoring rather than rewriting wherever possible, because it's less risky. If done right, you'll always have a working program, and it'll gradually get better as you make incremental changes to it. If you have to stop early (due to time or budget constraints), you still get the benefit of the work that's been done so far, because the program is still usable and it's better than it was before, even if it's not perfect yet.

On the other hand, if you opt for a rewrite and have to stop partway through, you're left with the original working-but-bad program and the new clean-but-incomplete program. You'll have to decide whether to scrap the new version and let the effort go to waste, or sacrifice functionality and gain bugs by switching to it before it's ready.

Wyzard
A: 

First, I agree with the other posters - do not give in to the temptation to rewrite from scratch. The code does work, after all. Second, and it may sound obvious, but concentrate your refactoring on those areas that need modification (for new features, etc.). There may be code that's crying out to be refactored, but if it's working and in a stable part of the app, leave it alone.

Working Effectively with Legacy Code does a great job of addressing the catch-22 that you don't want to refactor without unit tests, but you can't introduce unit tests without refactoring. Among other things, he suggests relying on automated refactoring tools to avoid introducing bugs when refactoring without tests.

Others may disagree with me, but IMO sometimes you you have no alternative but to do the initial refactoring without unit tests, and rely on automated or (careful) manual integration tests to make sure you got it right. The initial refactoring should allow the code to be run in a unit test harness, than you're on your way.

glaxaco