views:

596

answers:

14

I have a big mess of code. Admittedly, I wrote it myself - a year ago. It's not well commented but it's not very complicated either, so I can understand it -- just not well enough to know where to start as far as refactoring it.

I violated every rule that I have read about over the past year. There are classes with multiple responsibilities, there are indirect accesses (I forget the term - something like foo.bar.doSomething()), and like I said it is not well commented. On top of that, it's the beginnings of a game, so the graphics is coupled with the data, or the places where I tried to decouple graphics and data, I made the data public in order for the graphics to be able to access the data it needs...

It's a huge mess! Where do I start? How would you start on something like this?

My current approach is to take variables and switch them to private and then refactor the pieces that break, but that doesn't seem to be enough. Please suggest other strategies for wading through this mess and turning it into something clean so that I can continue where I left off!


Update two days later: I have been drawing out UML-like diagrams of my classes, and catching some of the "Low Hanging Fruit" along the way. I've even found some bits of code that were the beginnings of new features, but as I'm trying to slim everything down, I've been able to delete those bits and make the project feel cleaner. I'm probably going to refactor as much as possible before rigging my test cases (but only the things that are 100% certain not to impact the functionality, of course!), so that I won't have to refactor test cases as I change functionality. (do you think I'm doing it right or would it, in your opinion, be easier for me to suck it up and write the tests first?)

Please vote for the best answer so that I can mark it fairly! Feel free to add your own answer to the bunch as well, there's still room for you! I'll give it another day or so and then probably mark the highest-voted answer as accepted.

Thanks to everyone who has responded so far!


June 25, 2010: I discovered a blog post which directly answers this question from someone who seems to have a pretty good grasp of programming: (or maybe not, if you read his article :) )

To that end, I do four things when I need to refactor code:

  1. Determine what the purpose of the code was
  2. Draw UML and action diagrams of the classes involved
  3. Shop around for the right design patterns
  4. Determine clearer names for the current classes and methods
+6  A: 

You could always start from "scratch". That doesn't mean scrap it and start from nothing, but try to rethink high-level things from the beginning, since you seem to have learned a lot since the last time you worked on it.

Start from a higher level, and as you build the scaffolding of your new and improved class structure, take all the code you can reuse, which will probably be more than you think if you're willing to read through it and make some small changes.

When you're making the changes, be sure to be strict with yourself about following all the good practices you now know, because you will really thank yourself later.

It can be surprisingly refreshing to properly re-make program to do exactly what it did before, only more "cleanly". ;)

As others have mentioned as well, unit-tests are your best friend! They help you ensure that your refactoring works, and if you're starting from "scratch", it's the perfect time to write them.

Chris Cooper
+4  A: 

You're in a much better position than many people facing this problem in that you understand what the code is supposed to do.

Taking variables out of a shared scope, as you're doing, is a great start, in that you're partitioning responsibilities. Ultimately you want each class to express a single responsibility. A few other things you might look at:

  • Easy targets for refactoring are code that's duplicated in lots of places and long methods.
  • If you're managing application state through statically initialized singletons or worse, a global state that everything is talking to, consider moving it to a managed initialization system (i.e. a dependency injection framework like spring or guice) or at least make sure that the initialization isn't entangled with the rest of the code.
  • Centralize and standardize how you're accessing outside resources, especially if you've got things like file locations or urls hardcoded.
Steve B.
That third point about standardizing file locations and URLs is definitely something I need to do. Luckily the hardcoded locations are only in a few files, but they are there and I'm not yet sure how to tackle that. Perhaps I'll open up a new question...
Ricket
+14  A: 

What was most important for me on different occasions were unit tests: I took a few days to write tests for the old code and then I was free to refactor with confidence. How exactly is a different question, but having the tests made it possible for me to make real, substantial changes to the code.

Fabian Steeg
Unfortunately, my question is indeed /how/ to refactor, tests or not. But you do bring up a great point and I do have JUnit tests on my list of things to do. For now I'm trying to just shift bits of code around so that I am 100% confident at all times that I'm not impacting the stability or functionality of the code; but before I dive into deeper restructuring I will certainly set up some test suites and hook it into a Hudson server that I already have ready and waiting.
Ricket
I hav eto agree with Fabian. You don't make it clear hwether your code is working, but if it is then automated unit tests are vital to warn you if your refactoring breaks anything. Write the test before going further with refactoring (not much point in refactoring a broken program), then run them after every change.
Mawg
+2  A: 

You might want to look at Martin Fowler's book Refactoring. This is the book that popularized the term and technique (my thought when taking his course: "I've been doing a lot of this all along, I didn't know it had a name"). A quote from the link:

Refactoring is a controlled technique for improving the design of an existing code base. Its essence is applying a series of small behavior-preserving transformations, each of which "too small to be worth doing". However the cumulative effect of each of these transformations is quite significant. By doing them in small steps you reduce the risk of introducing errors. You also avoid having the system broken while you are carrying out the restructuring - which allows you to gradually refactor a system over an extended period of time.

As others have pointed out, unit tests will allow you to refactor with confidence. And start by reducing code duplication. The book will give you lots of other insights.

Here is a catalog of refactorings.

brainjam
You mentioned the book three minutes before the higher answer so I give you a +1 for that, but I especially like that little tiny link that you hid on the very last word of your answer. It's daunting, being only a list, but I might just try to tackle it one item at a time. And I just love lists. (is that odd? :-P)
Ricket
+12  A: 

Pick yourself up a copy of Martin Fowler's Refactoring. It has some good advice on ways to break down your refactoring problem. About 75% of the book is little cookbook-style refactoring steps you can do. It also advocates automated unit tests that you can run after each step to prove your code still works.

As for a place to start, I would sit down and draw out a high-level architecture of your program. You don't have to get fancy with detailed UML models, but some basic UML is not a bad idea. You need a big picture idea of how the major pieces fit together so you can visually see where your decoupling is going to happen. Just a page or two of some basic block diagrams will help with the overwhelming feeling you have right now.

Without some sort of high level spec or design, you just risk getting lost again and ending up with another unmaintainable mess.

If you need to start from scratch, remember that you never truly start from scratch. You have some code and the knowledge you gained from your first time. But sometimes it does help to start with a blank project and pull things in as you go, rather than put out fires in a messy code base. Just remember not to completely throw out the old, use it for its good parts and pull them in as you go.

John
+4  A: 

Buy an IDE that has good refactoring support. I think IntelliJ is the best, but Eclipse has it now, too.

The unit test idea is key as well. You will want to have a suite of large, overall transactions that will give you the overall behavior of the code.

Once you have those, start creating unit tests for classes and smaller packages. Write the tests to demonstrate proper behavior, make your changes, and re-run the tests to demonstrate that you haven't broken everything.

Track code coverage as you go. You'll want to work it up to 70% or better. For the classes you change, you'll want those to be 70% or better before you make your changes.

Build up that safety net over time and you'll be able to refactor with some confidence.

duffymo
The thing I'm most struggling with is exactly what you skirted around with your answer... The "changes" that you mention several times. I'm using Eclipse and aware of its refactoring support (but thanks for mentioning it all the same!) and I am familiar with JUnit but I was seeking advice on what changes to make, or how to know what changes to make...
Ricket
That's the part that makes refactoring difficult. The changes require knowledge about the code base, the problem you're trying to solve, design, patterns, and Java. No one will be able to advise you on those unless they're sitting beside you and thinking it through.
duffymo
If you don't have an IDE that does refactoring, you might find an add-in that does it. For Visual Studio, *CodeRush* adds tons of refactorings.
Kyralessa
+2  A: 

very slowly :D

No seriously... take it one step at a time. For instance, refactor something only if it affects or helps you write the current bug/feature that you are working on right now and do no more than that. And before you refactor make darn sure that you have some kind of automated test in place that gets run on each build that will actually test what you are writing/refactoring. Even if you don't have unit tests, it is never too late to start adding them for all new and modified code that is being written. Over time, your code base will get better in small increments daily or weekly instead of worse - all without you making monumental heaps of changes.

In my personal opinion and experience, it's not worth it to just refactor a (legacy) codebase en masse for the sake of refactoring. In those cases, it's best to just start from scratch and do it right all over again (and very rarely are you afforded the opportunity to do such a thing). Hence, just refactoring incremental is the way to go.

whaley
+3  A: 

Just an additional refactoring that is more important than you think: Name things correctly!

This goes for any variable name and method name. If the name does not accurately reflect what the thing is used for, then rename it to something more accurate. This might require several iterations. If you cannot find a short, and entirely accurate name, then that item does too much and you have an excellent candidate for a code snippet that needs to be split. The names also clearly indicate where the cuts are to be made.

Also, document your stuff. Whenever the answer to WHY? is not clearly conveyed by the answer to HOW? (being the code) you will need to add some documentation. Capturing design decisions is probably the most important task as it is very hard to do in code.

Thorbjørn Ravn Andersen
A: 

Throw it away, build it new.

Chris
Good strategy in some cases, especially with smaller codebases. Not for me though. It would be much more difficult to start over at this point.
Ricket
+9  A: 

I'll second everyone's recommendations for Fowler's Refactoring, but in your specific case you may want to look at Michael Feathers' Working Effectively with Legacy Code, which is really perfect for your situation.

Feathers talks about Characterization Tests, which are unit tests not to assert known behaviour of the system but to explore and define the existing (unclear) behaviour -- in the case where you've written your own legacy code, and fixing it yourself, this may not be so important, but if your design is sloppy then it's quite possible there are parts of the code that work by 'magic' and their behaviour isn't clear, even to you -- in that case, characterization tests will help.

One great part of the book is the discussion about finding (or creating) seams in your codebase -- seams are natural 'fault lines', if you like, where you can break into the existing system to start testing it, and pulling it towards a better design. Hard to explain but well worth a read.

There's a brief paper where Feathers fleshes out some of the concepts from the book, but it really is well worth hunting down the whole thing. It's one of my favourites.

Cowan
The Feathers book is even better than *Refactoring* for this kind of thing, because it's specifically about how to make the kind of keep-it-working tiny changes you need to make to get code under test, to figure out how the code works, without breaking it in the process. It's probably the best code technique book I've read (other than perhaps *Code Complete*) since I started programming.
Kyralessa
+2  A: 

I think you should use Eclipse as a IDE because it is having many plugins and free of cost.You should now follow the MVC pattern and yes must write test cases using JUnit.Eclipse also have plugin for JUnit and it is providing code refactoring facility too so that will reduce your some work.And always remember that writing a code is not important the main thing is to write clean code.So now give comments everywhere so that not only you but any other person read the code then while reading the code he must feel that he is reading an essay.

Rupeshit
I prefer not to read an essay alongside a file of code. Keep in mind the good advice from these two blog posts: http://www.codinghorror.com/blog/2006/12/code-tells-you-how-comments-tell-you-why.html and http://www.codinghorror.com/blog/2008/07/coding-without-comments.html
Ricket
+2  A: 

Refactor the low-hanging fruit. Nibble away at the easy bits, and as you do that, the harder bits will begin to be a little easier. When there aren't any bits left to refactor, you're done.

The refactorings you'll probably find most useful are Rename Method (and even more trivial Renamings like Field, Variable, and Parameter), Extract Method, and Extract Class. For each refactoring you perform, write the necessary unit tests to make the refactoring safe, and run the entire suite of unit tests after each refactoring. It's tempting - and, let's be honest, pretty safe - to rely on the automated refactorings of your IDE, without the tests - but it's good practice and will be good to have the tests into the future as you add functionality to your project.

Carl Manaster
+2  A: 

The correct definition of messy code, is code that hard to maintain and change.
To use more mathematical definition, you can check your code by code metrics tools.
This way, you will keep the code that already good enough, and find very fast, the wrong code.
My experience say, that is very powerful way to improve the quality of your code. (if your tool can show you the result on each build or on realtime)

Avram
+2  A: 

For Java code, my favorite first step is to run Findbugs and then remove all the dead stores, un-used fields, unreachable catch blocks, unused private methods and likely bugs.

Next I run CPD to look for evidence of cut-copy-paste code.

It isn't unusual to be able to reduce the code base by 5% by doing this. It also saves you from refactoring code that is never used.

sal