views:

648

answers:

9

As you work in a legacy codebase what will have the greatest impact over time that will improve the quality of the codebase?

  • Remove unused code
  • Remove duplicated code
  • Add unit tests to improve test coverage where coverage is low
  • Create consistent formatting across files
  • Update 3rd party software
  • Reduce warnings generated by static analysis tools (e.g. FindBugs)

The codebase has been written by many developers with varying levels of expertise over many years, with a lot of areas untested and some untestable without spending a significant time on writing tests.

+4  A: 

I'd say 'remove duplicated code' pretty much means you have to pull the code out and abstract it so it can be used in multiple places. In theory, this makes bugs easier to fix, because you only have to fix one piece of code instead of many.
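A minimal sketch of that idea (all names here are invented for illustration): the same calculation was copy-pasted at two call sites, so a fix had to land twice; pulling it into one function gives the bug a single home.

```python
# Before: this rounding formula was copy-pasted at two call sites,
# so a bug had to be fixed twice. After: one shared helper.

def net_price(gross, discount_rate):
    """Single home for the formerly duplicated calculation."""
    return round(gross * (1.0 - discount_rate), 2)

def invoice_total(items):
    # Call site 1: was an inline copy of the formula.
    return sum(net_price(g, d) for g, d in items)

def quote_line(gross, discount_rate):
    # Call site 2: was another inline copy of the same formula.
    return f"Net: {net_price(gross, discount_rate):.2f}"
```

Now a fix to the discount rounding happens in `net_price` once, and both call sites pick it up.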

James Inman
+22  A: 

Working Effectively with Legacy Code is a GREAT book.

If you don't like that answer, then the best advice I can give would be:

  • First, stop making new legacy code[1]

[1]: Legacy code = code without unit tests and therefore an unknown

Changing legacy code without an automated test suite in place is dangerous and irresponsible. Without good unit test coverage, you can't possibly know what effect those changes will have. Feathers recommends a "stranglehold" approach where you isolate areas of code you need to change, write some basic tests to verify basic assumptions, make small changes backed by unit tests, and work out from there.

NOTE: I'm not saying you need to stop everything and spend weeks writing tests for everything. Quite the contrary: just test around the areas you need to change and work out from there.
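One common way to "test around the areas you need to change" is a characterization test: pin down what the code currently does, quirks and all, before touching it. A minimal Python sketch, with the legacy function and its odd behavior invented for illustration:

```python
# Hypothetical legacy function whose exact behavior nobody remembers.
def legacy_format_name(first, last):
    # Quirk: uppercases everything and silently truncates to 20 chars.
    return (last.upper() + ", " + first.upper())[:20]

# Characterization tests record what the code DOES today,
# not what we wish it did. They guard the refactoring that follows.
def test_characterize():
    assert legacy_format_name("Ada", "Lovelace") == "LOVELACE, ADA"
    # Pin down the truncation quirk too, so a refactor
    # can't change it unnoticed.
    assert legacy_format_name("Christopher", "Featherstonehaugh") == "FEATHERSTONEHAUGH, C"

test_characterize()
```

Once these pass, you can make small changes and rerun them to confirm the observable behavior hasn't shifted.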

Jimmy Bogard and Ray Houston did an interesting screen cast on a subject very similar to this: http://www.lostechies.com/blogs/jimmy_bogard/archive/2008/05/06/pablotv-eliminating-static-dependencies-screencast.aspx

chadmyers
+3  A: 

I can relate to this question, as I currently have in my lap one of 'those' old-school codebases. It's not really legacy, but it certainly hasn't followed the trends of the years.

I'll tell you the things I would love to fix in it, as they bug me every day:

  • Document the input and output variables
  • Refactor the variable names so they actually mean something, rather than a Hungarian notation prefix followed by a three-letter acronym with some obscure meaning. CamelCase is the way to go.
  • I'm scared to death of changing any code, as it will affect the hundreds of clients that use the software, and someone WILL notice even the most obscure side effect. Any repeatable regression tests would be a blessing, since there are zero now.

The rest is really peanuts. These are the main problems with a legacy codebase; they really eat up tons of time.

Caerbanog
+6  A: 

Add unit tests to improve test coverage. Having good test coverage will allow you to refactor and improve functionality without fear.

There is a good book on this, written by the author of CppUnit: Working Effectively with Legacy Code.

Adding tests to legacy code is certainly more challenging than creating them from scratch. The most useful concept I've taken away from the book is the notion of "seams", which Feathers defines as

"a place where you can alter behavior in your program without editing in that place."

Sometimes it's worth refactoring just to create seams that will make future testing easier (or possible in the first place). The Google Testing Blog has several interesting posts on the subject, mostly revolving around the process of Dependency Injection.
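A small Python sketch of one kind of seam, created through constructor injection (the classes and method names here are invented): the hard-wired dependency becomes a parameter you can replace in a test without editing the business logic itself.

```python
class SmtpMailer:
    """Real dependency: would talk to the network (stubbed out here)."""
    def send(self, to, body):
        raise RuntimeError("no network in this sketch")

class InvoiceSender:
    def __init__(self, mailer=None):
        # The seam: behavior can be altered here, at the enabling
        # point, without editing send_invoice below.
        self.mailer = mailer or SmtpMailer()

    def send_invoice(self, customer, amount):
        body = f"You owe {amount:.2f}"
        self.mailer.send(customer, body)
        return body

# In a test, slide a fake through the seam:
class FakeMailer:
    def __init__(self):
        self.sent = []
    def send(self, to, body):
        self.sent.append((to, body))

fake = FakeMailer()
sender = InvoiceSender(mailer=fake)
sender.send_invoice("bob@example.com", 42.0)
```

The point is that `send_invoice` is now testable without a mail server, and the production code path (`InvoiceSender()` with no argument) is unchanged.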

Gordon Wilson
It's really difficult to add tests to the code, as you end up mocking a lot out before you can even get near the code you want to test! It's a bit chicken-and-egg: you can't refactor without the tests, and you can't write good tests without refactoring.
Craig Angus
A: 

I'd say it largely depends on what you want to do with the legacy code...

If it will indefinitely remain in maintenance mode and it's working fine, doing nothing at all is your best bet. "If it ain't broke, don't fix it."

If it's not working fine, removing the unused code and refactoring the duplicate code will make debugging a lot easier. However, I would only make these changes on the erring code.

If you plan on a version 2.0, add unit tests and clean up the code you will bring forward.

Austin Salonen
+2  A: 

Good documentation. As someone who has to maintain and extend legacy code, that is the number one problem. It's difficult, if not downright dangerous, to change code you don't understand. Even if you're lucky enough to be handed documented code, how sure are you that the documentation is right? That it covers all of the implicit knowledge of the original author? That it speaks to all of the "tricks" and edge cases?

Good documentation is what allows those other than the original author to understand, fix, and extend even bad code. I'll take hacked yet well-documented code that I can understand over perfect yet inscrutable code any day of the week.

Josh Segall
Unit tests ARE the documentation; the problem is that legacy code (as M. Feathers defines it in "Working Effectively with Legacy Code") doesn't have them.
Igor Popov
+6  A: 

I work with a legacy 1M LOC application written and modified by about 50 programmers.

* Remove unused code

Almost useless... just ignore it. You won't get a big return on investment (ROI) from that one.

* Remove duplicated code

Actually, when I fix something I always search for duplicates. If I find any, I extract a generic function, or at least comment every occurrence of the duplication (sometimes the effort of extracting a generic function isn't worth it). The main idea is that I hate making the same change more than once. Another reason is that there's always someone (could be me) who forgets to check the other occurrences...

* Add unit tests to improve test coverage where coverage is low

Automated unit tests are wonderful... but if you have a big backlog, the task itself is hard to promote unless you have stability issues. Go with the part you are working on and hope that in a few years you have decent coverage.

* Create consistent formatting across files

IMO the differences in formatting are part of the legacy. They give you a hint about who wrote the code, or when. That can give you some clue about how to behave in that part of the code. Reformatting isn't fun, and it doesn't deliver any value to your customer.

* Update 3rd party software

Do it only if there are really nice new features, or the version you have is not supported by the new operating system.

* Reduce warnings generated by static analysis tools

It can be worth it. Sometimes a warning can hide a potential bug.
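A classic example of a warning pointing at a real defect, sketched in Python (linters such as pylint flag mutable default arguments):

```python
# Buggy version: a linter warning about the mutable default
# argument is pointing at a genuine bug.
def append_buggy(item, bucket=[]):
    bucket.append(item)
    return bucket

# The default list is created once and shared across calls:
a = append_buggy(1)
b = append_buggy(2)  # b is [1, 2] -- surprise! (and a is the same list)

# Fixed version, which also silences the warning:
def append_fixed(item, bucket=None):
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

c = append_fixed(1)
d = append_fixed(2)  # independent lists now
```

Triaging a wall of warnings takes judgment, but cases like this are why the effort can pay off.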

Hapkido
+1  A: 

The single biggest thing that I've done to the legacy code that I have to work with is to build a real API around it. It's a 1970's style COBOL API that I've built a .NET object model around, so that all the unsafe code is in one place, all of the translation between the API's native data types and .NET data types is in one place, the primary methods return and accept DataSets, and so on.

This was immensely difficult to do right, and there are still some defects in it that I know about. It's not terrifically efficient either, with all the marshalling that goes on. But on the other hand, I can build a DataGridView that round-trips data to a 15-year-old application which persists its data in Btrieve (!) in about half an hour, and it works. When customers come to me with projects, my estimates are in days and weeks rather than months and years.
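That approach is essentially a facade: one class that owns all the unsafe calls and all the type translation. A toy Python sketch of the shape of it (the legacy function and its fixed-width string records are invented, standing in for the COBOL API):

```python
# Pretend legacy API: returns a fixed-width text record, 1970s style.
def legacy_fetch_customer(raw_id: str) -> str:
    # 6-char id, 10-char name, 9-digit balance with implied decimals.
    return f"{raw_id:>6}JONES     000123450"

class CustomerGateway:
    """Facade: the ONLY place that calls the legacy API and parses
    its records, so the rest of the app sees plain Python types."""
    def get_customer(self, customer_id: int) -> dict:
        rec = legacy_fetch_customer(str(customer_id))
        return {
            "id": int(rec[0:6]),
            "name": rec[6:16].strip(),
            # Balance is stored as implied-decimal digits.
            "balance": int(rec[16:25]) / 100.0,
        }

gw = CustomerGateway()
customer = gw.get_customer(42)
```

All the fragile offset arithmetic lives in one class; callers never see a raw record, which is what makes estimates in days rather than months plausible.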

Robert Rossney
+1  A: 

As a parallel to what Josh Segall said, I would say comment the hell out of it. I've worked on several very large legacy systems that got dumped in my lap, and I found the biggest problem was keeping track of what I had already learned about a particular section of code. Once I started leaving notes as I went, including "To Do" notes, I stopped re-figuring out what I had already figured out. Then I could focus on how those code segments flow and interact.

dj_segfault