The IT department I work in as a programmer revolves around a 30+ year old code base (Fortran and C). The code is in poor condition, partly as a result of 30+ years of ad-hoc, poorly thought-out changes, but I also suspect a lot of it has to do with the capabilities of the programmers who made the changes (and who incidentally are still around).

The business that depends on the software operates 363 days a year and 20 hours a day. Unfortunately there are numerous outages. This is the first place I have worked where there are developers on call to apply operational code fixes to production systems. When I first started, there was actually a copy of the source code and development tools on the production servers so that on-the-fly changes could be applied; thankfully that practice has now been stopped.

I have hinted a couple of times to management that the downtime, having developers on call, extra operational staff, unsatisfied customers etc. are costing the business a lot more in the medium term, and possibly even the short term, than it would cost to launch a wholehearted effort to re-write/refactor/replace the whole thing (the code base is about 300k lines).

Ideally there'd be some external consultancy that could come in and run the rule over the quality of the code and the costs involved to keep it running vs rewrite/refactor/replace it. The question I have is how should a business go about doing that kind of cost analysis on software AND be able to have confidence in that analysis? The first IT consultants down the street may claim to be able to do the analysis, but how could management be made to feel comfortable with it over what they are being told by internal staff?

+6  A: 

We recently decided to completely rewrite large portions of our business code from scratch, and it has not gone as well as we had hoped. I've seen a lot of quotes saying you should never try to rewrite anything from scratch, and now I see why. I would recommend starting small - don't try to rewrite the whole thing at once. Identify the large problem areas and focus on refactoring small portions of the system at a time. Since there is 30+ years worth of work in the system, it will take a long time to get it back to a reasonable state. We had about 5-8 years worth of work to rewrite, and it has been difficult. I can't imagine 30+ years of work!

Andy White
+1  A: 

I think that your description provides all of the necessary information on code quality (or lack thereof). The fact that so many support resources are required also indicates the high costs involved with maintaining the existing system.

As I answered here, a good approach to consider is refactoring one piece of the system at a time until everything works at an acceptable level. I agree with Joel about not throwing away existing code (see Things You Should Never Do). Parts of your code work, so you should leave those in place whenever possible, and focus on the sections that lead to downtime.

Andy also makes a great point about starting small as well.

Another thing to try is reviewing the processes around the system. When you do this, try to determine which failure situations are caused directly or indirectly by user action, and whether there are configuration or environment problems. If you are having trouble fixing the code directly, then you can still prop it up by dealing with external issues more effectively.

Dana the Sane
+4  A: 

The first thing that comes to mind is that you are prematurely addressing the rewrite/refactor/replace argument. The first two steps I would recommend would be:

  1. Unit tests
  2. QA

It's well within engineering scope to implement these. Unit tests are an essential preliminary step before any reasonable refactor or rewrite could possibly take place. By 'unit test' I mean wrap each function call with corresponding code that proves the code works for all known conditions. In complex retrofits this may not actually happen at the most granular level but any automated tests will help immensely.
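
To make the "wrap each function call" idea concrete, here is a minimal sketch in C of a table-driven test that pins down what a routine does today. Everything in it is an assumption made up for illustration: `legacy_shipping_cost` is a hypothetical stand-in for one of the real routines, and the expected values are simply what this stand-in returns, not data from the actual system.

```c
#include <assert.h>
#include <stdio.h>

/* HYPOTHETICAL stand-in for an existing legacy routine. In a real retrofit
   this would be the untouched production function, linked in as-is. */
static int legacy_shipping_cost(int weight_kg, int zone)
{
    if (weight_kg <= 0)
        return -1;                  /* legacy error code */
    int cost = 500 + 120 * weight_kg;
    if (zone == 3)
        cost += 250;                /* one of those special cases */
    return cost;
}

/* One row per known condition: inputs plus the output observed today. */
struct golden_case {
    int weight_kg;
    int zone;
    int expected;
};

static const struct golden_case cases[] = {
    {  0, 1,   -1 },   /* invalid weight returns the error code */
    {  1, 1,  620 },   /* 500 + 120 * 1 */
    { 10, 3, 1950 },   /* 500 + 120 * 10 + 250 zone surcharge */
};

/* Runs every row and returns the number of rows whose output changed. */
int run_characterization_tests(void)
{
    size_t n = sizeof cases / sizeof cases[0];
    int failures = 0;

    assert(n > 0);                  /* an empty table would prove nothing */
    for (size_t i = 0; i < n; i++) {
        int actual = legacy_shipping_cost(cases[i].weight_kg, cases[i].zone);
        if (actual != cases[i].expected) {
            printf("case %d: expected %d, got %d\n",
                   (int)i, cases[i].expected, actual);
            failures++;
        }
    }
    return failures;
}
```

Calling `run_characterization_tests()` from a small test `main` and failing the build on a non-zero result is enough to start with. The table records current behaviour, bugs included, so any later change that alters an output shows up immediately.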

And QA - have an independent (and aggressive) quality assurance team that rigorously tests beta releases before production. Their test plans and test procedures become essential for any kind of replacement effort.

Once you've got the code under control, then you are in a position where the business can reasonably consider massive changes.

Just a note about your comment about external consultants - no consultancy will ever care enough about the code to provide realistic quality assurance. QA ends up being joined at the hip with the business, defending the company's bottom line. It's ultimately an internal function, and an external consultant can't provide much more than getting you started.

John Fricker
The biz has recognised the QA requirement. The QA team is now bigger than the dev team! I know what the response would be if I suggested unit tests to the 73 yr old head programmer but I agree you are right, I swear by unit tests.
sipwiz
Even an old guy can see the value. I worked on a similar project, 80k lines of 8 year old code running a very large ecommerce site. That project sold me on unit tests pretty easily.
John Fricker
+6  A: 

First, the profile of the consultant you need is very specific. Unless you can find someone who worked in a similar domain with the same languages, don't hire him.

Second, there's a 99% probability (I like dramatic numbers) the analysis will go as follows:

  • Consultant explores the application
  • Consultant understands 10% of the application
  • Time's up, time for the report
  • Consultant advises a complete rewrite (no refactoring, plain rewrite)

So you may as well save yourself what the consultant would cost.

You have only two solutions here:

  • Keep the current source code but determine proper methods to fix problems, so that you have a very long-running refactoring that is carried out progressively by those who know the application
  • Get a secondary team to make a new application to replace the old one

If I talk about a secondary team, it's because you cannot bring just one architect to make the new application and have the old team working with him:

  • They're too busy on the old application
  • There will be friction because the newcomer will undoubtedly underestimate the task at hand

I speak from experience, believe me.

If you go the "new application" way, don't set your hopes too high. You'll end up with an application that has less than half the functionality of the current one, simply because you cannot cram 30+ years of special-case and exceptional-situation fixes into freshly designed software.

Oh, also, if your developers happen to tell you they have a plan, by all means, hear them out. They most probably know what they are talking about.

Julian Aubourg
+1  A: 

The code has been around for 30 years?

Development paradigms have shifted substantially in the last three decades in many ways, and the shift most relevant to your predicament, I feel, is in the amount of time (in man-days) required to create something that takes input, processes it and produces output.

300,000 lines of code written 30 years ago could probably fit into 100,000 lines or fewer today, while expending fewer man-hours(?). This could seem optimistic/ridiculous to some, but on the other hand it is achievable, depending on the type of application in question. You have given no indication as to the classification of system - is it a real-time manufacturing process control system of sorts with sensors and actuators tied to it? An airline booking system? Does it post-process some backlog of data? In other words, could it be rebuilt quickly in something like Java by an aggressive, smallish team? Have the requirements been documented, and if so do they need updating or redeveloping from scratch? Is human safety a factor?

Just a quick sanity check: I think whether or not you should rebuild depends on (in no particular order):

  1. Number of code dudes required.
  2. Level of expertise of said dudes.
  3. Which languages do not fit.
  4. Which languages do fit.
  5. How much it costs to use the chosen language(s) in terms of hardware and software.
  6. How much does the business depend on this to stay alive.
  7. Is it really too much downtime, or are you just nitpicking? (maybe they really don't care, but pretend to).

Good luck with that!

karim79
+1  A: 
  1. Read the book Working Effectively with Legacy Code (also see the short PDF version) and surround the code with automated tests, as instructed in that book.

  2. Refactor the system little by little. If you rewrite some parts of the code, do it a small subsystem at a time. Don't try to make a Grand Redesign.
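
As a rough sketch of what rewriting one small piece at a time can look like in C: keep the old routine, write the replacement next to it, and run both over the same inputs before switching callers over. The two rounding functions below are hypothetical examples invented for illustration, not code from the system in question or from the book.

```c
#include <assert.h>
#include <stdio.h>

/* HYPOTHETICAL legacy routine: rounds a non-negative amount in cents
   to the nearest multiple of 5. */
static int legacy_round_to_nickel(int cents)
{
    int r = cents % 5;
    if (r >= 3)
        return cents + (5 - r);
    return cents - r;
}

/* HYPOTHETICAL rewritten version of the same routine. */
static int new_round_to_nickel(int cents)
{
    return ((cents + 2) / 5) * 5;
}

/* Feeds every input in [lo, hi] to both versions and returns how many
   inputs produce different results. */
int count_mismatches(int (*old_fn)(int), int (*new_fn)(int), int lo, int hi)
{
    int mismatches = 0;

    assert(lo <= hi);
    for (int x = lo; x <= hi; x++) {
        int old_result = old_fn(x);
        int new_result = new_fn(x);
        if (old_result != new_result) {
            printf("input %d: old=%d new=%d\n", x, old_result, new_result);
            mismatches++;
        }
    }
    return mismatches;
}
```

A call such as `count_mismatches(legacy_round_to_nickel, new_round_to_nickel, 0, 10000)` should return 0 before the old version is retired. Because both versions are passed as function pointers, the same harness can be reused for the next small subsystem.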

Esko Luontola
