Recently I had to fix a bug that was reported from the field. While the test team was trying to reproduce the issue, the customer was breathing down our necks and we had to have production-ready code in just a week's time. When at last we were able to reproduce the issue, there were just 3 days left. My colleague and I had to put in almost 30 hours of non-stop effort to find the cause and get the fix in place, in code that wasn't written by us. Luckily we did it. However, my concern is that the test team didn't have enough time to run through their usual test cases, and we had to overlook other trivial mistakes in the code to limit the code changes.

I would like to know from the community the best practices to follow in such time-critical conditions. Is it okay to neglect other issues (which aren't causes of the bug you are working on)? How can I limit the code changes as much as possible, especially in legacy code, so that I need not worry that only minimal testing is possible? Continuous work without sufficient breaks can also cause problems of its own. Please share your thoughts and experience.

+10  A: 

No matter what you do, your software will contain bugs.

All you can do is your best within the time constraints your boss/company specifies.

Galwegian
Only part of the story. The rest is going back and doing the job properly, including the other problem. Fix to customer is crucial. Doing the real fix properly - without time pressure - is also crucial.
Jonathan Leffler
Link is pretty irrelevant. Bunch of people talking about their software containing bugs. If it absolutely, positively has to be bug-free - costs be damned - then yes, software can be bug-free. In those cases, your industry is probably already legislated so this won't be new.
MSalters
+11  A: 

One best practice is obvious: don't "continuously work without sufficient breaks".

Another is to put your commercial head on and use some common sense: what is the risk that you have introduced another bug that is as serious or more serious? How will the customer react to that? How will the customer react if you explain your need for more time? Weigh the answers and make a commercial/executive decision.

AnthonyWJones
+2  A: 

When you are under extreme time pressure then you have to make it work. Even so, it's crucial to go over your solution to make sure that it really fixes the problem. You must understand the code involved, know how the problem happened, and make sure your fix is the correct thing. Too often patches are rushed out only to be incorrect and cause another quick patch.

As for problems encountered along the way... take note of them and move on. Be sure to come back to them, but leave them for now unless they have a bearing on the current issue.

All in all, it's an ugly situation to be in and there's no elegant solution. Just make sure that you are moving in a direction where you aren't faced with such issues.

dwc
+2  A: 

The application has been tested with your usual test cases before you started working on it. So if you only have a small time frame in which to make a specific change, that is the only change you should make. While you should test that case thoroughly, and do as many regression tests as possible, you'll probably be ok.

One thing you may want to do, having seen the legacy code, is mention to your boss that you discovered other minor defects and recommend a maintenance release of the application. This way, you can go back and, with more care, clean up the other issues you discovered, and have time for a full round of testing.

Elie
+2  A: 

If you notice something wrong in the source, which has never produced a problem, don't fix it without extensive testing!!

You might find that the wrong code was never called, but equally there might be something wrong somewhere else which is 'fixing' this bug, and changing the source to do the right thing might break the application!

So if you don't have enough time for testing, don't fix stuff not related to your current problem! Note this stuff, fix it later with extensive testing.

Sam
A: 

I think that of course it is ok to ignore other (less problematic) bugs you might find while trying to fix the one critical one. But of course they shouldn't be forgotten; they should be reported in some ticket system.

I think most of the work to get a smooth result in such a case (which of course does happen) needs to be put in beforehand, by having a good automated test suite. That way you can at least be a little more sure that you don't introduce new bugs while fixing that one. Code reviews and the like add to that.

So when writing software maybe always think about that case when you need to react quickly and prepare for that.

MrTopf
+12  A: 

There's already some great advice here, but I'd like to add something else:

If you just crank out a bug fix under extreme time pressure, remember to come back and look at that fix when the pressure is off to make sure it isn't just a horrible hack that's a bandaid over a real problem.

Back in the late 1980s, I fixed a bug that was way down deep in a very old program. But it wasn't working right under one case that used to work. When I investigated further, I found that a "temporary" work around had been put in place. The comment said:

C TEMPORARY WORK-AROUND UNTIL I FIND THE REAL CAUSE.  I CHARNY, SUMMER STUDENT, AUG 1971

Irv Charny was my boss when I found this 15+ year old "temporary work-around".

Paul Tomblin
And make sure that the 'later' is within 48 hours--time enough to be rested after the marathon, but not long enough to forget anything. Also take notes of the other problems so you can fix them, too. Getting a fix to a customer on time is one thing. Getting a real fix into the main code is another!
Jonathan Leffler
On temporary code that lasts - I've just been removing code marked #ifdef POST_JUNE_DEVELOPMENT that has never been completed. I believe that was June 1994, maybe June 1995; the CM records aren't completely clear.
Jonathan Leffler
I think I still have a C "expiry" macro that refuses to compile past a given date (Basically convert __DATE__ into integer, subtract this from the expiry date, and use as array index). Sure, it only stops button monkeys, not developers, but developers can read your comment on what needs to be fixed.
MSalters
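The "convert `__DATE__` into an integer and compare it with an expiry date" idea from the comment above can be sketched in C roughly as follows. This is a run-time variation, not a reconstruction of the macro described: it reports an expired workaround instead of refusing to compile. The function names `date_to_int` and `workaround_expired` and the YYYYMMDD encoding are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Convert a string in the layout of the __DATE__ macro ("Mmm dd yyyy",
   with the day space-padded when below 10) into the integer YYYYMMDD.
   Returns 0 if the month name is not recognised. Hypothetical helper. */
int date_to_int(const char *d)
{
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    int month = 0;

    assert(d != NULL && strlen(d) == 11);

    for (int i = 0; i < 12; i++) {
        if (strncmp(d, months + 3 * i, 3) == 0) {
            month = i + 1;
            break;
        }
    }
    if (month == 0)
        return 0;

    int day  = (d[4] == ' ' ? 0 : d[4] - '0') * 10 + (d[5] - '0');
    int year = (d[7] - '0') * 1000 + (d[8] - '0') * 100
             + (d[9] - '0') * 10 + (d[10] - '0');

    return year * 10000 + month * 100 + day;
}

/* Nonzero when the build date is later than the expiry date (YYYYMMDD). */
int workaround_expired(const char *build_date, int expiry_yyyymmdd)
{
    return date_to_int(build_date) > expiry_yyyymmdd;
}
```

A temporary hack could then be guarded with something like `if (workaround_expired(__DATE__, 19940630))` and log loudly or abort in a debug build, so the "temporary" code announces itself once the agreed date has passed.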
+3  A: 

If you feel that customer pressure is forcing you to fix a bug and deploy without sufficient testing/scrutiny, I would suggest telling the customer that the bug is fixed but not fully tested. Tell them how long it will take to fully test and give them the choice. If the bug is really as important as they've made out, they'll almost certainly go for immediate deployment - but it will be their choice and there's a better chance they'll understand what's happened if things go wrong later. If they were putting the pressure on over something that wasn't really that important, hopefully they'll allow you to test it first.

Draemon
+6  A: 

There are a number of concerns this question raises for me.

I have "been there, done that" as far as working all-nighters to try and fix problems. I can tell you for free what you probably already know - at 3am you are not thinking too clearly, and your fixes may well be causing more problems than they solve.

Not only this, but in a work culture that promotes this craziness, you are usually expected to show up the next day at 8am ready to continue giving 100%. When you are young, your body will cope with a certain amount of this, but in your mid 20's you are going to have serious side effects. Heck, even while you're young you can only go without sleep for so long. If you are driving in a sleep deprived state, it can end up costing you your life.

I hope you can make a good business case to your company's management for more sensible practices. Almost any client (no matter how pushy) can be convinced that it's better to wait a week than risk his business on software with a showstopper bug in it. The all-night coding marathon might be OK for a rare situation, but when it becomes commonplace, everyone suffers.

Bork Blatt
+1  A: 

There is nothing preventing you from continuing to make the fix as stable as possible after the initial bug-fix release has been done.

The most important thing is to put out the fire and make the customer happy.

Once that is done, you will need to schedule additional work to make it all work: fix the "surrounding" bugs, have QA go through the test plans, and after that you can create another "official" release that officially fixes the initial issue with a higher level of security.

Max
+1  A: 

It really depends on the problem at hand.

I recently talked to a developer working in a pacemaker company. However critical it was, they just couldn't rush things. But in case the need arose, they had some hardware checking the software behavior and resetting the pacemaker to a 'safe' state.

Then again, if real money is being lost, the need to fix it fast might be bigger than the need to fix it safely and completely.

Whatever you do or fix, make sure you log all your changes and go over them in slow motion to check them for potential mistakes.

Barfieldmv
+1  A: 

If you are working on a tight deadline, you need to focus. So if you see some code that shouts "clean me up" at you, but has nothing to do with the problem at hand, make a short note to revisit this place later but don't refactor it now. It's not only ok to do so, it's mandatory.

Treb
+3  A: 

AnthonyWJones's highly-rated answer above is right on, basically:

Another is to put your commercial head on and use some common sense: what is the risk that you have introduced another bug that is as serious or more serious? How will the customer react to that? How will the customer react if you explain your need for more time? Weigh the answers and make a commercial/executive decision.

But what does it mean to "weigh" the answers? It means that you start assigning weights to things literally: you stop, take a break, make a list and think it out. Should you tell the customer that it's impossible? What is the risk that a small bug will be a show-stopper that you've introduced in your week-long frantic rush?

Obviously, there are no set answers, but in general, I work as fast as possible but not faster. Some customers just breathe down your neck for fun, but other bugs are so important that it doesn't matter what else gets broken fixing them. You cannot determine that without your customer's help. Remember, you are all working towards the same goal.

If the customer is too busy to talk to you, you should explain (in email, or in blood stains, whatever) that you will short circuit QA and perhaps introduce other bugs in the process. You will need to talk, briefly, about the likelihood of those being more important than the bug in question. You have experience and know what you're doing (to some extent), so you have to help the customer to understand just how mad (or not) what they're asking for is.

Anyway, after rambling a bit, here's my point: your job is to keep calm and do your job. I doubt that by working many days without breaks you actually found the bug faster: you were probably trying to go too fast. Your job is also to inform your customer as to what's possible, what is not, and what risks are incurred with each decision. But going faster than your fastest -- as in not taking breaks -- makes no sense and helps no one.

But in all cases, there is nothing to lose your Zen about. Ever.

Yar
A: 

First, I think you must separate out the "emotional drama" and make a dispassionate decision about whether fixing the bug is in fact higher priority than getting the release done. Hopefully that's someone else's job. They should spare the developers the pressure of all the "client is breathing down our necks" crap. If the client is also waiting on the release, maybe it can be put back to them that fixing this bug may delay the release.

Then as Daniel said 'work as fast as possible, but not faster'. If the client is complaining, or even losing revenue, that really doesn't affect your ability to fix the bug, or fix it quickly.

As for the fix, I would do the absolute bare minimum to fix that specific bug. If possible I would write a separate block of code to handle the (hopefully) one condition that causes the fault, and leave everything else alone. The idea being to isolate that one problem and know (sort of) that nothing else is going to break due to the changes. And to easily be able to test that one condition.

MikeW
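As a rough illustration of the "separate block of code for the one faulty condition" approach in the answer above, here is a minimal C sketch. Everything in it is hypothetical: `legacy_average` stands in for untouched legacy code with a single known fault (division by zero when `count` is 0), and `average_fixed` is the isolated fix that handles only that case and delegates everything else unchanged. The return value of 0 for the faulty case is an assumed safe default.

```c
/* Hypothetical legacy routine, left exactly as it was.
   Known fault: divides by zero when count == 0. */
int legacy_average(const int *values, int count)
{
    long sum = 0;

    for (int i = 0; i < count; i++)
        sum += values[i];

    return (int)(sum / count);
}

/* The isolated fix: intercept the one condition that triggers the
   fault and pass every other call straight through to the old code. */
int average_fixed(const int *values, int count)
{
    if (count == 0)
        return 0;   /* assumed safe default for the faulty case */

    return legacy_average(values, count);
}
```

Because the legacy function is not modified, the only new behaviour to test is the `count == 0` path; every other input still produces exactly what it did before the fix.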