From time to time we get bugs in production that can be fixed by, for example, changing a configuration, disabling some part of the logic, and so on.

I've argued with my manager that we should reproduce the bugs locally to ensure the fix works and, more importantly, so developers and QA can include checks for these cases as part of the regular release.

My manager thinks it's a waste of time: the solution works, so there is no need to reproduce the bug locally.

So: should we try to reproduce locally to verify the fixes? Any pointers on how to sell this point to my manager if you agree with me?

+10  A: 

I'm a big believer in reproducing bugs if it is economically feasible (e.g., not copying some user's entire hard drive to your local machine just to reproduce the environment).

The unfortunate nature of bugs is that there is often only one way to fix a bug but many ways to mask it, and many fixes actually mask rather than fix. You may never find the root cause, so the bug will appear again, or a seemingly different bug will show up later.

If you can find an example where that has happened (e.g., two unrelated bugs from same root cause), you may be able to win over your boss.

Uri
+1  A: 

The most important reason for reproducing a bug locally is to make sure that you didn't break anything else while fixing it. If the fix is just a configuration change or anything else that doesn't require recompilation, this argument is less powerful, since anything that was compiled should already have been tested.

If your manager doesn't get this point, then there isn't much you can do to convince him other than show him real cases where fixing a bug actually broke something else and caused even longer delays than reproducing the bug locally would have taken.

shoosh
+16  A: 

Off the top of my head:

  • You could inadvertently introduce new bugs with the fixes
  • You can't be 100% sure that the fix actually fixes the bug without testing (except maybe in the very simplest cases)
David Zaslavsky
+3  A: 

Test automation is what makes all your tests replayable at any time with little to no extra effort. Unit testing is a very common kind of test automation that involves testing the smallest individual parts of the code in isolation.
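For instance, a regression test that pins down a fixed production bug might look roughly like the sketch below (Python's unittest; the parse_price function and the bug it guards against are invented purely for illustration):

    import unittest

    # Hypothetical function under test. Suppose the production bug was that
    # prices containing a thousands separator ("1,250.00") were parsed wrongly.
    def parse_price(text):
        return float(text.replace(",", ""))

    class RegressionTests(unittest.TestCase):
        def test_price_with_thousands_separator(self):
            # Encodes the reported case: it fails before the fix, passes after,
            # and is replayed on every run of the suite from then on.
            self.assertEqual(parse_price("1,250.00"), 1250.00)

    if __name__ == "__main__":
        unittest.main()

Once such a test is in the suite, running "python -m unittest discover" replays it (and every other regression check) at essentially no extra cost.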

Tim Matthews
Unit testing only helps if the problem exhibits itself at the unit level.
Steve Rowe
True. Then maybe create an extra unit that makes use of the other units you are concerned about.
Tim Matthews
This doesn't answer his question.
Rob
His manager thinks it's a waste of time. Unit tests are supposed to be completely automated and written once, so if a change causes a new bug or an existing one to come back, you know. The automation is what he needs to argue to his manager to show it's not a waste of time.
Tim Matthews
I think the background is that they did have some automation in the past, but developers complained that it was too much of a burden! I suspect it was more about the process than the automation itself (I know, it's just kind of crazy...)
webclimber
+4  A: 

The basic principle behind test-driven development is that you have a test which demonstrates some feature.

In this case, you create a test that demonstrates the bug. The test fails.

You fix the bug. The test passes. As do all other tests. You've actually fixed the bug without creating other problems.
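A rough sketch of that cycle (the bug report, the function, and the test below are invented for illustration, not taken from the question):

    import unittest

    # Hypothetical bug report: "logins fail when the username has a trailing
    # space". Step 1: encode the report as a test and run it against the
    # current code -- the failure is your local reproduction of the bug.
    def normalize_username(name):
        return name.lower()        # buggy: forgets to strip the whitespace

    class LoginBugTest(unittest.TestCase):
        def test_trailing_space_is_ignored(self):
            self.assertEqual(normalize_username("Alice "), "alice")

    # Step 2: change normalize_username to return name.strip().lower(),
    # re-run the whole suite, and confirm this test now passes while every
    # other test still does.

    if __name__ == "__main__":
        unittest.main()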

S.Lott
How do you know your test reproduces the bug without having seen the bug in the first place?
Steve Rowe
@Steve Rowe: I don't get your use case at all. A bug is reported. You realize the tests are incomplete, so you fix the test cases. What -- specifically -- are you talking about?
S.Lott
+2  A: 

I think the answer to this depends on how you test for your bugs.

  • Do you use an automated testing tool (like the xUnit frameworks, Fit/Fitnesse, etc.)?
  • Do you use an automated build process?
  • Does this automated build process call the tests and record the results?

Unless the answers to the above questions are yes across the board, incorporating checks to verify that bug fixes have occurred will be tedious and painful. It will also be hard to convince the manager to allocate time for these checks, because they can always say, "well, we don't have enough people for this".

If you invest some time in an automated solution, then adding these tests to the build process will be a no-brainer, since the first step in resolving any bug should be to create a test that reproduces the problem locally anyway.

For my project, we use subversion for source control, Jira for activity management, Hudson for builds, and Fitnesse and MSTest for testing. All of these are tied together using continuous integration. Fitnesse does the acceptance tests and MSTest the unit testing, and they run automatically every time a build is performed to give us a quantitative indicator of how "good" the build is.

Jeffrey Cameron
+33  A: 

If you haven't reproduced the bug, your "fix" is nothing better than a guess.

Limbic System
Put another way: if you don't test your fix, how do you know what it actually fixed? You might well have fixed a bug, but the bug you fixed may not be the same as the one that was reported.
Rob
@Limbic: spot on! +1
Aaron
exactly the point I've been trying to make to him
webclimber
If I can guess right a *significant* % of the time and get clients going faster with less effort, I think I'm going to get paid more than you... You can dislike this all you want; sometimes intuition *pays*. This doesn't mean _never_ repro the fix, or indeed rarely. But always is an ugly word.
ShuggyCoUk
+1. Always is an impossible word. There are times when it is impossible to reproduce things locally - are you really going to duplicate your client's site with *all* of the infrastructure and then reproduce the exact conditions? No. You can't guarantee that you have fixed the bug unless you've reproduced it.
MatthieuF
+1  A: 

While saying unit testing is the solution and you don't have to reproduce the bug if you use it sounds good, I don't think it works in the real world. The fact that they haven't reproduced the bug on site means reproduction is probably not trivial, which likely means the cause is complicated. If the issue were trivial to represent in a unit test, that unit test would itself be a reproduction of the bug. If they haven't reproduced it, they probably can't write a simple unit test for it.

If you haven't reproduced the bug, you don't truly understand it. You may have found a bug in the system, but that doesn't guarantee you have found the bug the customer was seeing.

If you can't reproduce the bug, you can't be sure it is really fixed. You may be treating only a symptom of the issue and not the core issue. The customer may have a different vector to the same core issue and their issue won't be fixed.

Steve Rowe
+1  A: 

I find the idea of not reproducing a bug to fix it nothing short of bizarre.

After all, you can't be certain that something is the problem until you can reproduce it. Up until then it's simply conjecture (even if it's well-informed, it's still conjecture).

Now, whether you unit test or not is, I think, irrelevant to this discussion, since reproducibility is simply a question of "have you found the bug?" A unit test won't help you if you're testing for the wrong thing, working on some incorrect assumptions, or have simply misunderstood the (possibly vague) bug report about what is actually happening.

That's why you reproduce the problem.

cletus
+1  A: 

yes, when possible/feasible; see Regression Testing

Steven A. Lowe
+6  A: 

The nature of the bug is also in question:

  1. Can the defect be reproduced quickly with a unit test? If so, the developer working on the fix should write said unit test, integrating it into a fuller regression suite. This ensures future work doesn't reintroduce the defect.

  2. Can the defect be reproduced programmatically (e.g. with a Perl or Python script)? If so, the QA team should probably have an automated test case which covers it, and can quickly be used for verification (a rough sketch is given below).

  3. Can the defect be reproduced through a set of step-by-step instructions? If so, the QA team (or whoever initially flagged the bug) should provide the absolute minimal number of steps for reproduction.

These three principles greatly help to drive the QA process on my test team.
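As an illustration of the second case, a scripted reproduction can be as small as the sketch below (the URL, the endpoint, and the reported failure are all hypothetical):

    import sys
    import urllib.error
    import urllib.request

    # Hypothetical defect report: "/search returns HTTP 500 when the query
    # string is empty". The script either reproduces that failure or says so.
    URL = "http://localhost:8080/search?q="   # assumed local test instance

    try:
        with urllib.request.urlopen(URL) as resp:
            print("No repro: got HTTP", resp.status)
            sys.exit(0)
    except urllib.error.HTTPError as err:
        # The server answered with an error status: defect reproduced.
        print("Reproduced the defect: got HTTP", err.code)
        sys.exit(1)
    except urllib.error.URLError as err:
        print("Could not reach the test instance:", err.reason)
        sys.exit(2)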

It is typically in the developer's best interest to reproduce the defect themselves. If this isn't possible (e.g. scarcity of hardware resources), working directly with QA (or whoever initially flagged the bug) would be the next best thing. Though some defects are obvious enough (e.g. a spelling error in a menu option), others can have greater consequences which might be caught by a developer who knows the underlying state of the application.

It is also helpful to have developers reproduce defects to confirm they are indeed defects. Testers, though indispensable (speaking as a software tester ;) ), often lack in-depth knowledge of the application code, and might not have a solid understanding as to why behavior didn't conform to their expectations (or specifications).

bedwyr
+1  A: 

I read a good book on debugging a while back that recommended not only reproducing the bug, but also, after fixing it:

  • taking the fix back out and seeing if the bug acts up again
  • putting the fix back in and seeing if the bug gets fixed again

The reason is that you may flail about quite a bit trying to get a bug fixed, and this will help you know that you actually fixed it. If you don't know you fixed it, it might still be broken.

(I ran into a case like this the other day where a line of code that seemed to have no relation was causing our web site to behave strangely: when I took the line of code out, the site worked, but when I put it back in, the site broke. It turned out to be just coincidence; the site simply worked wrong every other time no matter what I did, because it was subtly logging me in and out. You have to know that your fix is a real fix.)

EDIT: I worked at a place where the owner got very cranky about bugs that had been claimed to be fixed but weren't. This happened because even though I (or another dev) was able to reproduce the bug, I hadn't reproduced it the exact same way the tester had produced it. Not only should you reproduce the bug, but you should ensure that you're reproducing the bug the exact same way the tester did.

(But if you want to try on your own first, go ahead; maybe you'll find another bug! After that, have the tester demonstrate his/her reproduction before you start working on the bug.)

Kyralessa
+6  A: 

The only case where you shouldn't reproduce it locally is when that's infeasible. Maybe it needs a LOT of data, proprietary data, hardware, or maybe it requires a more complex setup than you have.

I have had to do it a few times because we couldn't duplicate the setup in-house. (Some of the development work had even been done on the client's system.) The client knew from the start that they were going beyond what we could reproduce and that bugs that this kicked up were going to be a pain to deal with.

Loren Pechtel
+1  A: 

In most cases, you can't know that the bug you've reproduced is the same as the bug your customer is experiencing. Because of that, reproduction is no guarantee the issue is really fixed.

For example, customers experienced a communication bug with a project I worked on. We hunted through the code to fix any possible bugs. One bug involved an ASIC bug in the PIC microprocessor, and the chip errata provided a workaround which we implemented. Then one of my coworkers went through extreme measures to reproduce the bug so that he could claim our customer's problem was probably fixed.

It wasn't.

In the end, we added a lot of defensive coding and the customer never experienced the problem again. We're not sure exactly which fix solved the issue, and it would be wonderful to know and reproduce the issue our customers were having, but at the end of the day none of that matters. All that matters is the software works.

So it's great to be able to reproduce a bug and get a deep understanding of what causes a problem, but reproducing a bug is no guarantee of solving a customer's problem, and working software is more important.

It sounds like your workplace is too far on the "reproduce nothing" end of the spectrum, and that's a shame when a little extra work can produce much more reliable software. In this post I've argued that the other end of the spectrum isn't that great either, so I would try to push for something in between.

Paul
A: 

For random crashes from the field, sometimes you just can't reproduce them. I've debugged rare race conditions and "impossible" crashes using postmortem crashdumps collected by Microsoft's Windows Error Reporting service. I think Microsoft guru Raymond Chen calls this "psychic debugging".

If your company delivers Windows software (C++ or .NET) to the public, I highly recommend using Microsoft's Windows Error Reporting (WER) service. It's free, even for commercial software companies. It's simpler than writing your own error reporting and crashdump collecting service. WER even collates all the stack traces so you know which crashes are duplicates, which is helpful when you don't feel like debugging thousands of duplicate crashes. :)

cpeterso
+2  A: 

You must always TRY. You need to identify whether it is a bug, an error, or a defect. How can you proceed to fix or mitigate an issue without understanding it?

How will it be resolved for the record or the test suite? What is the appropriate status - "Fixed", "Open", "Cannot Reproduce", "Will Not Fix", or "Feature"?

If you cannot reproduce it, a decision still must be made; letting it drift off into /dev/null is asking for it to come back and bite you.

Paxic
+2  A: 

There are a lot of good answers to this question, but most of them are not speaking directly to the economics of the situation, which is perhaps the main thing that may influence your manager.

If you do not repro, then you

  • save time/effort now, but
  • incur risk

Weighing that trade-off between effort and risk depends a lot on the situation - what kind of software you're producing, how many customers you have, how 'bad' bugs are in production (e.g. does the sales site go down and you lose a million dollars a day, or does the web forum go down and people have to wait until tomorrow to chat online), how much test automation you have (which affects how much effort it takes to repro/regress bugs), ...

Many (most?) managers are risk-averse, so when it comes to persuasion, try to spell out the various risks (regressing other features, deploying a fix that does not actually fix it (and looking foolish to customers), ...) as well as estimating the effort you need to expend to mitigate that risk (cost to repro and have confidence in fix).

Brian
Good points. Also keep in mind that it's your job to present the case to your manager, and his responsibility to balance the risk versus the reward. He may (will) make some bad decisions, but he's still the manager. Make your case as well as you can.
Don Branson
A: 

Try to get your manager fired because of his stupidity and take his job; it might be an opportunity!

Fred
+3  A: 

Are you in a hurry? If so, then reproduce the errors! This creates a methodical mindset that you need in order to prevent further issues. How else could you possibly uncover old errors being re-introduced? Reproducing errors will catch others' attention:

Good Manager: "What are you doing?"

Good Developer: "We got that error again. I'm rebuilding the test environment so we can replay it."

Good Manager: "WTF? I thought Dave said he fixed that? Hey Dave, didn't you fix that error? Why are we wasting our time again with this?"

Versus:

Ass-hat Manager: "What are you doing?"

Good Developer: "We got that error again. I'm rebuilding the test environment so we can replay it."

Ass-hat Manager: "No - we don't have time. Just fix it and roll it back. Did you respond to the email about the meeting? I gotta go - I gotta 10 with Steve. Good. Ah, roll that back, and ah send that email."

As a director, I tell all my teams to "take the time today" to get it right, so we won't be up at midnight.

David Robbins
Hey you know my manager ? ;)
webclimber
I've worked for people like that and it sucks! As a director now, I want to avoid the environments that I hated, and hopefully people will be productive for me. The folks I report to kinda hate it, but I'm a big boy and my team delivers in a big way.
David Robbins
A: 

Possibly repeating what has already been written, but YES!!! The rationale is: only find a bug once. If you can write a test which replicates the bug, you will not have to manually test for it again - your automated test will find it.
So then you fix the bug, and you can be safe in the knowledge that your software will never suffer from this bug again. This is good. It will save you time. It will save you money.

David_001
A: 

I've argued with my manager that we should reproduce the bugs locally to ensure the fix works and, more importantly, so developers and QA can include checks for these cases as part of the regular release.

I especially agree with the part about developers and QA including checks. This is part of what the Japanese call Jidoka (which stands for "automation with a human touch"):

[...] Autonomation prevents the production of defective products, eliminates overproduction and focuses attention on understanding the problem and ensuring that it never recurs. It is a quality control process that applies the following four principles:

  1. Detect the abnormality.
  2. Stop.
  3. Fix or correct the immediate condition.
  4. Investigate the root cause and install a countermeasure.

The countermeasure is there to prevent recurrence. This is how you build a Poka-Yoke - mistake-proof - process (and a process that lets defects occur is a defective process). Good managers should get that.

Pascal Thivent