You know the ones that happen often enough that they need to be fixed, but that no one can really reproduce?

How do you nail those?

+2  A: 

Hi!

Simply put traces in your code that write to a file, for example. Logging the stack trace is also a good way to learn the sequence of events.
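As a rough illustration of that idea, here is a minimal Python sketch. The helper names `make_trace_logger` and `trace` are made up for this example; only the standard `logging` and `traceback` modules are assumed.

```python
import logging
import traceback


def make_trace_logger(path, name="trace"):
    """Return a logger that appends timestamped records to the file at `path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep trace output out of the root logger
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s [%(threadName)s] %(message)s")
        )
        logger.addHandler(handler)
    return logger


def trace(logger, message):
    """Log `message` together with the call stack that led to this point."""
    # Drop the last entry, which is the trace() frame itself.
    stack = "".join(traceback.format_stack()[:-1])
    logger.debug("%s\n%s", message, stack)
```

Calling `trace(logger, "about to import")` at the suspicious spots leaves a file showing which call paths were taken, and in what order, before the failure.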

vIceBerg
+1  A: 

That's a huge question. It really depends on the bug. However, I've found that logging really helps when all else fails. Including debugging symbols when possible helps too.

What kind of bug are you trying to figure out?

awhite
+4  A: 

If you know the area of code where it's happening, I find it helpful to trace back from the error and see what's going on. You play a "what if?" game. Start by assuming the error condition is happening and examine the code "backwards", asking yourself how that code could produce the error condition. This will lead to another condition, and you begin the process again, going up the call stack.

I find this incredibly effective for finding bugs and understanding the codebase (as opposed to stepping through with a debugger and some watches).

davetron5000
A: 

Be persistent. Take the things that occur only on rare occasions and make them happen more often. Gather as many clues as you can about the conditions present in the system that lead to the failure.
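One way to make a rare failure happen more often is to run the suspect operation many times under controlled randomness and keep the seeds of the runs that failed. This is only a sketch: `hunt` is a hypothetical helper, and it assumes the operation can take its random choices (ordering, timing, input data) from a `random.Random` instance passed in.

```python
import random


def hunt(operation, trials=1000, base_seed=0):
    """Run `operation(rng)` repeatedly; return a (seed, exception) pair per failure."""
    failures = []
    for i in range(trials):
        seed = base_seed + i
        rng = random.Random(seed)  # same seed gives the same sequence of choices
        try:
            operation(rng)
        except Exception as exc:
            failures.append((seed, exc))
    return failures
```

Any seed that shows up in the result can be replayed with `operation(random.Random(seed))`, which turns a one-in-a-hundred failure into one you can trigger on demand.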

MikeJ
A: 

Writing a log and using stack traces (as @vIceBerg suggests) is a good idea.

However, it sounds like you have a different kind of issue. I'm assuming you are on top of your game and not a beginner programmer, and I don't mean to lecture you, but code shouldn't be so complex that you cannot nail a bug.

In most cases, steps to reproduce the issue help me solve it right away, and those steps should be as detailed as possible (of course). It also takes a lot of effort to teach people how to report bugs; at the beginning, all you get is "it doesn't work".

If you still cannot find the bug, it may be time to refactor the codebase you are working with.

Till
Oh, sure it can be hard. Even well-structured code can run into weird issues. Some types of bugs are just hard to reproduce. Do the same thing 100 times and you only get the bug once. It really depends on the kind of app you're working on.
Herms
A: 

That really depends on what kind of issue it is.

Try to get as much information as you can from the times it does get reproduced. Steps that lead up to the crash, anything unusual in the way the person used the app, their environment, etc. It's easier to get this information from an internal QA person if they can reproduce it, but try to get it from clients as well if you can. Sometimes there's a specific, though rare, combination of things that are done that can consistently reproduce it. If you can narrow down what happens to cause it then you may be able to get to a point where you can start reproducing it regularly yourself.

One of the first things that I usually do is code review the section of the app that's having the problem. We had an issue a while ago where occasionally a grid component would randomly stop displaying entries properly. So I started by looking at the code that handled that specifically, and branched out from there to anything that could affect that code. Sometimes a careful code review (by you or someone else) can track down the issue. Sometimes you'll even find other bugs you didn't know about and get to fix those.

If it's an application crash, then adding some crash handling/reporting to your app might be worthwhile. You could do something similar to what Microsoft does (what I call the "Do Not Send" dialog). Firefox has a similar crash reporting tool. I'm not sure what would be involved in writing your own, but I think some companies may provide products for it. This would give you the app state as of the crash, which can help in tracking down exactly what's breaking.
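For a sense of what a homegrown version might involve, here is a minimal Python sketch that hooks uncaught exceptions and appends a report to a file. The function names and report fields are invented for illustration, and `get_state` is a hypothetical callback your app would supply to describe its own state. A real crash reporter would also need to handle native crashes and uploading, which this does not attempt.

```python
import json
import platform
import sys
import time
import traceback


def build_crash_report(exc_type, exc, tb, state=None):
    """Collect the exception, environment and app state into a plain dict."""
    return {
        "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "argv": list(sys.argv),
        "error": f"{exc_type.__name__}: {exc}",
        "traceback": traceback.format_exception(exc_type, exc, tb),
        "state": state or {},
    }


def install_crash_handler(path, get_state=lambda: {}):
    """Append one JSON line per uncaught exception to `path`."""
    def hook(exc_type, exc, tb):
        report = build_crash_report(exc_type, exc, tb, get_state())
        with open(path, "a") as f:
            f.write(json.dumps(report, default=str) + "\n")
        sys.__excepthook__(exc_type, exc, tb)  # still print the usual traceback

    sys.excepthook = hook
    return hook
```

Calling `install_crash_handler("crash.log", get_state=...)` once at startup is enough; the file name here is just a placeholder.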

Herms
A: 

Eventually someone reproduces it. I tell the users that if they find a weird crash bug or something, don't write it off as coincidence--grab a debug build and run that on the source, and backtrace when it crashes. Or, if they aren't technical enough to do that, simply try to reduce it to the smallest possible test case and send the input and settings to me for analysis.
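Reducing an input to a small failing case can be partly automated once a failure is reproducible. The sketch below is a simplified greedy reduction in the spirit of delta debugging, not the full algorithm. `shrink` and `fails` are names made up for this example, and `fails` is assumed to be a function you write that returns True when the given input still triggers the bug.

```python
def shrink(items, fails):
    """Greedily drop chunks of `items` for as long as `fails(items)` stays true."""
    items = list(items)
    if not fails(items):
        raise ValueError("the starting input must reproduce the failure")
    chunk = max(len(items) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(items):
            candidate = items[:i] + items[i + chunk:]
            if fails(candidate):
                items = candidate  # the chunk was not needed; retry at the same index
            else:
                i += chunk  # the chunk matters; keep it and move on
        chunk //= 2
    return items


# Example: the failure needs both 3 and 7 to be present in the input.
print(shrink(range(10), lambda xs: 3 in xs and 7 in xs))  # prints [3, 7]
```

The result is small but not guaranteed to be the smallest possible input, and every call to `fails` costs one run of the program, so this works best when a single run is cheap.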

Almost every single time, this has led to finding and fixing the bug. Usually it takes less than 5 minutes to fix once I have the backtrace; on the other hand, it might take days or even weeks to get that backtrace.

I've found some extremely subtle and rare bugs this way--things that happen every ten million input frames or more--things that are often impossible to detect in any sort of QA. In such a case, the best you can do is rely on a large userbase to find the problems before it's too late.

Dark Shikari
+1  A: 

Great question!

When it's your job to fix the error, you can't wait for someone else to reproduce it.

You also can't rely on luck.

Simply putting traces in your code won't do anything either, unless you can reproduce the bug.

You can set up a robot that performs multiple trials. But what do you try? How do you break it down?

Logs with stack traces may help you locate where in the program the error occurs.

But it's not just about where the error occurs. Usually this kind of bug isn't completely random, but occurs (apparently randomly) when you try a certain thing — import, download, page view, whatever. So you generally have some idea where to start.

It's as much about reproducing the conditions that lead to the error, as it is about locating it. After all, it happens sometimes, and not others. The code isn't changing, but the conditions are.
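If you can record the conditions of each run (settings, environment, input properties), comparing failing runs against passing ones can point at what changed. A minimal sketch, assuming each run is captured as a flat dict with hashable values; the function name and the keys in the example are invented for illustration.

```python
def suspicious_conditions(failing, passing):
    """Return the key/value pairs seen in every failing run and in no passing run."""
    if not failing:
        return {}
    common = set(failing[0].items())
    for snapshot in failing[1:]:
        common &= set(snapshot.items())  # keep only what all failures share
    seen_passing = set()
    for snapshot in passing:
        seen_passing |= set(snapshot.items())
    return dict(common - seen_passing)


failing_runs = [
    {"locale": "tr_TR", "cache": "cold", "workers": 4},
    {"locale": "tr_TR", "cache": "warm", "workers": 4},
]
passing_runs = [
    {"locale": "en_US", "cache": "cold", "workers": 4},
]
print(suspicious_conditions(failing_runs, passing_runs))  # prints {'locale': 'tr_TR'}
```

An empty result does not mean the conditions are irrelevant. It may mean the failure depends on a combination of values, or on something you are not recording yet.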

But still, how do you fix it?

Unfortunately, it depends on the bug.

You need patience, persistence, intimacy with the program, and a certain amount of programmer's intuition.

"Intuition" is not luck. What is it? At the risk of a circular definition, it's what a programmer uses when solving problems.

  • The process of elimination.
  • Inference based on evidence.
  • Sneakin' suspicion about something in there.

Start with that sneakin' suspicion. Like, I bet there's some invisible configuration difference on the server. Or, I kinda thought in the back of my mind there might be a problem if...

Eliminate those possibilities. Run the operation until you experience an "aha!" moment. Try that out. Repeat as necessary.

harpo