views:

532

answers:

15

What bugs have you found in your own or others' programs that taught you a lot about programming? Did any of them give you a surprising insight? Was there a bug that changed the way you think about programming?

+4  A: 

The only thing I have taken away from years of debugging OS problems is that every situation is unique. There are no certainties. Just because hello world compiles doesn't mean it will work on your hardware, or even on your runtime.

I have learned:

  • Sometimes time is negative.
  • Some sanity checks were insane to begin with.
  • Intermittent failures are the most difficult to solve but also the most rewarding.
  • Sometimes the spec is the bug, but you are told to implement it anyway.
tkotitan
+1  A: 

I'd love to say it was some mind-smashing logical error that took me ages to figure out, but the only bug that really sticks out in my mind is a bunch of strange errors I kept having in my Python programs. They made no sense, and I ended up copying and pasting, line by line, figuring it was some weird file corruption or other I couldn't see. It didn't make sense otherwise. And that worked.
A few months later I found out that other people had been having similar problems. It turns out I had just screwed up my whitespace (tabs vs. spaces). Oops.

The thing is, it didn't teach me much. Yeah, syntax is important, etc., but it taught me little. It's just that all the rest of the bugs taught me less. As a whole, my bugs taught me a hell of a lot, but no single bug stands out as teaching me anything of significant value. It's more of a cumulative effect.

Devin Jeanpierre
+2  A: 

Validation of input (fields)

It's always the errors you don't think of: language settings, currency formats, special characters.

PoweRoy
+7  A: 

Thou shalt not use float as a data type to store money values.

In C at the time, you got about 6 digits of precision in a float. That was not enough for the banking application we were writing in college.

I started researching the issue, and learned about mantissa and exponent, double, and long double, and packed representations.

A computer float is not the same as a mathematical real number.

Using == to compare a float to a floating-point constant might not work. There is an epsilon value defined to make it possible to compare them.

3.14 is a double; 3.14F is a float.

Comparing a float and a double results in the promotion of the float to a double.

It was an epiphany moment for me.
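
A minimal sketch of the points above, in JavaScript rather than C. JavaScript numbers are doubles, so `Math.fround` is used here to mimic storing a value in a 32-bit float. The helper `nearlyEqual` and its tolerance are assumptions for illustration, not a standard function.

```javascript
// Math.fround rounds to the nearest 32-bit float, much like a C float would.
const asFloat = Math.fround(3.14);
// The float no longer equals the double constant it came from.
const sameAfterRounding = asFloat === 3.14;

// Exact comparison of computed fractions is unreliable.
const sum = 0.1 + 0.2;
const exactlyEqual = sum === 0.3;

// Hypothetical helper: compare within a tolerance scaled to the operands.
function nearlyEqual(a, b, relTol = 1e-9) {
    const scale = Math.max(Math.abs(a), Math.abs(b), 1);
    return Math.abs(a - b) <= relTol * scale;
}
const closeEnough = nearlyEqual(sum, 0.3);

// Money kept as integer cents stays exact (within the safe integer range).
const totalCents = 10 + 20; // 0.10 + 0.20, expressed in cents

console.log(sameAfterRounding, exactlyEqual, closeEnough, totalCents);
```

Both exact comparisons come out false, the tolerance-based one comes out true, and the cents total is exactly 30. Which tolerance is appropriate depends on the application.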

EvilTeach
+1 for something which is unfortunately still very common :-(.
sleske
+1  A: 

Why an ISAPI DLL written in Delphi performed noticeably slower over time. After a lot of research it turned out to be caused by address space fragmentation, which was caused by the default Borland memory manager.

Replacing it with a commercial alternative made the software more stable in the long run, and much more performant (no global locking on memory allocation).

oɔɯǝɹ
+3  A: 

Bugfixing in general has taught me the importance of having comprehensive error handling and error logging mechanisms in place.

Things like:

  • When a file load fails, log the name of the file that you were looking for
  • When reading data from a third party fails, log as much as you can about the data as well as the request you made that returned it (as this is the first thing that the third party support guy will probably ask you for!)
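
As a rough sketch of the first point, assuming Node.js: the helper name `loadTextFile` and the injectable `log` parameter are made up for illustration.

```javascript
const fs = require("fs");

// Hypothetical helper: read a text file and, if that fails, log which file
// we were looking for along with the underlying error code or message.
function loadTextFile(path, log = console.error) {
    try {
        return fs.readFileSync(path, "utf8");
    } catch (err) {
        log(`Failed to load file "${path}": ${err.code || err.message}`);
        return null;
    }
}

// Usage: collect log lines in an array instead of printing them.
const logged = [];
const contents = loadTextFile("/no/such/dir/settings.json", (m) => logged.push(m));
```

The same idea applies to third-party data: log the request alongside whatever came back, so both are on hand when support asks.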

We have one application that aggregates data on a continual basis and formats it in an easily queryable fashion. The database actually contains more logging data than application data - and during development and when troubleshooting issues we always end up sifting through the log records to figure out what was going on, and going wrong.

Richard Ev
+1 for extensive logging when importing (potentially messy) data.
sleske
+6  A: 

Fixing bugs in my own code made me more tolerant of bugs in others' code :)

Gulzar
+4  A: 

No assumption is too basic to question.

My surprising insight: malloc() does not grow an array cumulatively. That is, if you call malloc(n * sizeof(something)) twice and assign the result to the same pointer, you won't have twice as much memory. This might sound trivial to you, but since my program ran perfectly despite this really dumb error, I never questioned my understanding of how malloc() works. It literally took me weeks to get desperate enough to question this basic assumption.

BastiBechtold
Do you mean realloc? malloc does not take a pointer AFAIK...
Hosam Aly
No, I mean malloc. I know this was really dumb, but that is just my point: You would never doubt your knowledge of something as basic as `malloc()`. But you should, especially when you don't have that much experience.
BastiBechtold
+3  A: 

Early in my programming days, I wrote a piece of JavaScript that did something (can't remember what) with an array of arrays.

function doTable(table){
    for (i = 0; i < table.length; i++)
        doRow(table[i]);
}

function doRow(row){
    for (i = 0; i < row.length; i++)
        doCell(row[i]);
}

From that day on, I've always been very careful with declarations, scoping, etc :)

gustafc
So where is the bug?
Hosam Aly
Since "i" is never declared, it is automatically globally scoped, so both loops use the same iteration variable. The correct version is "for (var i = 0; ..."
gustafc
"Globally" scoped!!!
Hosam Aly
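
A small sketch of the effect described in the comments above. The shared counter is declared explicitly here only so the example also runs in strict mode, where assigning to an undeclared `i` throws. The function names and the 3x3 table are made up for illustration.

```javascript
const table = [[1, 2, 3], [4, 5, 6], [7, 8, 9]];

// Buggy shape: both loops use the same counter, as with an implicit global.
let i;
let sharedVisited = 0;
function doTableShared(t) {
    for (i = 0; i < t.length; i++) doRowShared(t[i]);
}
function doRowShared(row) {
    for (i = 0; i < row.length; i++) sharedVisited++;
}
doTableShared(table); // the inner loop leaves i at 3, so the outer loop stops

// Fixed shape: each loop declares its own counter.
let scopedVisited = 0;
function doTable(t) {
    for (let i = 0; i < t.length; i++) doRow(t[i]);
}
function doRow(row) {
    for (let i = 0; i < row.length; i++) scopedVisited++;
}
doTable(table);

console.log(sharedVisited, scopedVisited);
```

With this table the shared-counter version visits only the first row (3 cells instead of 9). With other shapes, such as three rows of one cell each, it never terminates. Declaring the counter with `var`, as the comment suggests, fixes it as well, since `var` is scoped to the enclosing function.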
+3  A: 

In my first job, in 1982, working on some accounting software written in BASIC (running on an 8-bit CP/M machine), I ran into an intermittent bug which only occurred after users had been in an unrelated part of the software.

It took days with the users to work out that the bug was related to their usage pattern in such a weird way.

I eventually found a comment in the source that went something like: "Contrary to the documentation, the sixth record in file X is actually in use because we use it as scratch space."

Since the time that lazy idiot had written that code, with that comment, that sixth config record had become officially used by the module I was trying to debug.

That taught me about

  • talking to users to work out the real circumstances behind crashes
  • how software modules can be coupled by things which are outside the code
  • confidence (I was 18) in my skills vs much older, more experienced
    colleagues
  • distrust
  • how to swallow outrage
Andy Dent
+1  A: 

It's always the last bug that teaches you the most. :)

Quog
+1  A: 

Always read the codebase twice before making changes.

When one arrives on a new project with a large pre-existing codebase, it's easy to look at some piece of that code and wonder how the guy who wrote it couldn't see what a huge idiot he was, and start cranking out bugfixes and enhancements. Of course, the risk of destabilizing the codebase in such cases is extreme.

I made precisely this error in a previous project, which was ~150,000 LOC and mostly a huge mess. But it worked... for the most part. Early in my time on this project, I made a number of fixes and some enhancements to the build process as well. Many of these were really (seemingly) minor things, and in one case, I moved a variable initializer outside of a loop, from something like this:

mSomeVar = 3;
for(int i = 0; i < reallyBigNumber; i++) {
    mSomeVar=3;
    // stuff
}

I read through the code, but couldn't see why mSomeVar needed to be set every time, so I removed the assignment inside the loop. Of course, some more distant part of the code depended on mSomeVar being 3 (or whatever it was supposed to be) when it started doing work, and this introduced a very subtle but noticeable difference in the program's behavior which wasn't fully noticed until after two point releases. How embarrassing. =/

In this case, I did read the codebase, but only once. I thought I knew what was going on, and started hacking away -- bad. I also started my first set of changes by micro-optimizing -- also bad (even though the above code was in a very performance-critical loop). Also bad was the fact that my teammates weren't using SCM before, and were new to working with it, and there was no test suite of any kind in the project.

So when I say I should have read the code twice, I mean the following:

  1. The first time is literally "read-only" -- just read the code with an open mind
  2. The second time, you read it and write the test cases at the same time, if the project doesn't have any. If it does, then read those.

Only then would I feel safe enough to start making changes to the codebase. Otherwise, the risk of destabilizing an already unstable codebase is simply too great.

Nik Reiman
The idea that a single developer could read and *understand* the entire codebase is kind of ridiculous. ... and the fact that you might need to says terrifying things about the architecture of such a project.
Aaron Maenpaa
That's true. And even after reading the entire codebase, you still won't have any idea how the entire system works. But at least you have a better feel for the project layout, and can identify areas of the code which are incomplete or need work.
Nik Reiman
A: 

Debugging my first OS crash dump.

This was on Solaris, using the Solaris Crash Analysis Tool (Scat) in particular. Examining the details of a system can require considerable skill, and I learnt a whole lot.

Aaron
A: 

Two that I can think of:

My first hard bug; I don't remember the details but it was in a Fortran program when I was very young. It taught me that I wasn't as smart as I thought I was and that, regardless of any talent I might have, building something valuable required patience and attention to detail.

The first bug I could run through a debugger. The debugger was a revelation to me: seeing the architecture of my solution laid out in a core dump that I could step through (again, we're talking old school) was almost scary - but empowering at the same time. Later, when I took a course in assembly, I understood things much better but that first debugger walk through taught me an enormous amount about the complexities of what goes on under the hood.

Mark Brittingham
+1  A: 

One amazing MacOS crash taught me not to give up until I've walked the code in the debugger.

It was in the early days of my OOFILE report writer and we had a crash in a new feature which drew lines around columnar sections of the report. I hadn't written the new feature and the young guy who did had been trying to debug the hard machine crash for at least a day. It only crashed when printing and it took the machine down hard.

I was walking the code literally step by step in the debugger, watching the preview window, when I realised that a line segment I had seen drawn a few steps ago was being redrawn on top of itself. When I looked closely at the code, he'd taken a seemingly reasonable shortcut and avoided complex conditional logic that determined if a line segment had already been drawn. It appeared to make no difference, drawing a line twice, but by this stage I was desperately looking for anything at all unusual.

Yep - that turned out to be the bug. Something in the way the PostScript laser printer converted QuickDraw ops into a PostScript program couldn't cope with exactly the same line segment being drawn again on top of itself and was going into some endless death spiral.

Andy Dent