During software development, there may be bugs in the codebase that are already known issues. If the tests have been written well, these bugs will cause the regression/unit tests to fail.

There is constant debate in our teams about how failing tests should be managed:

  1. Comment out failing test cases with a REVISIT or TODO comment.

    • Advantage: We always know that a new failure means a new defect has been introduced, not one we are already aware of.
    • Disadvantage: We may forget to REVISIT the commented-out test case, so the defect could slip through the cracks.
  2. Leave the test cases failing.

    • Advantage: We will not forget to fix the defects, as the failing scripts constantly remind us that a defect is present.
    • Disadvantage: Difficult to detect when a new defect is introduced, due to failure noise.

I'd like to explore what the best practices are in this regard. Personally, I think a tri-state solution is best for determining whether a script is passing. For example, when you run a script, you could see the following:

  • Percentage passed: 75%
  • Percentage failed (expected): 20%
  • Percentage failed (unexpected): 5%

You would basically mark any test cases that you expect to fail (due to some defect) with some metadata. This ensures you still see the failure result at the end of the test run, but you immediately know if there is a new failure you weren't expecting. It appears to take the best parts of the two proposals above.
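
For what it's worth, this is essentially how pytest's xfail marker behaves; below is a minimal sketch of the idea, where buggy_total() and the BUG-42 ticket ID are made-up examples standing in for real code with a known defect:

import pytest

def buggy_total(price, discount):
    return price  # known defect: the discount is ignored

def test_undiscounted_total():
    # An ordinary passing test: counted as "passed".
    assert buggy_total(100, 0) == 100

@pytest.mark.xfail(reason="Known defect BUG-42: discount is not applied")
def test_discounted_total():
    # Reported as "xfailed" (an expected failure), not as a plain FAILED.
    assert buggy_total(100, 10) == 90

Any unmarked test that fails still shows up as a plain FAILED, while the run summary counts xfailed (expected) and xpassed (unexpectedly passing) tests separately, and running pytest with -rxX lists them individually in the short summary.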

Does anyone have any best practices for managing this?

A: 

I tend to leave these in, with an Ignore attribute (this is using NUnit). The test is mentioned in the test run output, so it's visible, which hopefully means we won't forget it. Consider adding the issue/ticket ID in the "ignore" message; that way the test will be revisited when the underlying problem is ripe. It would be nice to fix failing tests right away, but sometimes small bugs have to wait until the time is right.

I've considered the Explicit attribute, which has the advantage that the test can be run without a recompile, but it doesn't take a "reason" argument, and in the version of NUnit we run, the test doesn't show up in the output as unrun.
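
For readers on other frameworks, most test runners have an equivalent ignore-with-reason mechanism. As a rough, hedged analog in pytest (the test name and ticket ID below are made up), a skip marker keeps the reason visible in the run output:

import pytest

# Rough pytest equivalent of NUnit's Ignore-with-reason: the test is
# skipped, and the reason (with the ticket ID) appears in the output.
@pytest.mark.skip(reason="Known defect, see ticket PROJ-123")
def test_last_login_date_is_updated():
    ...

Running with pytest -rs lists each skipped test together with its reason, so the ticket reference stays visible on every run.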

Blair Conrad
+2  A: 

I would leave your test cases in. In my experience, commenting out code with something like

// TODO: fix test case

is akin to doing:

// HAHA: you'll never revisit me

In all seriousness, as you get closer to shipping, the desire to revisit TODOs in code tends to fade, especially for things like unit tests, because you are concentrating on fixing other parts of the code.

Leave the tests in, perhaps with your "tri-state" solution. However, I would strongly encourage fixing those cases ASAP. My problem with constant reminders is that after people see them a few times, they tend to gloss over them and say "oh yeah, we get those errors all the time..."

Case in point -- in some of our code, we have introduced the idea of "skippable asserts" -- asserts which are there to let you know there is a problem, but allow our testers to move past them on into the rest of the code. We've come to find out that QA started saying things like "oh yeah, we get that assert all the time and we were told it was skippable" and bugs didn't get reported.

I guess what I'm suggesting is that there is another alternative, which is to fix the bugs that your test cases find immediately. There may be practical reasons not to do so, but getting in that habit now could be more beneficial in the long run.

Mark
+1  A: 

We did the following: Put a hierarchy on the tests.

Example: You have to test 3 things.

  • Test the login (login, retrieve the user name, get the "last login date" or something similar, etc.)
  • Test the database retrieval (search for a given "schnitzelmitkartoffelsalat" - tag, search the latest tags)
  • Test web services (connect, get the version number, retrieve simple data, retrieve detailed data, change data)

Every testing point has subpoints, as shown in parentheses. We split these hierarchically. Take the last example:

3. Connect to a web service
    ...
3.1. Get the version number
    ...
3.2. Data:
    3.2.1. Get the version number
    3.2.2. Retrieve simple data
    3.2.3. Retrieve detailed data
    3.2.4. Change data

If a point fails (while developing), you get one exact error message, e.g. "3.2.2 failed". The testing unit then does not execute the tests for 3.2.3 and 3.2.4, so you are left with a single, precise error message. This lets the programmer solve that problem first, instead of also chasing 3.2.3 and 3.2.4, which could not succeed anyway.

That helped a lot to clarify the problem and to make clear what has to be done first.
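
A minimal sketch of this stop-the-chain behaviour, using pytest and a shared flag; the web-service steps are hypothetical placeholders, and a plugin such as pytest-dependency can formalize the same idea:

import pytest

class TestWebService:
    # Shared flag: dependent steps are skipped (not failed) if connecting failed.
    connected = False

    def test_3_connect(self):
        type(self).connected = True  # placeholder for the real connect call
        assert self.connected

    def test_3_1_version_number(self):
        if not self.connected:
            pytest.skip("3. connect failed; skipping dependent test")
        assert True  # placeholder for the real version-number check

    def test_3_2_2_retrieve_simple_data(self):
        if not self.connected:
            pytest.skip("3. connect failed; skipping dependent test")
        assert True  # placeholder for the real retrieval check

If the connect step fails, the rest of the chain is reported as skipped with an exact message, so the run still points at a single problem.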

Georgi
Ordered hierarchical unit testing is ftw.
TraumaPony
Note that this is not unit testing in the strict sense of the word, as unit tests are by definition independent of each other. What you are doing is integration testing. Still a good solution.
sleske
Well, technically speaking, the tests stay independent of each other. They can (and should) be implemented without dependencies, but the execution itself and the interpretation of the results are dependent on each other.
Georgi
A: 

You might also find this Q&A thread worth reading.

Onorio Catenacci
A: 

I think you need a TODO watcher that collects the "TODO" comments from the code base. The TODO is your test metadata: it sits one line in front of the known failure and is very easy to correlate.

TODOs are good. Use them. Actively manage them by actually putting them into the backlog on a regular basis.
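
A minimal sketch of such a TODO watcher, here as a small Python script; the marker words, file extensions, and default directory are assumptions to adjust for your own codebase:

import re
import sys
from pathlib import Path

# Report every TODO/REVISIT comment with its file and line number,
# in a form that is easy to paste into the backlog.
MARKER = re.compile(r"(?:TODO|REVISIT)[:\s]+(.*)", re.IGNORECASE)

def find_todos(root=".", extensions=(".py", ".pl", ".cs")):
    for path in Path(root).rglob("*"):
        if path.suffix not in extensions or not path.is_file():
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            match = MARKER.search(line)
            if match:
                yield f"{path}:{lineno}: {match.group(1).strip()}"

if __name__ == "__main__":
    for entry in find_todos(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(entry)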

S.Lott
+2  A: 

I generally work in Perl and Perl's Test::* modules allow you to insert TODO blocks:

TODO: {
  local $TODO = "This has not been implemented yet.";

  # Tests expected to fail go here
}

In the detailed output of the test run, the message in $TODO is appended to the pass/fail report for each test in the TODO block, so as to explain why it was expected to fail. For the summary of test results, all TODO tests are treated as having succeeded, but, if any actually return a successful result, the summary will also count those up and report the number of tests which unexpectedly succeeded.

My recommendation, then, would be to find a testing tool which has similar capabilities. (Or just use Perl for your testing, even if the code being tested is in another language...)

Dave Sherohman
See http://testanything.org for more about Perl's test framework, which has been extended to other languages.
cjm
Is a similar facility available in other test frameworks? It seems somewhat uncommon. (PHPUnit doesn't have it, for example.)
mjs
A: 

#5 on Joel's "12 Steps to Better Code" is fixing bugs before you write new code:

When you have a bug in your code that you see the first time you try to run it, you will be able to fix it in no time at all, because all the code is still fresh in your mind.

If you find a bug in some code that you wrote a few days ago, it will take you a while to hunt it down, but when you reread the code you wrote, you'll remember everything and you'll be able to fix the bug in a reasonable amount of time.

But if you find a bug in code that you wrote a few months ago, you'll probably have forgotten a lot of things about that code, and it's much harder to fix. By that time you may be fixing somebody else's code, and they may be in Aruba on vacation, in which case, fixing the bug is like science: you have to be slow, methodical, and meticulous, and you can't be sure how long it will take to discover the cure.

And if you find a bug in code that has already shipped, you're going to incur incredible expense getting it fixed.

But if you really want to ignore failing tests, use the [Ignore] attribute or its equivalent in whatever test framework you use. In MbUnit's HTML output, ignored tests are displayed in yellow, compared to the red of failing tests. This lets you easily notice a newly-failing test, but you won't lose track of the known-failing tests.

Zack Elan
+3  A: 

Fix the bug right away.

If it's too complex to do right away, it's probably too large a unit for unit testing.

Lose the unit test, and put the defect in your bug database. That way it has visibility, can be prioritized, etc.

slim
+1, because people don't tend to notice that "I had 25 failed tests, now I have 26". If it's either zero or non-zero, new bugs will be very noticeable. And bug trackers are made for keeping track of bugs :-)
Lev