views: 349 | answers: 11

Maintaining unit tests is difficult. I am sure that we all have experienced a time when a seemingly small change to the system under test caused dozens of unit tests to fail. Sometimes these failures reveal bugs in the SUT, but often the tests are out of date and no longer reflect the correct behavior of the SUT. In these cases, it is necessary to fix the broken tests.

Have you encountered this situation? Does it happen often? What change did you introduce and how did the failures manifest? Did you fix the broken tests or simply delete them? If the former, how? If the latter, why? How does the fear of failures affect your desire to write tests?

I would also like to find specific examples of broken tests. Do you know of any open-source applications that evolved in ways that caused tests to fail?

+1  A: 

Personally I don't think it is avoidable. You can minimize the effects by isolating effects, but that can be quite hard at times. Mocks can help, but even they can be hard to work with. If behavior changes, though, the change was intentional, and you have Y tests that depend on that behavior, then it only makes sense that you will have to change all Y expectations.

I have found that with a bit of OOP, or just proper suite design, you can at times be saved somewhat through code re-use. I am never afraid of test failures in this regard: if behavior needs to change (and you've put thought into that need, the need is real, and the need is not the whim of your manager who used to write COBOL in the good ole' days :-) ), then updating the tests is just part of evolving the code base and should be considered part of the work to be done.

If refactoring the test suite takes a long time, you can exclude the tests from your build and re-include them one by one, but tests that still test an expected behavior should not be deleted, only refactored. Let's not let the test suite erode just to get a new feature in, if at all avoidable.
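To make the code re-use point concrete, here is a minimal JUnit 4 sketch (the Invoice class, its constructor, and the helper are hypothetical): every test builds its fixture through one helper method, so when the construction of the object changes intentionally, only the helper has to be updated, not every test.

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class InvoiceTest {

        // Hypothetical production class used for illustration only.
        static class Invoice {
            private final double net;
            private final double taxRate;
            Invoice(double net, double taxRate) { this.net = net; this.taxRate = taxRate; }
            double total() { return net * (1 + taxRate); }
        }

        // Single point of change: if Invoice's construction evolves,
        // only this helper needs editing.
        static Invoice newDefaultInvoice(double net) {
            return new Invoice(net, 0.20);
        }

        @Test
        public void totalIncludesTax() {
            assertEquals(120.0, newDefaultInvoice(100.0).total(), 0.001);
        }

        @Test
        public void zeroNetGivesZeroTotal() {
            assertEquals(0.0, newDefaultInvoice(0.0).total(), 0.001);
        }
    }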

harschware
+2  A: 

I've observed, and certainly read somewhere, that unit tests which test the implementation are more brittle than tests which test the behavior of the code. In other words, white-box unit tests are more brittle than black-box unit tests. For instance, a test of a class that stores things, which peeks directly into the object's data members to verify that things are stored, will break when the storage implementation changes.
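A minimal JUnit 4 sketch of the difference, using a hypothetical ShoppingCart class: the first test peeks at the internal list and breaks as soon as the storage changes (say, to a Map), while the second only uses the public contract.

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    import java.util.ArrayList;
    import java.util.List;

    import org.junit.Test;

    public class ShoppingCartTest {

        // Hypothetical class under test: currently stores items in a List.
        static class ShoppingCart {
            final List<String> items = new ArrayList<String>();
            void add(String item) { items.add(item); }
            boolean contains(String item) { return items.contains(item); }
        }

        // White-box: reaches into the internal List, so it is coupled to the
        // storage choice and breaks when that choice changes.
        @Test
        public void whiteBoxPeeksAtStorage() {
            ShoppingCart cart = new ShoppingCart();
            cart.add("book");
            assertEquals("book", cart.items.get(0));
        }

        // Black-box: only uses the public API, so it survives any internal
        // reorganization that preserves the behavior.
        @Test
        public void blackBoxUsesPublicBehavior() {
            ShoppingCart cart = new ShoppingCart();
            cart.add("book");
            assertTrue(cart.contains("book"));
        }
    }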

philippe
you have that backwards. Black-box testing is more brittle. you described it correctly, but transposed the names I think.
tster
yes, missed that. I fixed it, thanks.
philippe
+7  A: 

Isn't it the whole point of unit tests to tell you when you've broken your code unexpectedly? I would never delete a failing test unless it was exercising code that was going to be removed from my system; you have to consider test creation and maintenance as an integral part of writing your software, just as important as the delivered code.

gareth_bowles
+6  A: 

How does the fear of failures affect your desire to write tests?

The fear of failure is what drives my desire to write tests. A test suite gives me immediate feedback on whether my last change broke anything, and what it broke. Fear is changing your code and having no idea whether things still work.

Mathias
+3  A: 

I am sure that we all have experienced a time when a seemingly small change to the system under test caused dozens of unit tests to fail.

Here's your trouble. The problem isn't maintaining tests; the problem is that you have a brittle codebase. If one change causes dozens of failures, you either have a codebase with a lot of fragile couplings, or a test suite full of pseudo-integration tests that are testing too much.

If one change is breaking dozens of tests, then you probably need to take some spike time to refactor the heavily coupled sections of your code, or you need to break out your tests and eliminate duplicated test conditions.
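As an illustration of "testing too much" (the Discount class below is hypothetical), the first JUnit test exercises parsing, arithmetic, and formatting in one go, so a change to any of them fails it; the focused tests that follow each pin down a single behavior, so a failure points straight at the culprit.

    import static org.junit.Assert.assertEquals;

    import java.util.Locale;

    import org.junit.Test;

    public class DiscountTest {

        // Hypothetical class under test.
        static class Discount {
            static int parsePercent(String s) { return Integer.parseInt(s.replace("%", "").trim()); }
            static double apply(double price, int percent) { return price * (100 - percent) / 100.0; }
            static String format(double price) { return String.format(Locale.US, "$%.2f", price); }
        }

        // Pseudo-integration style: any change to parsing, rounding, or
        // formatting breaks this single big test.
        @Test
        public void testsTooMuch() {
            int percent = Discount.parsePercent(" 25% ");
            double discounted = Discount.apply(80.0, percent);
            assertEquals("$60.00", Discount.format(discounted));
        }

        // Focused tests: each one owns exactly one behavior.
        @Test
        public void parsesPercentString() {
            assertEquals(25, Discount.parsePercent(" 25% "));
        }

        @Test
        public void appliesPercentageDiscount() {
            assertEquals(60.0, Discount.apply(80.0, 25), 0.001);
        }

        @Test
        public void formatsAsDollarsAndCents() {
            assertEquals("$60.00", Discount.format(60.0));
        }
    }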

Dave Sims
+1  A: 

The problem you describe is something that is best addressed at the start of the project. From the moment you start writing your unit tests (preferably TDD-style), you should be constantly aware of the way they unfold and evolve.
There's no magic trick here: you have to think hard and notice when you're creating a tangled mess that will make it harder to maintain the tests. Usually, if the tests are that tangled, it implies some of the production code could use some refactoring too.

Basically, refactor constantly in order to make sure changes are localized, and that you won't have to change 200 test cases simply because you decided to make a simple change. One would say that your code should be granular enough to allow easy testing (no need to stub ten objects before calling the code and testing it), which will usually make it easier to maintain in a few months.

The whole purpose of your tests is to give you confidence. More than that, they allow you to be a little stupid. If you've got enough tests that you're sure of, you don't have to think hard before removing a few lines of code you're pretty sure aren't used. Delete them and run the tests!
Because I think this is one of the main strengths of unit testing, I would do everything I could to prevent the test suite from becoming something I "fear". If simple changes break tests in places you'd never expect them to, it usually means you need to learn some more about your code base, and probably make some changes to tidy things up.

There are very few cases where I deleted a test case whose tested code wasn't deleted. It means that even after digging really deep (going back in the version control system) I can't understand why the test is there, or don't agree with it. But my first way of looking at it is "damn, I'm missing something". That's how sure one should be of one's suite, and that's why it's worth the effort.

abyx
A: 

If many of your tests break when a small change is made to the software, then your tests aren't written very robustly. It's important to write the tests to verify the behavior of the code, rather than the implementation. You also need to make sure each test is as independent as possible.

In many cases, the changes necessary to make the code easier to test will have the side-effect of improving the quality of the code - by helping to define the interface between components, for example.
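A small sketch of that side-effect, assuming a hypothetical SessionChecker and a Clock interface invented for the example: extracting the interface makes the class testable with a deterministic fake clock, and the seam it creates between the two components is a design improvement in itself.

    import static org.junit.Assert.assertFalse;
    import static org.junit.Assert.assertTrue;

    import org.junit.Test;

    public class SessionCheckerTest {

        // Interface extracted so the time source can be swapped in tests.
        interface Clock {
            long nowMillis();
        }

        // Hypothetical class under test.
        static class SessionChecker {
            private final Clock clock;
            private final long timeoutMillis;

            SessionChecker(Clock clock, long timeoutMillis) {
                this.clock = clock;
                this.timeoutMillis = timeoutMillis;
            }

            boolean isExpired(long startedAtMillis) {
                return clock.nowMillis() - startedAtMillis > timeoutMillis;
            }
        }

        // A fixed fake clock makes both tests deterministic and independent.
        @Test
        public void sessionWithinTimeoutIsNotExpired() {
            SessionChecker checker = new SessionChecker(new Clock() {
                public long nowMillis() { return 1000L; }
            }, 500);
            assertFalse(checker.isExpired(800));   // only 200 ms old
        }

        @Test
        public void sessionPastTimeoutIsExpired() {
            SessionChecker checker = new SessionChecker(new Clock() {
                public long nowMillis() { return 10000L; }
            }, 500);
            assertTrue(checker.isExpired(1000));   // 9 seconds old
        }
    }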

Mark Bessey
+5  A: 

Maintaining unit tests is difficult.

True enough. The co-evolution of production code and test code is hard. Both kinds of code share similarities (e.g. naming conventions), but they still differ in nature. For instance, the DRY principle can be violated in test code if necessary; code duplication would indeed be found easily, as tests would break in both places. The tension between test and production code sometimes results in specific design trade-offs (e.g. dependency inversion) to ease testability. These tensions are relatively new, and the relationship between the design of production code and the effort of maintaining its tests is not well understood. The article "On the interplay between software testing and evolution" is great (I was not able to find a PDF, but I didn't google for long).

I am sure that we all have experienced a time when a seemingly small change to the system under test caused dozens of unit tests to fail. Sometimes these failures reveal bugs in the SUT, but often the tests are out of date and no longer reflect the correct behavior of the SUT. In these cases, it is necessary to fix the broken tests.

Defect localization - the ability of a test suite to pinpoint a defect precisely - is also only partially understood. The best strategies for designing test suites with high defect localization are not clear. Most tests overlap with each other to some degree, which lowers defect localization. Ordering tests so that they depend on each other improves this aspect, but at the same time goes against the principle of having isolated tests. There is a growing awareness of such tensions, but no definitive solution to address these issues. There is an article about exploiting dependencies between tests.

The problem of outdated or irrelevant tests (those that don't cover anything anymore) is also gaining awareness. Test coverage is not enough, and a high-quality test suite requires experience, or at least some education. See the article about the 100% coverage myth.

How does the fear of failures affect your desire to write tests?

You have to find a balance between (1) the initial time invested in the test suite, (2) the maintenance effort, and (3) the effectiveness of the test suite. I write mostly what I call "inflection point tests", and here is my view on the subject.

ewernli
Excellent insight and citations. Exactly the kind of answer I'm looking for.
Brett Daniel
A: 
Dave
A: 

This problem is known as unit test fragility. More often than not, it is caused by too much coupling between your tests and the class under test.

You should treat your test as a client of a piece of code in the same way that other code is a client of it. The test should have certain expectations of how the code works, but it shouldn't have to know the details. If you need to know too much about how a class works to test it then there probably needs to be another layer of abstraction between you and the code you are testing.

Feel free to create abstraction layers just for testing. Test code is as important as production code and should be as well written and designed as anything else. Sometimes a layer of indirection can make your tests less fragile and mean that when something breaks the change is only in one place.
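For instance, here is a minimal sketch of a test-only indirection layer (Account and AccountDriver are hypothetical names): the tests talk to a small driver instead of calling the production API directly, so when that API changes, only the driver has to be updated.

    import static org.junit.Assert.assertTrue;

    import org.junit.Test;

    public class AccountDriverTest {

        // Hypothetical production class, working in cents.
        static class Account {
            private long cents;
            void deposit(long amountInCents) { cents += amountInCents; }
            long balanceInCents() { return cents; }
        }

        // Test-only abstraction: if Account later switches to a Money type,
        // this driver is the only place the tests need to change.
        static class AccountDriver {
            private final Account account = new Account();
            void deposit(double dollars) { account.deposit(Math.round(dollars * 100)); }
            boolean hasBalance(double dollars) { return account.balanceInCents() == Math.round(dollars * 100); }
        }

        @Test
        public void depositShowsUpInBalance() {
            AccountDriver driver = new AccountDriver();
            driver.deposit(12.50);
            assertTrue(driver.hasBalance(12.50));
        }
    }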

Dean Povey
A: 

Here is a thought experiment. It only looks like a rant.

Try not maintaining your unit tests instead.

Then tell some engineer from a different discipline that you've let your test plans fall by the wayside. That your changes are now only tested when the whole thing gets put together.

(By the way, this test regimen has a name: "big bang integration". When you try it in electronics, it's literally a bang, as all the equipment catches fire.)

Explain how much work it was to maintain, and that not doing it saved time.

Watch the look on their face. For best effect, pick someone who is a licensed PE (Professional Engineer) in your locality.

Repeat this, but substitute your ISO 9000 auditor for the engineer, and show them the revised development procedure that has no testing until integration.

Repeat again, this time with a CMMI assessor. This last one will be funny: CMMI SCAMPI assessors love explanations like "it was too much effort".

Now, you're probably thinking that you don't work somewhere with ISO 9000 or CMMI, and never work with engineers from other disciplines.

At which point, the problem is that an average software shop probably does not unit test at all.

So accept averageness, and use established industry worst practices. Then offshoring to <insert country cheaper than where you live> cannot have any quality impact either (there's no worse process). So, really, your boss should offshore the work now and save money. There's something wrong with this reasoning somewhere.

Tim Williscroft