tags:

views: 92

answers: 4

I'm currently working on a project with some fairly significant business rules where the problem space is being 'discovered' as we write the solution (pretty typical chaotic project management kind of thing). We have decent test coverage and rely on it quite a bit to make sure our significant changes don't blow anything up. This scenario is the kind of thing that unit test zealots highlight as a prime example of testing helping software be at once easily modified, with fewer defects, and completed more quickly than if you aren't using unit tests. I shudder to think how I'd cope without the test suite.

My question is that while I certainly believe in the value of unit testing (this project is actually TDD, but that's not really germane to the question), I'm wondering, as have others, about the classic unit test problem of having so much more code to grok and maintain (i.e. the tests themselves). Again, there is no doubt in my mind that this particular project is much better off with the unit test cruft than without it, but I'm also concerned about the long-term maintainability of the tests.

There are a few techniques that I've used following the advice of others to help with this problem. In general,

  1. We split our tests into 'dependent' and 'independent' buckets. Independent tests don't require anything that is not in source control, so any calls to our data access layer are either mocked or get their data from an XML file instead of the real database, for example. Dependent tests, as the name suggests, depend on something like a config file, a database, or some network resource that might not be correctly configured or available when the test runs. Splitting the tests into two groups like this has been extremely valuable: it lets us write dependent 'throw away' tests for early development and independent mission-critical tests that can be relied upon and resist test rot. It also makes the CI server easy to manage, since it doesn't have to be set up and maintained with database connections and the like. (A sketch of this split follows the list.)
  2. We target different levels of our code. For example, we have tests hitting 'main' and tests hitting each of the methods that 'main' calls. This gives us the ability to target both the details of the system and its overarching goals. The 'main' tests are difficult to debug when they break, but they typically aren't the only thing that breaks (the detailed tests break too). The detailed tests are easier to follow and debug when they fail, but they are insufficient to know whether a refactor has killed the system (that's what the 'main' tests are for).
  3. The 'main' tests have been critical to feeling comfortable that a refactor hasn't hosed the codebase. A 'main' test is really many tests of a single method, called with different args that map to use cases. It's basically the entry point into our code at the highest level, and as such these are arguably not really 'unit' tests. However, I find that I really need the higher-level tests in order to feel comfortable that a refactor didn't blow up the codebase; the lower-level tests (the ones that are truly a 'unit' of work) are not sufficient. (The second sketch after the list shows both levels.)
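
To make point 1 concrete, here is a minimal sketch of how the two buckets might look using JUnit 5 tags and Mockito; the OrderRepository/OrderService types and the discount rule are hypothetical stand-ins invented for this sketch, not our actual code, and any other test framework's category mechanism would work just as well.

    // Minimal sketch of the dependent/independent split (JUnit 5 + Mockito).
    // OrderRepository, OrderService and the discount rule are hypothetical.
    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;

    import org.junit.jupiter.api.Tag;
    import org.junit.jupiter.api.Test;

    class OrderServiceTest {

        // Hypothetical data access layer and business rule, inlined so the sketch compiles.
        interface OrderRepository { double amountFor(String orderId); }

        static class OrderService {
            private final OrderRepository repo;
            OrderService(OrderRepository repo) { this.repo = repo; }
            double totalWithDiscount(String orderId) {
                double amount = repo.amountFor(orderId);
                return amount > 1000 ? amount * 0.9 : amount; // 10% volume discount
            }
        }

        @Test
        @Tag("independent") // needs nothing that is not in source control
        void appliesVolumeDiscountToLargeOrders() {
            OrderRepository repo = mock(OrderRepository.class);
            when(repo.amountFor("A-100")).thenReturn(1200.0);

            assertEquals(1080.0, new OrderService(repo).totalWithDiscount("A-100"), 0.001);
        }

        @Test
        @Tag("dependent") // needs a real, correctly configured database
        void readsAmountsFromTheRealDatabase() {
            // OrderRepository repo = ...construct the real JDBC-backed repository here...
            // assertEquals(1200.0, repo.amountFor("A-100"), 0.001);
        }
    }

The CI build can then be told to run only the independent group (for example, JUnit 5 tag filtering via Gradle's includeTags or Maven Surefire's groups setting), so the server never needs a database.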

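And a second sketch of the two levels described in points 2 and 3: one test exercising the top-level entry point for a whole use case, and one exercising a detail that entry point relies on. PayrollApp and its numbers are invented purely for illustration.

    // Minimal sketch of the 'main'-level and detail-level tests (JUnit 5).
    // PayrollApp is a hypothetical stand-in for the real top-level entry point.
    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.Test;

    class PayrollAppTest {

        static class PayrollApp {
            // The 'main'-style entry point: its args map to a use case.
            String run(String employeeId, String month) {
                return "paid " + grossPay(40, 25.0) + " to " + employeeId + " for " + month;
            }
            // One of the details that 'main' calls into.
            double grossPay(int hours, double rate) {
                return hours * rate;
            }
        }

        @Test // high-level test: harder to debug, but tells you a refactor broke a whole use case
        void payingAnEmployeeForAMonthProducesAPayslipLine() {
            assertEquals("paid 1000.0 to E-7 for 2010-03", new PayrollApp().run("E-7", "2010-03"));
        }

        @Test // detailed test: easy to debug, but blind to wiring mistakes between units
        void grossPayIsHoursTimesRate() {
            assertEquals(1000.0, new PayrollApp().grossPay(40, 25.0), 0.001);
        }
    }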
All this to get to the question. As the project moves forward and I need to implement changes (sometimes quite significant, sometimes trivial) to the codebase, I find that when changes cause tests to fail, there is a ratio between failures caused by actual regressions in the business logic and failures caused by the unit test itself no longer being valid. In other words, sometimes a test fails because of a regression bug in the actual codebase, and sometimes it fails because the unit test's assertions are no longer valid and it is the assertions that need to change. For this particular project, that ratio has been roughly even (50/50).

Has anyone tracked this ratio on their projects, and if so, what kinds of things have you learned (if any) regarding it? I'm not sure that it even indicates anything, but I have noticed that about half the time a test failure leads me to adjusting test assertions rather than actually fixing regression bugs in the real codebase. Whenever this occurs, it makes me feel like I just wasted x hours of my day, and I wonder if I could somehow be more efficient with my testing approach. It often takes longer to resolve test-assertion failures than actual regression bugs, which is both counterintuitive and frustrating.

EDIT: Note this question is about exploring what this ratio means and your experience with it. When is it 'smelly'?

+4  A: 

"test failures lead me to adjusting test asserts rather than actually fixing regression bugs in the real codebase."

Correct. Your requirements changed. Your test assertions must change.

"it makes me feel like I just wasted x hours of my day"

Why? How else are you going to track requirements changes?

"It often takes longer to resolve test-assert failures than actual regression bugs"

No kidding. When your requirements are in a state of flux, it takes a lot of time and effort to map requirements changes to test result changes.

"which is ... counterintuative". Depends on your intuition. My intuition (after 18 months of TDD) is that requirements change lead to design changes, lots of complex test changes to reflect the design changes.

Sometimes, very little (or no) code changes.

If your code is really good, it doesn't change much. When you're spending more time on testing than code, it means you wrote good code.

Go home happy.

The code smell shows up when you're spending more time trying to get code to pass a set of tests which never change. Think about what that means. You wrote the tests, but you just can't get code to pass. That's horrible.

If you spend an hour writing tests and 4 hours trying to get code to pass the tests, you've either got a really complex algorithm (and should have broken it down into more testable pieces) or you're a terrible application programmer.

If you spend 1 hour writing tests and 1 hour getting code to pass the tests, that's good.

If you spend 2 hours fixing tests because of a requirements change, and 1 hour getting code to pass the revised tests, that means your code wasn't very resilient against change.

If you spend 2 hours fixing tests because of a requirements change, and 1/2 hour tweaking code to pass those tests, you've written some really good code.

S.Lott
A: 

S. Lott pretty much said it all. I think the only thing a ratio of test assertion changes (T) vs. regression fixes (R) will tell you is a combination of how volatile your requirements are (which drives T up) and how successful the application code is at passing the tests (which affects R). Those two factors can vary independently, according to the quality of your requirements and of your development process.

gareth_bowles
+1  A: 

I definitely second @S.Lott's answer. I would just point out that what happens when the spec is established on piles of dead trees is that when requirements change, the dead trees (or word processor files) don't yell at you the way the tests do, so everything runs along just fine, except that you have this pile of dead trees that everyone looks at and says "the documentation is fiction."

That being said, there are cases where the tests are just not well written or useful, and probably should be dropped. I find that especially with TDD, where the tests teased out the design and were written incrementally, some of the original tests are really not relevant anymore now that the design and functionality are further along.

If you regard fixing a bunch of tests as having "wasted x hours of my day", bear in mind that once you move on to the next test you stop thinking about the previous one after it passes, and that is going to make the cost of change higher. Fixing the tests is probably the correct decision, but there is nothing wrong with looking at a test, deciding it has been overcome by subsequent events, and just dropping it; just don't use that as a cheap way out.

Yishai
+2  A: 

I have noticed that about half the time, test failures lead me to adjusting test assertions rather than actually fixing regression bugs in the real codebase.

When a test fails, there are three options:

  1. the implementation is broken and should be fixed,
  2. the test is broken and should be fixed, or
  3. the test is not anymore needed (because of changed requirements) and should be removed.

It's important to correctly identify which of those three options it is. The way I write my tests, I document in the test's name the behaviour which the test specifies, so that when a test fails, I can easily find out why it was originally written. I have written more about it here: http://blog.orfjackal.net/2010/02/three-styles-of-naming-tests.html
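
For example, a minimal sketch of that naming style (JUnit 5; the volume-discount rule is invented for illustration and not taken from the linked article):

    // Behaviour-revealing test names: a failing test tells you which promise was broken.
    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.Test;

    class VolumeDiscountTest { // the class names the behaviour being specified

        @Test
        void ordersOverOneThousandGetATenPercentDiscount() {
            assertEquals(1080.0, discountedTotal(1200.0), 0.001);
        }

        @Test
        void smallerOrdersArePricedAtFaceValue() {
            assertEquals(800.0, discountedTotal(800.0), 0.001);
        }

        // Hypothetical production rule, inlined here only to keep the sketch self-contained.
        private double discountedTotal(double amount) {
            return amount > 1000 ? amount * 0.9 : amount;
        }
    }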

In your case,

  • If you need to change the tests because of changed requirements and only a couple of tests need to change at a time, then everything is normal (the tests are well isolated, so each piece of behaviour is specified by only one test).
  • If you need to change the tests because of changed requirements and many tests need to change at a time, then it's a test smell: you have lots of tests testing the same thing (the tests are not well isolated), or the tests are testing more than one interesting behaviour each. The solution is to write more focused tests and better decoupled code.
  • If the tests need to change when refactoring, then it's a test smell that the tests are too tightly coupled to implementation details. Try to write tests which are centered around the behaviour of the system instead of its implementation; the article I linked earlier should give you some ideas. (The sketch below contrasts the two styles.)
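
As an illustration of that last point, here is a sketch (JUnit 5 + Mockito, with a hypothetical Cart and PriceCalculator) contrasting a test coupled to the implementation with one centered on behaviour:

    // Implementation-coupled vs. behaviour-centered tests (hypothetical Cart example).
    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.verify;
    import static org.mockito.Mockito.when;

    import org.junit.jupiter.api.Test;

    class CartTest {

        interface PriceCalculator { double priceOf(String sku); }

        static class Cart {
            private final PriceCalculator prices;
            private double total;
            Cart(PriceCalculator prices) { this.prices = prices; }
            void add(String sku) { total += prices.priceOf(sku); }
            double total() { return total; }
        }

        @Test // coupled to implementation: breaks if Cart starts caching or batching price lookups
        void addDelegatesToThePriceCalculator() {
            PriceCalculator prices = mock(PriceCalculator.class);
            new Cart(prices).add("BOOK");
            verify(prices).priceOf("BOOK");
        }

        @Test // centered on behaviour: survives any refactoring that keeps the total correct
        void totalIsTheSumOfThePricesOfAddedItems() {
            PriceCalculator prices = mock(PriceCalculator.class);
            when(prices.priceOf("BOOK")).thenReturn(12.0);
            when(prices.priceOf("PEN")).thenReturn(3.0);

            Cart cart = new Cart(prices);
            cart.add("BOOK");
            cart.add("PEN");

            assertEquals(15.0, cart.total(), 0.001);
        }
    }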

(An interesting side point: if you find yourself mostly rewriting classes, instead of changing them, when requirements change, it can be an indication that the code follows SRP, OCP, and other design principles well.)

Esko Luontola