I'm currently working on a project with some fairly significant business rules where the problem space is being 'discovered' as we write the solution (pretty typical chaotic project management kind of thing). We have decent test coverage & rely on the tests quite a bit to make sure our significant changes don't blow anything up. This scenario is the kind of thing that unit test zealots highlight as a prime example of testing helping software be modified more easily, with fewer defects, and completed more quickly than if you weren't using unit tests. I shudder to think how I'd cope without the test suite.
My question is this: while I certainly believe in the value of unit testing (this project is actually TDD, but that's not really germane to the question), I'm wondering, as have others, about the classic unit test problem of having so much more code to grok and maintain (i.e. the tests themselves). Again, there is no doubt in my mind that this particular project is much better off with the unit test cruft than without it, but I'm also concerned about the long-term maintainability of the tests.
There are a few techniques that I've used following the advice of others to help with this problem. In general,
- We create test lists that fall into either a 'dependent' or an 'independent' bucket. Independent tests don't require anything that is not in source control, so any calls to our data access layer are either mocked or pull data from an xml file instead of the real db, for example. Dependent tests, as the name suggests, depend on something like a config file, a db, or a network resource that might not be correctly configured/available when running the test. Splitting the tests into two groups like this has been extremely valuable in allowing us to write dependent 'throw away' tests for early development and independent, mission-critical tests that can be relied upon and resist test rot (see the first sketch after this list). It also makes the CI server easy to manage since it doesn't have to be set up and maintained with db connections and the like.
- We target different levels of our code. For example, we have tests hitting 'main' and tests hitting all of the methods that 'main' would call. This gives us the ability to target both the details of the system and its overarching goals. The 'main' tests are difficult to debug if they break, but they typically aren't the only thing that breaks (detailed tests break too). The detailed tests are easier to follow and debug when they break, but they are insufficient for knowing whether a refactor kills the system (that's what the 'main' tests are for).
- The 'main' tests have been critical to feeling comfortable that a refactor hasn't hosed the codebase. A 'main' test is really many tests against a single method, called with different args that map to use cases (the second sketch below illustrates this). It's basically the entry point into our code at the highest level, and as such these are arguably not really 'unit' tests. However, I find that I really need the higher-level tests in order to feel comfortable that a refactor didn't blow up the codebase; the lower-level tests (the ones that are truly a 'unit' of work) are not sufficient on their own.
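For illustration only, here's a minimal sketch of the dependent/independent split using pytest markers. Our project isn't Python and none of these names come from it; `price_order`, the `real_repository` fixture, and the marker name are all made up for the example:

```python
# Sketch of the dependent/independent split using pytest markers.
# price_order, real_repository, and the values are hypothetical stand-ins.
import pytest
from unittest import mock


def price_order(repo, order_id):
    """Toy business rule: total = qty * unit_price."""
    order = repo.get_order(order_id)
    return order["qty"] * order["unit_price"]


@pytest.mark.dependent
def test_price_order_against_real_db(real_repository):
    # Dependent: 'real_repository' stands in for a fixture wired to an actual,
    # correctly configured database, which might not exist on every box.
    assert price_order(real_repository, order_id=42) > 0


def test_price_order_with_mocked_repo():
    # Independent: everything needed is in source control.
    repo = mock.Mock()
    repo.get_order.return_value = {"qty": 3, "unit_price": 10.0}
    assert price_order(repo, order_id=42) == 30.0
```

The CI server then only runs the independent bucket, e.g. `pytest -m "not dependent"` (with the `dependent` marker registered once in pytest.ini), so it never needs db connections configured.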
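And a second sketch, again with hypothetical names and values, showing the two levels: a detailed test against one rule, and a 'main' test that drives the entry point with different args that each map to a use case:

```python
# Sketch of detail-level vs 'main'-level tests; run_pipeline, apply_discount,
# and the use-case values are made up for illustration.
import pytest


def apply_discount(total, customer_tier):
    """Toy detail-level rule used by the entry point."""
    return total * (0.9 if customer_tier == "gold" else 1.0)


def run_pipeline(customer_tier, qty, unit_price):
    """Toy 'main' entry point that strings the detailed rules together."""
    return apply_discount(qty * unit_price, customer_tier)


def test_apply_discount_for_gold_tier():
    # Detailed test: narrow and easy to debug when it breaks.
    assert apply_discount(100.0, "gold") == pytest.approx(90.0)


@pytest.mark.parametrize(
    "customer_tier, qty, unit_price, expected",
    [
        ("gold", 3, 10.0, 27.0),      # use case: discounted bulk order
        ("standard", 1, 10.0, 10.0),  # use case: plain single-item order
    ],
)
def test_entry_point_use_cases(customer_tier, qty, unit_price, expected):
    # 'Main' test: one entry point, many args, each row a use case.
    # Harder to debug on its own, but it's what says a refactor survived.
    assert run_pipeline(customer_tier, qty, unit_price) == pytest.approx(expected)
```

When a refactor breaks a 'main' row, the detailed tests are usually what point at where the break actually is.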
All this to get to the question. As the project moves forward and I need to implement changes (sometimes quite significant, sometimes trivial) to the codebase, I find that when changes cause tests to fail, there is a ratio between two kinds of failure: the test failing because of an actual regression bug in the business logic, and the test failing because its assertions are no longer valid and it is the assertions that need to change. On this particular project that split has been roughly even (about 50/50).
Has anyone tracked this ratio on their projects, and if so, what kinds of things have you learned (if any) from it? I'm not sure it even indicates anything, but I have noticed that about half the time a test failure leads me to adjust test assertions rather than fix an actual regression bug in the real codebase. Whenever this occurs, it makes me feel like I just wasted x hours of my day & I wonder if I could be more efficient somehow with my testing approach. It often takes longer to resolve invalid-assertion failures than actual regression bugs, which is both counterintuitive and frustrating.
EDIT: Note this question is about exploring what this ratio means and your experience with it. When is it 'smelly'?