In your practice, what measure(s) do you use to know when it's time to stop testing an application and move it to production?
At my workplace, one metric that is sometimes used is that we have tested enough when we start finding old, unreported bugs that have been present for the last several versions of our product. The idea is that if those are the bugs we are finding during testing, and they have been present for years without a customer complaining about them, then we are probably safe to ship.
Of course, you still have all of the usual manual tests, automated tests, developers dogfooding the product, betas, and continual testing as well, but judging readiness by how many of the bugs we find now turn out to have been present, yet unreported, in past versions was a new idea to me when I first heard it.
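A rough sketch of how that metric could be tracked (the data model, bug IDs, and version labels here are invented for illustration, not our actual tooling):

```python
from dataclasses import dataclass

@dataclass
class Bug:
    bug_id: str
    earliest_affected_version: str  # oldest release in which the bug reproduces
    reported_by_customer: bool      # has any customer ever reported it?

# Hypothetical: releases that have been in the field for years without complaints
OLD_VERSIONS = {"3.0", "3.5", "4.0"}

def old_unreported_ratio(bugs_found_this_cycle):
    """Fraction of this test cycle's finds that are old and never customer-reported."""
    if not bugs_found_this_cycle:
        return 0.0
    old_and_quiet = [
        b for b in bugs_found_this_cycle
        if b.earliest_affected_version in OLD_VERSIONS and not b.reported_by_customer
    ]
    return len(old_and_quiet) / len(bugs_found_this_cycle)

bugs = [
    Bug("BUG-101", "5.0", False),  # new regression in the release under test
    Bug("BUG-102", "3.5", False),  # old, quiet bug
    Bug("BUG-103", "4.0", False),  # old, quiet bug
]
print(f"{old_unreported_ratio(bugs):.0%} of new finds are old, unreported bugs")
```

When that percentage climbs, most of what testing is still turning up predates the release under test, which is exactly the signal described above.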
First, you never stop testing. When you are done testing and you release it, all that means is that your users are testing instead of you.
Second, when your comprehensive test scripts pass with an acceptable level of failure, you are ready to move on.
Finally, this is highly specific to your case. On some projects, we have a three-week beta test period in which lots of people hack on the system before even the smallest change can be rolled out. In other, less critical areas, small changes can be moved out with just the nod of another developer.
When all of the major show stoppers are gone.
Seriously, you should do a User Acceptance Test and let your users use the system to find out whether it's all cut out for them. If this isn't practical, do a closed beta with select users who resemble your target audience.
It is impossible to really find all the bugs in your system, so sometimes the only real rule is to just ship it.
For projects in my organisation, the measures I typically use are as follows:
- No Severity 1 (show stopper) issues
- No Severity 2 (major functionality crippled) issues
- Acceptable number of Severity 3 (minor functionality) issues
"Acceptable number" is naturally a very squishy number, depending on the size of the application etc. etc.
Once those preconditions are met, I will have a meeting of all the stakeholders (QA lead, Dev lead, App Support lead, etc.), go through the list of outstanding issues, and make sure there is agreement about the severity assigned to each. Once I have confirmed that there are no outstanding Sev 1 or Sev 2 issues, I'll get a "Go/No Go" call from each stakeholder. If everyone says "Go", I'm comfortable moving to Production. If at least one stakeholder says "No Go", we examine the reasons and, if necessary, take steps to resolve the issues behind them.
In smaller projects the process may be more streamlined, and if it's just a one-person operation your set of preconditions may be much simpler, e.g. "the application provides a reasonable benefit while having an (apparently) acceptable number of bugs - let's put it out there!". As long as the benefit provided by the application outweighs the annoyance of the bugs, especially if you are following the "release early and often" guideline, that might work for you.
mharen,
I find that if you have comprehensive automated tests, it is utterly irresponsible to ship software unless all of them pass. Automated tests cover areas that are either core functionality or bugs that have occurred in the past, which you are aware of and can fix so that the test passes. It is irresponsible to ship software that doesn't pass 100% of its automated tests.
Jon,
I didn't mean test scripts to imply automated testing. I was referring to the more traditional approach of a step-by-step list of what to test and how to test it.
That said, I don't agree that all automated tests should be required to pass. It all depends on severity and priority. On a large project, devs often write tests that reproduce issues reported by users, and those tests fail until the bugs are fixed. Since we can't fix every bug with every release, it's a given that some tests simply won't pass; one way of handling that is sketched below.
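(A sketch, assuming pytest; the function, the test names, and the bug ID are invented for illustration.) Tests tied to known, deferred bugs can be marked as expected failures, so "100% pass" still gates everything that is supposed to work in this release:

```python
import pytest

def apply_coupon(code: str) -> float:
    # Hypothetical implementation with a known bug: non-ASCII codes are rejected.
    return 0.10 if code.isascii() and code.endswith("10") else 0.0

def test_ascii_coupon_codes():
    # Core behaviour: must pass before release.
    assert apply_coupon("SAVE10") == 0.10

@pytest.mark.xfail(reason="BUG-4211: non-ASCII coupon codes rejected; deferred", strict=False)
def test_unicode_coupon_codes():
    # Known, triaged bug that won't be fixed this release; tracked, but not a gate.
    assert apply_coupon("ÜBER10") == 0.10
```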
One interesting test methodology that I have always wanted to try is 'error seeding'. The idea is that you have someone intentionally insert bugs into the system, falling into different categories.
For example:
- Cosmetic issues, spelling errors, etc.
- Non-critical errors
- Critical errors and crashes
- Data problems: no errors occur, but something deeper is wrong with the results.
- etc.
The 'seeder' documents exactly what was changed to insert these bugs so they can be quickly reverted. As the test team finds the seeded bugs, they are also finding real bugs, but they can't tell the difference. In theory, if the test team finds 90% of the seeded critical errors, then they have probably found a proportionate share of the real critical errors.
From these statistics, you can start making judgment calls about when it is acceptable to release. Of course, this is nowhere near foolproof given the random nature of which bugs get found (real or seeded), but it's probably better than having no idea at all how many bugs you might be shipping.
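As a rough illustration of the arithmetic (a capture-recapture style estimate; all numbers below are made up):

```python
def estimate_remaining_real_bugs(seeded, seeded_found, real_found):
    """Estimate how many genuine bugs of a given severity remain undiscovered,
    assuming seeded and genuine bugs are found at roughly the same rate."""
    if seeded_found == 0:
        raise ValueError("No seeded bugs found yet; the detection rate is unknown.")
    detection_rate = seeded_found / seeded          # e.g. 90% of seeded criticals found
    estimated_total_real = real_found / detection_rate
    return estimated_total_real - real_found

# Testers found 18 of 20 seeded critical bugs, and 45 genuine critical bugs:
print(estimate_remaining_real_bugs(seeded=20, seeded_found=18, real_found=45))  # ~5 still lurking
```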
Measuring the amount of testing time put into the product between "showstopper" or major-functionality bugs can tell you when you're nearly there. At times of rapid flux in the product, with new functionality going in, it's common for a testing team to find that most of the bugs they report are serious functionality bugs. As those get dealt with, there is often a large quantity of minor, fit-and-finish issues aimed at improving the smoothness and clarity of the interaction; in aggregate they make a great difference to product quality, but each one on its own is not terribly important. As those get fixed and testing continues, you'll probably keep getting bug reports as testers push into error cases and unusual usage patterns. At that point it comes down to weighing the business value of releasing against the risk of undetected showstoppers.
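A tiny sketch of that signal (dates and field names are illustrative): track the gap, in calendar days of testing, between consecutive showstopper discoveries and watch it widen.

```python
from datetime import date

# (discovery date, severity) for bugs filed during the test cycle
bug_log = [
    (date(2008, 9, 1), "showstopper"),
    (date(2008, 9, 3), "minor"),
    (date(2008, 9, 4), "showstopper"),
    (date(2008, 9, 15), "showstopper"),
    (date(2008, 9, 30), "minor"),
]

showstopper_dates = sorted(d for d, sev in bug_log if sev == "showstopper")
gaps = [(later - earlier).days for earlier, later in zip(showstopper_dates, showstopper_dates[1:])]
print("Days between showstoppers:", gaps)  # widening gaps suggest the product is settling down
```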