I guess everybody agrees that having continuous builds and continuous integration is beneficial for quality of the software product. Defects are found early so they can be fixed ASAP. For continuous builds, which take several minutes, it is usually easy to find the one who caused the defect. However, for nightly integration tests, which take long time to run, this may be a challenge. Here are specifics of the situation, for which I'm looking for an optimal solution:
- Running integration tests takes more than 1 hour. Therefore they are run overnight. Multiple check-ins happen every day (team of about 15 developers) so it is sometimes difficult to find the "culprit" (if any).
- Integration testing environment depends on other environments (web services and databases), which may fail from time to time. This causes integration tests to fail.
So how to organize the team so that these failures are fixed early? In my opinion, there should be someone appointed to DIAGNOSE the defect(s). This should be the first task in the morning. If he needs an expertise of others, they should be readily available. Once the source (component, database, web service) of the failure is determined, the owner should start fixing it (or another team should be notified).
How to appoint the one who diagnoses the defects? Ideally, someone would volunteer (ha ha). This won't happen very often, I'm afraid. I've heard other option - whoever comes first to the office should check the results of the nightly builds. This is OK, if the whole team agrees. However, this rewards those who come late. I suppose that this role should rotate in the team. The excuse "I don't know much about builds" should not be accepted. Diagnostics of the source of the failure should be rather straightforward. If it is not, then adding more of diagnostics logging to the code should improve the visibility into integration test failures.
Any experience in this area or suggestions for improvements of the above approach?