I normally do this:
1) Add new functional test case(s) to the automated regression test system. I normally start a software project by building my own regression test system:
- Excel VBA + a C library to control SCSI/IDE interfaces/devices (13 years ago). Test reports were Excel spreadsheets.
- TCL Expect for complex network router system testing (6 years ago). Test reports were web pages.
- Today I use Python/Expect. Test reports are XML plus a Python-based XML analyzer. (A minimal sketch follows below.)
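To give a feel for that last setup, here is a minimal sketch of a Python/Expect test case that writes an XML report. It assumes a device you can telnet into; the address, prompt strings, and commands are made up for illustration:

```python
# Minimal sketch: drive a device over telnet with pexpect, emit an XML report.
# The address, prompts, and commands below are examples, not a real device.
import pexpect
import xml.etree.ElementTree as ET

def run_case(name, command, expect_text, child):
    """Run one test command and return an XML <case> element."""
    case = ET.Element("case", name=name)
    child.sendline(command)
    try:
        child.expect(expect_text, timeout=10)
        case.set("result", "pass")
    except pexpect.TIMEOUT:
        case.set("result", "fail")
        # Capture whatever the device printed, for the XML analyzer to chew on.
        case.text = child.before.decode(errors="replace")
    return case

report = ET.Element("testreport")
child = pexpect.spawn("telnet 192.168.1.10")    # device under test (example address)
child.expect("login:")
child.sendline("admin")
child.expect("#")                                # shell prompt on the device

report.append(run_case("version", "show version", "Build", child))
report.append(run_case("link_up", "show interface eth0", "link is up", child))

ET.ElementTree(report).write("report.xml")       # input for the XML analyzer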
The goal of all this work is to make sure that once a bug is found, it never shows up again in checked-in code or the production system. It also makes it easier to reproduce random and long-term problems.
Don't check in any code unless it has gone through an overnight automated regression test.
I typically write a 1:1 ratio of product code to test code: 20K lines of TCL Expect for 20K lines of C++ code (5 years ago). For example:
- The C code implements a TCP tunnel / connection-forwarding proxy.
- TCL test cases: (a) Set up the connections and make sure the data passes through. (b) Set up the connections with different network elements. (c) Do that 10, 100, 1000 times and check for memory leaks, system resource issues, etc. (see the sketch after this list).
- Do this for every feature in the system, and you can see why the test-to-product code ratio ends up 1:1.
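To make test case (c) concrete, here is a rough Python version of the same idea (the original was TCL Expect). A local echo server stands in for the proxy so the sketch runs standalone; in the real test the connections would go through the C proxy, and the memory/resource numbers would come from the device's diagnostic interface (step 2 below):

```python
# Sketch of the repeated-connection leak test. The local echo server is a
# stand-in for "proxy + far end" so this runs by itself.
import socket, threading

def echo_server(sock):
    while True:
        conn, _ = sock.accept()
        with conn:
            conn.sendall(conn.recv(1024))

server = socket.create_server(("127.0.0.1", 0))   # stand-in for the proxy
threading.Thread(target=echo_server, args=(server,), daemon=True).start()
addr = server.getsockname()

def one_pass(payload=b"hello"):
    """Open a connection, push data through, verify it comes out the other side."""
    with socket.create_connection(addr, timeout=5) as s:
        s.sendall(payload)
        assert s.recv(len(payload)) == payload, "data did not pass thru"

for iterations in (10, 100, 1000):
    for _ in range(iterations):
        one_pass()
    # In the real test: query the device's memory/fd counters here and fail
    # the case if they keep growing run over run (that's the leak signal).
    print(f"{iterations} passes OK")
```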
I don't want the QA team to run automated tests with my test system, since all my checked-in code already has to pass those tests. I usually run a 2-week long-term regression test before I give the code to the QA team.
Having the QA team run manual test cases also makes sure my program has enough built-in diagnostic info to capture any future bugs. The goal is to have enough diagnostic info to solve 95% of bugs in under 2 hours. I was able to do that in my last project. (Video network equipment at RBG Networks.)
2) Add diagnostic routines (web-based nowadays) to get at all the internal information (current state, logs, etc.). More than 50% of my code (C/C++ especially) is diagnostic code.
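A bare-bones version of that kind of web diagnostic endpoint might look like this in Python; the state dict and log ring here are stand-ins for real program internals:

```python
# Tiny HTTP diagnostic endpoint baked into the program: dumps current state
# and recent log lines. STATE and LOG_RING are placeholders for real internals.
from http.server import BaseHTTPRequestHandler, HTTPServer
from collections import deque
import json, threading

STATE = {"uptime_s": 0, "sessions": 0}      # placeholder internal state
LOG_RING = deque(maxlen=1000)               # recent log lines, kept in memory

class DiagHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/state":
            body = json.dumps(STATE).encode()
        elif self.path == "/logs":
            body = "\n".join(LOG_RING).encode()
        else:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# Serve diagnostics on a side thread so it never blocks the product code.
threading.Thread(target=HTTPServer(("0.0.0.0", 8080), DiagHandler).serve_forever,
                 daemon=True).start()
```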
3) Add more detailed logging for the trouble areas I don't understand.
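In Python terms, this can be as simple as giving the suspect module its own logger, so just that area can be turned up to DEBUG without flooding everything else (the logger name is an example):

```python
# Per-module logger: crank up detail only for the area under suspicion.
import logging

logging.basicConfig(
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    level=logging.INFO,
)
log = logging.getLogger("tunnel.setup")   # the trouble area (example name)
log.setLevel(logging.DEBUG)               # extra detail only here

log.debug("conn_id=%d state=%s peer=%s", 42, "SYN_SENT", "10.0.0.5")
```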
4) Analyze the info.
5) Try to fix the bug.
6) Run overnight / over-the-weekend regression tests. When I was in R&D, I typically asked for at least 5-10 test systems to run regression tests continuously, 24x7. That normally helps identify and solve the memory, resource, and long-term performance problems before the code hits SQA.
Once, an embedded system failed to boot to the Linux prompt from time to time. I added a test case that power-cycled the system with a programmable outlet over and over, made sure the test could "see" the command prompt each time, and started it running overnight. We were able to quickly identify the FPGA code problem and confirm the system stayed up after 5000 power cycles. The test case was added to the suite, and every time new Verilog code was checked in and an FPGA image was built, this test case was run. It was never an issue again.
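Here is a sketch of what that power-cycle test looks like in today's Python/Expect style. The `pdu` command for the programmable outlet is hypothetical (real outlets are typically driven over SNMP or HTTP), and the serial console device and prompt string are assumptions:

```python
# Sketch: power-cycle the unit via a programmable outlet, then watch the
# serial console for the Linux login prompt. The `pdu` CLI is hypothetical.
import subprocess, time
import pexpect

def power_cycle():
    # Hypothetical outlet-control CLI; substitute your PDU's SNMP/HTTP interface.
    subprocess.run(["pdu", "outlet", "3", "off"], check=True)
    time.sleep(5)
    subprocess.run(["pdu", "outlet", "3", "on"], check=True)

for cycle in range(5000):
    power_cycle()
    console = pexpect.spawn("picocom -b 115200 /dev/ttyUSB0")   # assumed console port
    try:
        # Fail the run the moment one boot doesn't reach the prompt.
        console.expect("login:", timeout=120)
    except pexpect.TIMEOUT:
        print(f"FAIL: no Linux prompt after power cycle {cycle}")
        break
    finally:
        console.close(force=True)
else:
    print("PASS: 5000 power cycles, prompt seen every time")
```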