views: 120

answers: 4

SQLite claims to have 679 times as much test code as production code: http://www.sqlite.org/testing.html

Does anyone know how this is possible? Do they generate any of the test code automatically? What are the major parts of these "45678.3 KSLOC" of test code?

+2  A: 

It's presumably possible if the developers spent 679 times as much time writing test code as they spent writing production code. Just think: if they'd opted instead for 339 times as much test code, they could have had two entire database engines, each still with a ludicrous amount of test coverage.

I once watched a fellow developer trying to placate a furious customer about slipped deadlines by informing them that he had written 5 times as much test code as production code. The customer was not placated, as you can imagine. At least I don't think 5X coverage is extreme anymore.

MusiGenesis
There are probably more people writing test code than production code. For instance, the article describes their policy on regression tests: you don't have to understand the SQLite code at all to write a regression test for a bug you just found and reported (although someone has to fit it into the test harness). Even so, 45 MLOC sounds like an incredible amount to me: even if it's all in repositories and hence "not auto-generated", a lot of it might have been machine-generated in the first place, if only by Emacs macros...
Steve Jessop
I suspect that a lot of the tests (especially in TH3) were contributed by mobile phone makers. They have a bit of a vested interest in *very* thorough testing of the database engine they ship...
Donal Fellows
Also, you remove and replace old code from the product, but unless you've removed a feature there's no point ever removing test cases. So 679:1 LOC doesn't translate to anything like 679:1 time. You can easily spend a day *reducing* the number of LOC in the product and consider it time well spent, but it's rarely worth bothering to do that with test cases.
Steve Jessop
@Steve: you make a good point, if SQLite lets end-users write test code and then submit it for incorporation in the larger body of tests. I agree that a lot of it might have been auto-generated, and a lot of it might also consist of long lists of property assignments and so forth. A great example of how worthless LOC is as a measurement.
MusiGenesis
+1  A: 

"Does anyone knows how it is possible?"

"It is possible" to have 679 times as much test code because a single feature can be used in many different ways. Consider just a single function that takes two parameters. I can generate alot of test code for that one function that tests boundary conditions and many other combinations of conditions. When you consider setup/teardown of the tests, there is additional code there. Depending on their testing framework this overhead may significantly add to the amount of code in testing.

What it really boils down to is the fact that a piece of software can be used in so many different ways, which means there are many different scenarios to test for. This is the beauty of elegant software: a simple program can be applied to numerous scenarios. But that same flexibility is what makes verifying and testing software so challenging.

AaronLS
+1  A: 

It uses Tcl to power the test framework, so it's much easier to write tests than it is to write the implementation. This encourages thorough testing, which is what you want in a database, yes? Moreover, a fair fraction of those tests are proprietary, aimed at testing in embedded environments; I imagine some corporate user (or users) paid for that sort of thing. It's also quite possible that the same feature is tested multiple times.

Donal Fellows
A: 

Looking at section 3.1 (OOM):

OOM testing is accomplished by simulating OOM errors. SQLite allows an application to substitute an alternative malloc() implementation using the sqlite3_config(SQLITE_CONFIG_MALLOC,...) interface. The TCL and TH3 test harnesses are both capable of inserting a modified version of malloc() that can be rigged to fail after a certain number of allocations. These instrumented mallocs can be set to fail only once and then start working again, or to continue failing after the first failure. OOM tests are done in a loop. On the first iteration of the loop, the instrumented malloc is rigged to fail on the first allocation. Then some SQLite operation is carried out and checks are done to make sure SQLite handled the OOM error correctly. Then the time-to-failure counter on the instrumented malloc is increased by one and the test is repeated. The loop continues until the entire operation runs to completion without ever encountering a simulated OOM failure. Tests like this are run twice, once with the instrumented malloc set to fail only once, and again with the instrumented malloc set to fail continuously after the first failure.
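The substitution mechanism described there can be sketched in a few lines of C. The sqlite3_config() calls and the sqlite3_mem_methods structure are real SQLite interfaces; the counter variables and function names below are my own illustration, not the actual TCL/TH3 harness code.

    #include <sqlite3.h>

    sqlite3_mem_methods defaultMem;  /* SQLite's real allocator, saved here  */
    int allocsUntilFailure = -1;     /* <0 disables the simulated failures   */
    int repeatFailure = 0;           /* keep failing after the first failure */
    int oomTriggered = 0;            /* set whenever a failure is injected   */

    static void *failingMalloc(int nByte) {
        if (allocsUntilFailure == 0) {
            oomTriggered = 1;
            if (!repeatFailure) allocsUntilFailure = -1; /* fail only once */
            return 0;                                    /* simulated OOM  */
        }
        if (allocsUntilFailure > 0) allocsUntilFailure--;
        return defaultMem.xMalloc(nByte);
    }

    static void *failingRealloc(void *pOld, int nByte) {
        if (allocsUntilFailure == 0) {
            oomTriggered = 1;
            if (!repeatFailure) allocsUntilFailure = -1;
            return 0;
        }
        if (allocsUntilFailure > 0) allocsUntilFailure--;
        return defaultMem.xRealloc(pOld, nByte);
    }

    /* Install the wrapper; must run before any other use of SQLite. */
    int installFailingMalloc(void) {
        sqlite3_mem_methods m;
        int rc = sqlite3_config(SQLITE_CONFIG_GETMALLOC, &defaultMem);
        if (rc != SQLITE_OK) return rc;
        m = defaultMem;              /* keep xFree, xSize, etc. unchanged */
        m.xMalloc  = failingMalloc;
        m.xRealloc = failingRealloc;
        return sqlite3_config(SQLITE_CONFIG_MALLOC, &m);
    }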

Note that section 7 explicitly states 100% core coverage as determined by gcov. I agree with Donal Fellows that the test framework is largely responsible for the test coverage beyond what a call graph would suggest. It's a much different thing to see malloc() entered nn times and write a test for it than it is to write dozens of tests geared to simulate environments where malloc() is likely to fail.
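The loop from the quoted passage then looks roughly like this, reusing the counters from the sketch above; runOperation() is a hypothetical stand-in for whatever SQLite operation is under test.

    #include <assert.h>
    #include <sqlite3.h>

    extern int allocsUntilFailure, repeatFailure, oomTriggered;

    /* Example operation: open an in-memory database, run two statements. */
    static int runOperation(void) {
        sqlite3 *db = 0;
        int rc = sqlite3_open(":memory:", &db);
        if (rc == SQLITE_OK) {
            rc = sqlite3_exec(db,
                "CREATE TABLE t(x); INSERT INTO t VALUES(1);", 0, 0, 0);
        }
        sqlite3_close(db);
        return rc;
    }

    /* Push the simulated failure one allocation later on each pass until
     * the whole operation completes without ever hitting it. */
    static void runOomLoop(int repeat) {
        int n, rc;
        repeatFailure = repeat;
        for (n = 0; ; n++) {
            allocsUntilFailure = n;  /* fail on the (n+1)-th allocation */
            oomTriggered = 0;
            rc = runOperation();
            if (!oomTriggered) {     /* clean run: every path now tested */
                assert(rc == SQLITE_OK);
                break;
            }
            /* An injected OOM must surface as SQLITE_NOMEM, not a crash. */
            assert(rc == SQLITE_NOMEM);
        }
        allocsUntilFailure = -1;     /* disable the failures again */
    }

Per the quoted text, the harness runs this twice: once with repeat set to 0 (fail once, then recover) and once with repeat set to 1 (keep failing after the first failure).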

Yes, the resulting coverage is an artifact of diligence, but so is the selection of a test framework that enables that kind of diligence.

Finally, reiterating the obvious: malloc() takes only a single size argument. This suggests that the tests written around it are the product of deliberate design, not automatic generation.

Tim Post