I am developing some Python modules that use a MySQL database to insert data and produce various types of report. I'm doing test-driven development, and so far I run:

  • some CREATE / UPDATE / DELETE tests against a temporary database that is thrown away at the end of each test case (a sketch of that setup is below), and
  • some report-generation tests doing exclusively read-only operations, mainly SELECT, against a copy of the production database, written on the (valid, in this case) assumption that some things in my database aren't going to change.
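For reference, the throw-away database tests look roughly like this. I'm assuming the MySQLdb driver here, and the database name, credentials, and table are just placeholders:

    import unittest
    import MySQLdb  # any DB-API module would look much the same


    class CrudTestCase(unittest.TestCase):
        """Each test gets a scratch database that is dropped afterwards."""

        TEST_DB = "myapp_test"  # placeholder name

        def setUp(self):
            self.conn = MySQLdb.connect(host="localhost", user="test", passwd="test")
            cur = self.conn.cursor()
            cur.execute("CREATE DATABASE %s" % self.TEST_DB)
            cur.execute("USE %s" % self.TEST_DB)
            # build whatever schema the module under test needs
            cur.execute("CREATE TABLE widgets (id INT PRIMARY KEY, name VARCHAR(50))")

        def tearDown(self):
            cur = self.conn.cursor()
            cur.execute("DROP DATABASE %s" % self.TEST_DB)
            self.conn.close()

        def test_insert_widget(self):
            cur = self.conn.cursor()
            cur.execute("INSERT INTO widgets VALUES (1, 'spanner')")
            cur.execute("SELECT COUNT(*) FROM widgets")
            self.assertEqual(cur.fetchone()[0], 1)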

Some of the SELECT operations are running slowly, so my tests take more than 30 seconds, which spoils the flow of test-driven development. I can see two choices:

  1. only put a small fraction of my data into the copy of the production database used for the report-generation tests, so that those tests run fast enough for test-driven development (less than about 3 seconds suits me best; anything slower I'd count as a failure), and do separate performance testing later, or
  2. fill the production database copy with as much data as production has, and add timing code that fails a test if it takes too long (something like the sketch below).
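For option 2, the timing code I have in mind would be something like this decorator; the 3-second limit and the report function in the usage comment are only illustrative:

    import time
    import functools


    def fail_if_slower_than(seconds):
        """Decorator for test methods: fail the test if it runs longer than `seconds`."""
        def decorator(test_method):
            @functools.wraps(test_method)
            def wrapper(self, *args, **kwargs):
                start = time.time()
                result = test_method(self, *args, **kwargs)
                elapsed = time.time() - start
                self.assertTrue(
                    elapsed < seconds,
                    "%s took %.2fs (limit %.2fs)" % (test_method.__name__, elapsed, seconds),
                )
                return result
            return wrapper
        return decorator


    # Usage inside a unittest.TestCase subclass:
    #
    #     @fail_if_slower_than(3.0)
    #     def test_monthly_report(self):
    #         report = generate_monthly_report(self.conn)  # placeholder report function
    #         self.assertEqual(len(report), EXPECTED_ROW_COUNT)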

I'm not sure which approach to take. Any advice?

+1  A: 

I'd do both. Run against the small set first to make sure all the code works, then run against the large data set for the things that need to be tested for time: especially selects, searches, and reports. If you are doing inserts, deletes, or updates on multiple-row sets, I'd test those against the large set as well. It is unlikely that simple single-row action queries will take too long, but if they involve a lot of joins, I'd test them too. If the queries won't run on prod within the timeout limits, that's a fail, and it's far, far better to know as soon as possible so you can fix it before you bring prod to its knees.
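One way to organise that without writing the assertions twice is a shared mixin that gets pointed at each database in turn. The connection helper and report function here are only stand-ins for whatever your modules actually provide:

    import unittest


    class ReportQueryTests(object):
        """The actual assertions; run against whichever database the subclass points at."""

        def test_sales_report_total(self):
            # total_sales() is a placeholder for one of the real report functions
            self.assertEqual(total_sales(self.conn), self.expected_total)


    class SmallDataSetTests(ReportQueryTests, unittest.TestCase):
        """Fast correctness run against the cut-down copy; part of the normal TDD cycle."""

        def setUp(self):
            self.conn = connect_to("report_test_small")  # placeholder connection helper
            self.expected_total = 1234


    class FullDataSetTests(ReportQueryTests, unittest.TestCase):
        """Production-sized run for timing; executed separately, e.g. nightly."""

        def setUp(self):
            self.conn = connect_to("report_test_full")   # placeholder connection helper
            self.expected_total = 9876543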

HLGEM
+1  A: 

The problem with testing against real data is that it contains lots of duplicate values, and not enough edge cases. It is also difficult to know what the expected values ought to be (especially if your live database is very big). Oh, and depending on what the live application does, it can be illegal to use the data for the purposes of testing or development.

Generally the best thing is to write the test data to go with the tests. This is laborious and boring, which is why so many TDD practitioners abhor databases. But if you have a live data set (which you are allowed to use for testing), take a very cut-down subset of it for your tests. If you can write valid assertions against a data set of thirty records, running your tests against a data set of thirty thousand is just a waste of time.
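Roughly what I mean, with a hand-written fixture whose expected results you can work out by eye; the table, connection helper, and values are only illustrative:

    import unittest

    FIXTURE_ROWS = [
        (1, "Smith",   100.00),   # ordinary case
        (2, "O'Brien",   0.00),   # quoting edge case, zero amount
        (3, "Anders",  -50.00),   # negative amount
    ]


    class InvoiceReportTest(unittest.TestCase):
        def setUp(self):
            # make_test_connection() is a placeholder for however the scratch
            # database from the CRUD tests gets created and connected to
            self.conn = make_test_connection()
            cur = self.conn.cursor()
            cur.executemany(
                "INSERT INTO invoices (id, customer, amount) VALUES (%s, %s, %s)",
                FIXTURE_ROWS,
            )

        def test_grand_total_matches_fixture(self):
            cur = self.conn.cursor()
            cur.execute("SELECT SUM(amount) FROM invoices")
            # 100.00 + 0.00 - 50.00: the expected value is known because we wrote the data
            self.assertEqual(float(cur.fetchone()[0]), 50.00)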

But definitely, once you have got the queries returning the correct results, put them through some performance tests.

APC