Reading the test cases for djapian I found something really interesting: what those guys do is use the setUp method for the TestCase class: they create an object and then use the update method for the indexer, so they effectively have a document to search for and a way to write controlled query tests!
For the curious, the method looks something like this:
def setUp(self):
p = Person.objects.create(name="Alex")
for i in range(self.num_entries):
Entry.objects.create(author=p, title="Entry with number %s" % i, text="foobar " * i)
Entry.indexer.update()
I think this would do, but we have to remember I'm testing a little search engine here, so this solution might be the easy way out; I can't come up with an objection, though, so if you guys have an answer that'll help define a strategy for testing this kind of webApps in python in general, it's more than welcome!
-I think I'll settle for something like this for now (I wanted to test the latency of the queries with a fully populated database also, but I think I could do that later with bench tests in Funkload)
EDIT: Ok, to be faithful to a solution for anyone interested, I ran into another issue: the xapian index (as stated in the comment). To solve it, I created a default test runner that changed the production xapian index for a test index (a smaller one, created with a management script). This runner is fairly simple:
def custom_run_tests(test_labels, verbosity=1, interactive=True, extra_tests=[]):
"""Set the test indices"""
settings.CATEGORY_CLASSIFIER_DATA = settings.TEST_CLASSIFIER_DATA
return run_tests(test_labels, verbosity, interactive, extra_tests)
And, to use it, I simply added a setting:
TEST_RUNNER = 'search.tests.custom_run_tests'
I dropped the aforementioned approach (creating the documents in the setUp) for performance and readability reasons: to test the database I needed a decent amount of documents with some text (a paragraph or two), so I ended up creating a fixture for that (I used a management command that created the documents in the real database, serialized them -writing them to a file- and then deleted 'em).
So, in the end, I didn't read from the live db at all and instead used test fixtures created with a somewhat hacky script and a custom runner, and it wasn't that hard :)