views: 78
answers: 2

As I stated in this question some time ago, I am having performance problems when accessing an SQLite database from Python. To make that clear once more: the identical code runs more than 20 times faster using apsw. I recently installed a different version of Python in parallel, together with a new version of apsw, and that version ran slow too. I tried the same code on a different computer using Python's built-in sqlite3, and there it ran fast (but slow with apsw). I also tried installing the most recent version of pysqlite on my computer, but that ran slow as well.

I am absolutely certain that it is not an issue with the schema.

My question now is: how can I proceed to diagnose this problem?
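
For reference, a minimal sketch of the kind of comparison I am making (the database file and query are placeholders; execute() returns the cursor in both modules and both cursors are iterable, so the same fetch code works for each):

    import time
    import sqlite3
    import apsw

    QUERY = "SELECT * FROM some_table"  # placeholder query

    def timed_fetch(connection):
        # Materialise all rows and report how long the fetch took.
        start = time.time()
        rows = list(connection.cursor().execute(QUERY))
        return time.time() - start, len(rows)

    print("sqlite3:", timed_fetch(sqlite3.connect("test.db")))
    print("apsw:   ", timed_fetch(apsw.Connection("test.db")))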

+1  A: 

Just in case you have overlooked this, make sure you are working with the latest versions of both the pysqlite2 database adapter and the sqlite3 library. The linked answer also shows how to determine exactly which version of each you are using, information you might want to add to your question.
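
For example, a minimal sketch of that check (assuming the external adapter is importable under its usual pysqlite2 name):

    # Version of the pysqlite2 adapter and of the SQLite library it links.
    from pysqlite2 import dbapi2 as sqlite
    print(sqlite.version)          # adapter version
    print(sqlite.sqlite_version)   # underlying SQLite library version

    # The built-in module exposes the same attributes, for comparison.
    import sqlite3
    print(sqlite3.version, sqlite3.sqlite_version)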

Ned Deily
I have checked this. It is the latest version. When installing pysqlite you can actually tell it to download and build the latest version, which is what I did.
Space_C0wb0y
It would still be useful if you could give more information about exactly which platforms, which Python versions, which pysqlite2 versions, which apsw versions, and which sqlite3 versions you are using where. There are so many moving targets in your question that it is hard to know where to begin.
Ned Deily
A: 

I can offer my experience with a similar problem, though on a different platform, namely J.

There was some slowness, and I pinpointed it to the sqlite3_get_table function. This function returns the whole result set as one array of pointers, each referring to a null-terminated string. A pointer can also be null when the corresponding value is null (say, a Max over an empty dataset: it returns a null pointer, not a pointer to a null value. I hate that.) J then formed the addresses into readable references (a large matrix of addresses, each followed by a 0 for the offset and a -1 for the length, meaning "up to the first null") and cycled through each one, finally reshaping the table into its intended columns and rows.

So fetching data from SQLite into another platform involves a memory-transfer aspect as well as the actual reading aspect. I found that such a dataset, which is often large, is not easily handled by J, because everything arrives as strings, which is clunky. Plus there is that nasty null-pointer issue.

I was able to limit the matrix modifications enough to optimise the function. The final optimisation was to use the primitive code for reading a memory address (15!:1) rather than the decently named wrapper (memr), because using memr meant J had to interpret what memr means on every memory read.

In conclusion, if Python allows some modification, maybe you can tweak the database access to better suit your needs; a sketch of one idea follows. I hope this helps, but I don't have very high hopes...
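
For instance, a hedged sketch of one such tweak on the Python side: streaming rows in batches instead of materialising the whole result set at once, which limits the memory-transfer cost described above (the database file, table, and query are placeholders):

    import sqlite3

    conn = sqlite3.connect("test.db")        # placeholder database file
    cur = conn.cursor()
    cur.arraysize = 1000                     # rows returned per fetchmany() call

    cur.execute("SELECT * FROM some_table")  # placeholder query
    while True:
        batch = cur.fetchmany()              # fetches cur.arraysize rows
        if not batch:
            break
        for row in batch:
            pass                             # process each row here
    conn.close()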

MPelletier