I'm new to Python and SQLite, so I'm sure there's a better way to do this. I have a DB with 6000 rows, where one column is a 14K XML string. I wanted to compress all those XML strings to make the DB smaller. Unfortunately, the script below is much, much slower than this simple command line (which takes a few seconds):

sqlite3 weather.db .dump | gzip -c > backup.gz

I know it's not the same thing, but it does read and convert the DB to text and run gzip. So I was hoping this script would be within 10X of that performance, but it's more like 1000X slower. Is there a way to make the following script more efficient? Thanks.

import zlib, sqlite3

conn = sqlite3.connect(r"weather.db")
r = conn.cursor()
w = conn.cursor()
rows = r.execute("select date,location,xml_data from forecasts")
for row in rows:
    data = zlib.compress(row[2])
    w.execute("update forecasts set xml_data=? where date=? and location=?", (data, row[0], row[1]))

conn.commit()
conn.close()
+2  A: 

Not sure you can increase the performance much by doing the update after the fact. There's too much overhead between doing the compression and updating the record, and you won't gain any space savings unless you run VACUUM after you're done with the updates. The best solution would probably be to compress when the records are first inserted; then you get the space savings, and the performance hit won't be as noticeable. If you can't do it on insert, then I think you've explored the two possibilities and seen the results.
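
For illustration, a minimal sketch of compress-on-insert, assuming the forecasts table from the question and that xml_data can hold a BLOB (the sample date/location values are made up):

import zlib, sqlite3

conn = sqlite3.connect("weather.db")

def insert_forecast(date, location, xml_text):
    # Compress once at insert time and store the result as a BLOB.
    blob = sqlite3.Binary(zlib.compress(xml_text.encode("utf-8")))
    conn.execute(
        "insert into forecasts (date, location, xml_data) values (?, ?, ?)",
        (date, location, blob),
    )

# Hypothetical example values -- substitute whatever the real feed provides.
insert_forecast("2010-01-01", "Seattle", "<forecast>...</forecast>")
conn.commit()
conn.close()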

Don Dickinson
I'm asking this question to learn how to write better DB conversion scripts whenever I change the schema. The performance was terrible with just 6K rows; what will I do when I need to update 60K rows to a new schema? I've already run VACUUM and updated my script to compress on INSERT. It reduced space by 10X.
projectshave
As mentioned above (by Larry), wrapping the updates inside a transaction will definitely help. You'll have to experiment with that: perhaps begin a transaction, update 10,000 records, then commit, and repeat every 10K records. Compare that against committing every 1,000 records, etc., until you find the best performance.
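
A minimal sketch of that kind of batching, applied to the script in the question (the batch size is just a starting point to experiment with, and this assumes xml_data is currently stored as text):

import zlib, sqlite3

BATCH_SIZE = 10000  # experiment: try 1000, 10000, etc.

conn = sqlite3.connect("weather.db")
rows = conn.execute("select date, location, xml_data from forecasts").fetchall()

w = conn.cursor()
for i, (date, location, xml_data) in enumerate(rows, 1):
    compressed = zlib.compress(xml_data.encode("utf-8"))
    w.execute("update forecasts set xml_data=? where date=? and location=?",
              (sqlite3.Binary(compressed), date, location))
    if i % BATCH_SIZE == 0:
        conn.commit()  # commit once per batch instead of per row

conn.commit()  # commit the final partial batch
conn.close()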
Don Dickinson
+2  A: 

You are comparing apples to oranges here. The big difference between the sqlite3|gzip version and the Python version is that the latter writes the changes back to the DB!

What sqlite3|gzip does is:

  • read the DB
  • gzip the text

In addition to the above, the Python version writes the gzipped text back into the DB, with one UPDATE per record read.
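
For comparison, the closest read-only equivalent in Python looks roughly like this (a sketch using iterdump; backup.gz is just a placeholder filename):

import gzip, sqlite3

conn = sqlite3.connect("weather.db")

# Dump the whole DB as SQL text and gzip it -- this only reads, it never writes to the DB.
with gzip.open("backup.gz", "wt") as out:
    for line in conn.iterdump():
        out.write(line + "\n")

conn.close()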

Almir Karic
+1  A: 

Sorry, but are you explicitly starting a transaction in your code? If you're autocommitting after each UPDATE, that will slow you down substantially.

Do you have an appropriate index on date and/or location? What kind of variation do you have in those columns? Can you use an autonumbered integer primary key in this table?

Finally, can you profile how much time you're spending in the zlib calls and how much in the UPDATEs? In addition to the database writes that slow this process down, your database version involves 6000 separate calls (with 6000 initializations) of the compression algorithm.
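
A rough way to get that split is to accumulate wall-clock time separately for the compression calls and the UPDATEs (a sketch, again assuming xml_data is stored as text):

import time, zlib, sqlite3

conn = sqlite3.connect("weather.db")
rows = conn.execute("select date, location, xml_data from forecasts").fetchall()

zlib_seconds = 0.0
update_seconds = 0.0
w = conn.cursor()

for date, location, xml_data in rows:
    t0 = time.perf_counter()
    compressed = zlib.compress(xml_data.encode("utf-8"))
    zlib_seconds += time.perf_counter() - t0

    t0 = time.perf_counter()
    w.execute("update forecasts set xml_data=? where date=? and location=?",
              (sqlite3.Binary(compressed), date, location))
    update_seconds += time.perf_counter() - t0

conn.commit()
conn.close()

print("zlib: %.2fs  updates: %.2fs" % (zlib_seconds, update_seconds))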

Larry Lustig
Thanks for the info. RE: transactions, I put a commit after the 6K updates, but maybe it's autocommitting after each UPDATE anyway. I'll check that. RE: indexes, I don't have any set up (I should add one). RE: zlib, I don't see a way to reuse the zlib compression object. I'll check that.
projectshave