views: 1335

answers: 3

I have an SQLite3 database with a table whose primary key consists of two integers, and I'm trying to insert a lot of data into it (i.e. around 1 GB or so).

The issue I'm having is that creating the primary key also implicitly creates an index, which in my case bogs inserts down to a crawl after a few commits (and that would be because the database file is on NFS... sigh).

So, I'd like to somehow temporarily disable that index. My best plan so far involved dropping the primary key's automatic index; however, SQLite doesn't like that and throws an error if I attempt it.

My second-best plan would involve the application making transparent copies of the database on the network drive, making the modifications, and then merging them back. Note that, as opposed to most SQLite/NFS questions, I don't need concurrent access.

What would be a correct way to do something like that?

UPDATE:

I forgot to specify the flags I'm already using:

PRAGMA synchronous = OFF
PRAGMA journal_mode = OFF
PRAGMA locking_mode = EXCLUSIVE
PRAGMA temp_store = MEMORY
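
Roughly how I apply them, as a minimal sketch assuming Python's sqlite3 module (the file name is just a placeholder; in my case it lives on the NFS mount):

    import sqlite3

    conn = sqlite3.connect("data.db")   # placeholder; the real file is on NFS
    conn.executescript("""
        PRAGMA synchronous = OFF;
        PRAGMA journal_mode = OFF;
        PRAGMA locking_mode = EXCLUSIVE;
        PRAGMA temp_store = MEMORY;
    """)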

UPDATE 2: I'm in fact inserting items in batches; however, each successive batch is slower to commit than the previous one (I'm assuming this has to do with the size of the index). I tried batches of between 10k and 50k tuples, each one being two integers and a float.

+3  A: 

Are you doing the INSERT of each new row as an individual transaction?

If you use BEGIN TRANSACTION and INSERT rows in batches, then I think the index will only get rebuilt at the end of each transaction.
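
Something along these lines, as a sketch assuming Python's sqlite3 module and a hypothetical table holding your (integer, integer, float) rows:

    import sqlite3

    conn = sqlite3.connect("bulk.db")      # hypothetical file name
    conn.isolation_level = None            # manage transactions explicitly
    conn.execute("CREATE TABLE IF NOT EXISTS data "
                 "(a INTEGER, b INTEGER, x REAL, PRIMARY KEY (a, b))")

    BATCH = 50000

    def flush(batch):
        conn.execute("BEGIN")
        conn.executemany("INSERT INTO data VALUES (?, ?, ?)", batch)
        conn.execute("COMMIT")             # index pages get written once per batch

    def insert_in_batches(rows):
        """rows is an iterable of (a, b, x) tuples."""
        batch = []
        for row in rows:
            batch.append(row)
            if len(batch) == BATCH:
                flush(batch)
                batch = []
        if batch:
            flush(batch)

    insert_in_batches([(1, 2, 3.0), (4, 5, 6.0)])   # example call with a couple of rows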

Dave Webb
It will. I was just about to suggest that too :)
Jason Coco
Yes, but can I squeeze an entire gigabyte into one transaction? I almost did that by accident (I forgot to put a commit statement anywhere at all) and got some disk I/O errors halfway through, although I'm not sure if it's related...
Dimitri Tcaciuc
+1  A: 

See faster-bulk-inserts-in-sqlite3.

gimel
+4  A: 
  1. You can't remove the built-in index, since it's the only address of a row.
  2. Merge your two integer keys into a single 64-bit key, key = (key1 << 32) + key2, and make that the INTEGER PRIMARY KEY in your schema (that way you will have only one index); see the sketch below.
  3. Set the page size for the new DB to at least 4096.
  4. Remove ANY additional indexes except the primary key.
  5. Insert the data in SORTED order so that the primary key is always growing.
  6. Reuse prepared statements; don't create them from strings each time.
  7. Set the page cache size to as much memory as you have left (remember that cache size is in number of pages, not number of bytes).
  8. Commit every 50000 items.
  9. If you have additional indexes, create them only AFTER ALL the data is in the table.

If you're able to merge the keys (I think you're using 32-bit values, while SQLite uses 64-bit integers, so it's possible) and fill the data in sorted order, I bet you will fill your first GB with the same performance as the second, and both will be fast enough.
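
For illustration, a rough sketch of points 2, 3, 5 and 7 combined, assuming Python's sqlite3 module (the question doesn't say which language you're using); sorted_rows() is a made-up stand-in for your real data source:

    import sqlite3

    conn = sqlite3.connect("merged.db")         # made-up file name
    conn.execute("PRAGMA page_size = 4096")     # must run before the first table is created
    conn.execute("PRAGMA cache_size = 200000")  # in pages: 200000 * 4096 bytes is roughly 800 MB
    conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, x REAL)")

    def merged_key(key1, key2):
        # pack the two 32-bit keys into one 64-bit INTEGER PRIMARY KEY
        return (key1 << 32) + key2

    def sorted_rows():
        # stand-in for the real source; must yield rows already sorted by (key1, key2)
        for k1 in range(3):
            for k2 in range(3):
                yield k1, k2, 0.5

    # because the input is sorted, the merged key is always increasing
    # and every insert appends at the end of the index
    conn.executemany("INSERT INTO data VALUES (?, ?)",
                     ((merged_key(k1, k2), x) for k1, k2, x in sorted_rows()))
    conn.commit()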

Mash
Keeping the amount of data per INSERT statement in check with the cache_size parameter seems to do the trick. Obviously, the more cache there is, the more items can be inserted in one go. It also looks like I can just make one commit at the end of everything after all.
Dimitri Tcaciuc
Well, you can. But the main trick for keeping every insert operation O(1) is to fill in the data sorted by that index; if your data fits into the memory cache, everything is really fast anyway. It's reasonable to keep the commit size smaller than the cache size, otherwise SQLite will be forced to move it to disk.
Mash