I know that sqlite doesn't perform well with extremely large database files, even when they are supported (there used to be a comment on the sqlite website stating that if you need file sizes above 1GB you may want to consider using an enterprise rdbms; I can't find it anymore, so it might relate to an older version of sqlite).

However, for my purposes I'd like to get an idea of how bad it really is before I consider other solutions.

I'm talking about sqlite data files in the multi-gigabyte range, from 2GB onwards.

Anyone have any experience with this? Any tips/ideas?

A: 

You write in your question:

I know that sqlite doesn't perform well with extremely large database files even when they are supported.

Where does that knowledge come from? Maybe the source of that information could also provide some empirically measured data.

lothar
Hmm, there used to be a comment on the sqlite website stating that if you need file sizes above 1GB you may want to consider using an enterprise rdbms. Can't find it anymore, might be related to an older version of sqlite.
Snazzer
A: 

I think the main complaints about sqlite scaling are:

  1. Single process write.
  2. No mirroring.
  3. No replication.
Unknown
+8  A: 

I've created SQLite databases up to 3.5GB in size with no noticeable performance issues. If I remember correctly, SQLite2 might have had some lower limits, but I don't think SQLite3 has any such issues.

According to the SQLite Limits page, the maximum size of each database page is 32K, and the maximum number of pages in a database is 1024^3. So by my math that comes out to 32 terabytes as the maximum size. I think you'll hit your file system's limits before hitting SQLite's!
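
As a quick sanity check on that arithmetic (a throwaway Python calculation, nothing SQLite-specific):

    # Upper bound implied by the limits quoted above:
    # 32K per page, 1024^3 pages per database.
    max_page_size = 32 * 1024          # bytes
    max_page_count = 1024 ** 3         # pages
    max_db_bytes = max_page_size * max_page_count
    print(max_db_bytes / 1024 ** 4)    # -> 32.0 (terabytes)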

Paul Lefebvre
Thanks, that gives me enough confidence to go ahead with some tests. I'll post my findings when I have them.
Snazzer
A: 

I've experienced problems with large sqlite files when using the vacuum command.

I haven't tried the auto_vacuum feature yet. If you expect to be updating and deleting data often then this is worth looking at.
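
If you want to experiment with either, both are plain SQL statements, so a few lines with Python's built-in sqlite3 module are enough to try them out (the file name is made up; treat this as a sketch, not a recipe):

    import sqlite3

    # Autocommit mode, so the VACUUM below isn't blocked by an open transaction.
    con = sqlite3.connect("big.db", isolation_level=None)

    # auto_vacuum only takes effect on a brand-new database, or after a
    # full VACUUM if tables already exist.
    con.execute("PRAGMA auto_vacuum = FULL")

    # A full VACUUM rewrites the entire file, so on a multi-GB database
    # expect it to take a long time and need extra free disk space.
    con.execute("VACUUM")
    con.close()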

i_like_caffeine
+17  A: 

So I did some tests with sqlite for very large files, and came to some conclusions (at least for my specific application).

The tests involved a single sqlite file with either a single table or multiple tables. Each table had about 8 columns, almost all integers, and 4 indices.

The idea was to insert data until the sqlite file was about 50GB.
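
For reference, the test schema looked roughly like this (the table, column, and index names below are placeholders of my own, not the exact ones):

    import sqlite3

    con = sqlite3.connect("test.db")
    con.execute("""
        CREATE TABLE messages (
            id INTEGER PRIMARY KEY,
            ts INTEGER, source INTEGER, dest INTEGER, msg_type INTEGER,
            flags INTEGER, size INTEGER, payload TEXT
        )
    """)
    # The four indices from the test setup described above.
    con.execute("CREATE INDEX idx_ts ON messages(ts)")
    con.execute("CREATE INDEX idx_source ON messages(source)")
    con.execute("CREATE INDEX idx_dest ON messages(dest)")
    con.execute("CREATE INDEX idx_type ON messages(msg_type)")
    con.commit()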

Single Table

I tried inserting rows into a sqlite file with just one table. When the file reached about 7GB (sorry, I can't be specific about row counts) insertions were taking far too long. I had estimated that the test to insert all my data would take 24 hours or so, but it did not complete even after 48 hours.

This leads me to conclude that a single, very large sqlite table will have issues with insertions, and probably other operations as well.

I guess this is no surprise: as the table gets larger, inserting and updating all the indices takes longer.

Multiple Tables

I then tried splitting the data by time over several tables, one table per day. The data from the original single table was split into ~700 tables.

This setup had no problems with insertion; it did not slow down as time progressed, since a new table was created for each day.
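
The splitting itself was nothing clever; roughly the following, with one table per calendar day (again, names are illustrative placeholders):

    import sqlite3
    from datetime import date

    con = sqlite3.connect("test.db")

    def table_for(day: date) -> str:
        # e.g. messages_2009_04_23 -- one table per day
        return "messages_" + day.strftime("%Y_%m_%d")

    def ensure_table(day: date) -> None:
        # Table names can't be bound as SQL parameters, so they are
        # interpolated directly; fine here because we generate them ourselves.
        con.execute(f"""
            CREATE TABLE IF NOT EXISTS {table_for(day)} (
                id INTEGER PRIMARY KEY,
                ts INTEGER, source INTEGER, dest INTEGER, msg_type INTEGER,
                flags INTEGER, size INTEGER, payload TEXT
            )
        """)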

Vacuum Issues

As pointed out by i_like_caffeine, the VACUUM command becomes more of a problem the larger the sqlite file is. As more inserts/deletes are done, the fragmentation of the file on disk gets worse, so the goal is to VACUUM periodically to optimize the file and recover file space.

However, as the documentation points out, a full copy of the database is made to do a vacuum, which takes a very long time to complete. So the smaller the database, the faster this operation finishes.

Conclusions

For my specific application, I'll probably be splitting out data over several db files, one per day, to get the best of both vacuum performance and insertion/delete speed.

This complicates queries, but for me, it's a worthwhile tradeoff to be able to index this much data. An additional advantage is that I can just delete a whole db file to drop a day's worth of data (a common operation for my application).

I'd probably have to monitor the table size per file as well, to see when speed becomes a problem.

It's too bad that there doesn't seem to be an incremental vacuum method other than auto vacuum. I can't use it, because my goal for vacuuming is to defragment the file (file space isn't a big deal), which auto vacuum does not do. In fact, the documentation states it may make fragmentation worse, so I have to resort to periodically running a full vacuum on the file.

Snazzer
Very useful info. Pure speculation, but I wonder if the new backup API could be used to create a non-fragmented version of your database on a daily basis and avoid the need to run a VACUUM.
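
The API is exposed in Python 3.7+ as Connection.backup, so a quick experiment would only take a few lines (file names here are made up, and whether the copy really ends up less fragmented is something to measure rather than assume):

    import sqlite3

    src = sqlite3.connect("messages_2009_04_23.db")
    dst = sqlite3.connect("messages_2009_04_23.compact.db")

    # Copies the database page by page while the source stays usable.
    src.backup(dst)

    dst.close()
    src.close()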
i_like_caffeine
I'm curious, were all your INSERTS in a transaction?
Paul Lefebvre
Yes, inserts were done in batches of 10000 messages per transaction.
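
Roughly like this, with Python's sqlite3 (table and column names are placeholders, not the real ones):

    import sqlite3

    BATCH_SIZE = 10000

    con = sqlite3.connect("test.db")

    def insert_batch(rows):
        # executemany runs inside one implicit transaction;
        # commit() ends it, so each 10000-row batch is one transaction.
        con.executemany(
            "INSERT INTO messages (ts, source, dest, msg_type, flags, size, payload) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            rows,
        )
        con.commit()

    # Feed it lists of up to BATCH_SIZE row tuples.
    insert_batch([(1240000000, 1, 2, 0, 0, 5, "hello")] * BATCH_SIZE)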
Snazzer
+3  A: 

There used to be a statement in the SQLite documentation that the practical size limit of a database file was a few dozen GB. That was mostly due to the need for SQLite to "allocate a bitmap of dirty pages" whenever you started a transaction: 256 bytes of RAM were required for each MB in the database, i.e. (2^8 bytes/MB) * (2^10 MB/GB) = 2^18 bytes = 256 KB per GB. Inserting into a 50 GB DB file would thus require about 12.5 MB of RAM just for that bitmap.

But as of recent versions of SQLite, this is no longer needed. Read more at: http://www.nabble.com/How-to-minimize-or-remove-need-for-dirty-bitmap-memory-size--td23035748.html

+2  A: 

Much of the reason it took more than 48 hours to do your inserts is your indexes. It is dramatically faster to:

  1. Drop all indexes.
  2. Do all inserts.
  3. Create indexes again.
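
In sqlite3-flavoured Python that pattern looks roughly like this (index and table names are the same placeholders as in the sketches above):

    import sqlite3

    con = sqlite3.connect("test.db")

    # 1. Drop all indexes.
    for idx in ("idx_ts", "idx_source", "idx_dest", "idx_type"):
        con.execute(f"DROP INDEX IF EXISTS {idx}")

    # 2. Do all inserts (batched, one commit per batch).
    # ... bulk INSERTs go here ...

    # 3. Create the indexes again, once, at the end.
    con.execute("CREATE INDEX idx_ts ON messages(ts)")
    con.execute("CREATE INDEX idx_source ON messages(source)")
    con.execute("CREATE INDEX idx_dest ON messages(dest)")
    con.execute("CREATE INDEX idx_type ON messages(msg_type)")
    con.commit()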

That's well known... but for a long-running process you're not going to periodically drop your indexes and rebuild them, especially when you're going to be querying them to do work. That is the approach being taken, though, when the sqlite db has to be rebuilt from scratch: the indexes are created after all the inserts are done.
Snazzer