views:

765

answers:

9

I have a mobile application. My client has a large data set ~100.000 records. It's updated frequently. When we sync we need to copy from one database to another.

I have attached the second database to the main, and run an insert into table select * from sync.table.

This is extremely slow, it takes about 10 minutes I think. I noticed that the journal file gets increased step by step.

How can I speed this up?

EDITED 1

I have indexes off, and I have journal off. Using

insert into table select * from sync.table

it still takes 10 minutes.

EDITED 2

If I run a query like

select id,invitem,invid,cost from inventory where itemtype = 1 
order by invitem limit 50 

it takes 15-20 seconds.

The table schema is:

CREATE TABLE inventory  
('id' INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
 'serverid' INTEGER NOT NULL DEFAULT 0,
 'itemtype' INTEGER NOT NULL DEFAULT 0,
 'invitem' VARCHAR,
 'instock' FLOAT  NOT NULL DEFAULT 0,
 'cost' FLOAT NOT NULL DEFAULT 0,
 'invid' VARCHAR,
 'categoryid' INTEGER  DEFAULT 0,
 'pdacategoryid' INTEGER DEFAULT 0,
 'notes' VARCHAR,
 'threshold' INTEGER  NOT NULL DEFAULT 0,
 'ordered' INTEGER  NOT NULL DEFAULT 0,
 'supplier' VARCHAR,
 'markup' FLOAT NOT NULL DEFAULT 0,
 'taxfree' INTEGER NOT NULL DEFAULT 0,
 'dirty' INTEGER NOT NULL DEFAULT 1,
 'username' VARCHAR,
 'version' INTEGER NOT NULL DEFAULT 15
)

Indexes are created like

CREATE INDEX idx_inventory_categoryid ON inventory (pdacategoryid);
CREATE INDEX idx_inventory_invitem ON inventory (invitem);
CREATE INDEX idx_inventory_itemtype ON inventory (itemtype);

I am wondering, the insert into ... select * from isn't the fastest built-in way to do massive data copy?

EDITED 3

SQLite is serverless, so please stop voting a particular answer, because that is not the answer I'm sure.

A: 

Send only deltas. I.e. Send only diffs. I.e. Send only what's changed.

Viktor Klang
each row will have at least 1 modified column, I don't know which one. It would be a pain to detect that, and run an update on those rows. Think of my question like a refresh operation.
Pentium10
+10  A: 

If the target is some version of MS SQL Server, SqlBulkCopy offers an efficient insert for large data sets this is similar to the command bcp.

You can also disable/remove the non-clustered indexes before inserting, and re-create them after.

In SQLite, these are usually pretty fast:

.dump ?TABLE? ...      Dump the database in an SQL text format
.import FILE TABLE     Import data from FILE into TABLE

Also try: PRAGMA journal_mode = OFF

FYI, you should be able to run the command line utility on Windows Mobile if you include it in your package.

jspcal
As stated in tags, it's sqlite. I am already using bulk copy of SQLite, by attaching two databases and using a INSERT INTO .. SELECT * FROM method to copy the data. My indexes, triggers and several pragmas are already turned off.
Pentium10
for sqlite try the `.dump/.import` commands rather than select into (your tag didn't specify both dbs were sqlite)
jspcal
I'm not sure if on mobile I can run `.dump/.import` commands. It's server less. (as for not mentioning if both dbs are sqlite, I missed it. But you could deduct it knowing that you cannot ATTACH different type of db in SQL)
Pentium10
@Pentium10: you can, linked servers with odbc
jspcal
good to know, do you have experience with linked servers?
Pentium10
yeah its pretty useful if you need it
jspcal
Do you have the Transaction-Commit facility activated?
Tristan
@jspcal can you guide me what cmd line utility do I have to include it? and how to use it?
Pentium10
@Tristan Yes I have....., but tried as OFF, and still slow. probably it's slow writing to disk, the db file on mobile is 12 megabytes, that's some time to write it.
Pentium10
@Pentium10: shell.c has the command line exe, from the tarball. compiles fine on CF. some compile tips here: http://www.sqlite.org/howtocompile.html
jspcal
`compiles fine on CF` what do you mean? I have to compile on Windows Mobile?
Pentium10
no you just have to link against the right framework, like any other executable
jspcal
aren't out there already compiled version for CF 3.5?
Pentium10
A: 

INSERT INTO SELECT * from attached databases is the fastest available option in SQLite. A few things to look out into.

  1. Transactions. Make sure the entire thing is inside a transaction. This is really critical. If it's only one SQL statement then it's not important, but you said the journal increases "step by step" which indicates it's more than one statement.

  2. Triggers. Do you have triggers running? Those obviously could affect performance.

  3. Constraints. Do you have unnecessary constraints? You can't disable them or remove/re-add them, so if they're necessary there isn't much you can do about them, but it's something to consider.

You already mentioned turning off indexes.

Sam
also mentioned I have journal mode off, that means transactions are off, it takes to much time to write the transaction file to disk, it doesn't helped...turned off.I do not have constraints neither triggers for insert.
Pentium10
A: 

What about storing the sync.table database table within a separate file? That way you just need to make a copy of that file in order to sync. I bet that's way faster than syncing by SQL.

Christian Schwarz
I don't understand your idea. There are two files already separated maindb.sqlite and sync.sqlite. In SQLite there are no separate files for each table. A file is a database.
Pentium10
I suppose there are a bunch of tables in maindb.sqlite and you just want to sync one of those tables in one direction (= backup?). If that's the case, you could introduce a new database file (maindb-syncable.sqlite for instance) that only contains the table you need to synchronize. Then just create a copy of that file whenever you need to sync.
Christian Schwarz
This is a full sync process. There is a main central database and the product has Desktop version too. There is a sync process when the data from the central database is synced to mobile device. The problem appeared after a client come on board with a large inventory. So not only the inventory table is large, but sooner or later the invoices, expenses, parts tables can be large ones too. I want to insert or replace **all records** from sync(new result from server) --> to main(mobile local). The sync.sqlite db is downloaded by the sync process.
Pentium10
A: 

http://notes.theorbis.net/2009/12/fast-batch-insert-to-sqlite-database.html

http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html

drorhan
How can I run the SQLite console on a mobile device(compact framework)?
Pentium10
can you copy the sqlite server to mobile device and run the shell ?http://sqlite.phxsoftware.com/forums/p/214/798.aspx#798http://www.infinitezest.com/articles/using-sqlite-from-shell-in-android.aspx
drorhan
I am on Compact Framework, that's Windows Mobile 6. None of your link helps.
Pentium10
I need to do this from an application
Pentium10
+1  A: 

Do all 100 000 records change very often? Or is it a subset that changes?

If so, you should consider adding an updated_since_last_sync column which gets flagged when an update is made, so during the next sync you only copy the records that have actually changed. Once the records are copied over, you set the flag column back to zero.

tomlog
I already have this built in. The 100.000 records are brand new.
Pentium10
+3  A: 

I'm mobile right now so I can't post a very detailed answer, but this might be worth reading:

http://sqlite.org/cvstrac/wiki?p=SpeedComparison

As you can see SQLite 3 performs INSERTs way faster when using indexes and / or transactions. Also, INSERTs FROM SELECTs doesn't seem to be the strong of SQLite.

Alix Axel
So you say doing a select, then running through that to create prepared insert statements and executing insert statement summed all this is faster than INSERTs FROM SELECTs ?
Pentium10
A: 

If you haven't already you need to wrap it in a transaction. Makes a significant speed difference.

Jay
the insert into table1 ... select * from table2 is one query so it's in a transaction
Pentium10
+3  A: 

I don't think that attaching the two databases and running INSERT INTO foo (SELECT * FROM bar) is the fastest way to do this. If you are synching between a handheld device and a server (or another device) could the transport mechanism be the bottleneck? Or are the two database files already on the same filesysem? If the filesystem on the device is slower flash-memory, could this be a bottleneck?

Are you able to compile/run the raw SQLite C code on your device? (I think that the RAW sqlite3 amalgamation should compile for WinCE/Mobile) If so, and you are willing:

  • To write some C code (using the SQLite C API)
  • Increase risk of data loss by turning off disk journaling

It should be possible for to write a small stand-alone executable to copy/synchronize the 100K records between the two databases extremely quickly.

I've posted some of what I learned about optimizing SQLite inserts here: http://stackoverflow.com/questions/1711631/how-do-i-improve-the-performance-of-sqlite


Edit: Tried this out with real code...

I don't know all the steps involved in building a Windows Mobile executable, but the SQLite3 amalgamation should compile out-of-the box using Visual Studio. Here is a sample main.c program that opens two SQLite databases (both have to have the same schema - see the #define TABLE statement) and executes a SELECT statement and then binds the resulting rows to an INSERT statement:

/*************************************************************
** The author disclaims copyright to this source code.  In place of
** a legal notice, here is a blessing:
**
**    May you do good and not evil.
**    May you find forgiveness for yourself and forgive others.
**    May you share freely, never taking more than you give.
**************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include "sqlite3.h"

#define SOURCEDB "C:\\source.sqlite"
#define DESTDB "c:\\dest.sqlite"

#define TABLE "CREATE TABLE IF NOT EXISTS TTC (id INTEGER PRIMARY KEY, Route_ID TEXT, Branch_Code TEXT, Version INTEGER, Stop INTEGER, Vehicle_Index INTEGER, Day Integer, Time TEXT)"
#define BUFFER_SIZE 256

int main(int argc, char **argv) {

    sqlite3 * sourceDB;
    sqlite3 * destDB;

    sqlite3_stmt * insertStmt;
    sqlite3_stmt * selectStmt;

    char * insertTail = 0;
    char * selectTail = 0;

    int n = 0;
    int result = 0;
    char * sErrMsg = 0;
    clock_t cStartClock;

    char sInsertSQL [BUFFER_SIZE] = "\0";
    char sSelectSQL [BUFFER_SIZE] = "\0";

    /* Open the Source and Destination databases */
    sqlite3_open(SOURCEDB, &sourceDB);
    sqlite3_open(DESTDB, &destDB);

    /* Risky - but improves performance */
    sqlite3_exec(destDB, "PRAGMA synchronous = OFF", NULL, NULL, &sErrMsg);
    sqlite3_exec(destDB, "PRAGMA journal_mode = MEMORY", NULL, NULL, &sErrMsg);

    cStartClock = clock(); /* Keep track of how long this took*/

    /* Prepared statements are much faster */
    /* Compile the Insert statement */
    sprintf(sInsertSQL, "INSERT INTO TTC VALUES (NULL, @RT, @BR, @VR, @ST, @VI, @DT, @TM)");
    sqlite3_prepare_v2(destDB, sInsertSQL, BUFFER_SIZE, &insertStmt, &insertTail);

    /* Compile the Select statement */
    sprintf(sSelectSQL, "SELECT * FROM TTC LIMIT 100000");
    sqlite3_prepare_v2(sourceDB, sSelectSQL, BUFFER_SIZE, &selectStmt, &selectTail);

    /* Transaction on the destination database */
    sqlite3_exec(destDB, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);

    /* Execute the Select Statement.  Step through the returned rows and bind
    each value to the prepared insert statement.  Obviously this is much simpler
    if the columns in the select statement are in the same order as the columns
    in the insert statement */
    result = sqlite3_step(selectStmt);
    while (result == SQLITE_ROW)
    {

        sqlite3_bind_text(insertStmt, 1, sqlite3_column_text(selectStmt, 1), -1, SQLITE_TRANSIENT); /* Get Route */
        sqlite3_bind_text(insertStmt, 2, sqlite3_column_text(selectStmt, 2), -1, SQLITE_TRANSIENT); /* Get Branch */
        sqlite3_bind_text(insertStmt, 3, sqlite3_column_text(selectStmt, 3), -1, SQLITE_TRANSIENT); /* Get Version */
        sqlite3_bind_text(insertStmt, 4, sqlite3_column_text(selectStmt, 4), -1, SQLITE_TRANSIENT); /* Get Stop Number */
        sqlite3_bind_text(insertStmt, 5, sqlite3_column_text(selectStmt, 5), -1, SQLITE_TRANSIENT); /* Get Vehicle */
        sqlite3_bind_text(insertStmt, 6, sqlite3_column_text(selectStmt, 6), -1, SQLITE_TRANSIENT); /* Get Date */
        sqlite3_bind_text(insertStmt, 7, sqlite3_column_text(selectStmt, 7), -1, SQLITE_TRANSIENT); /* Get Time */

        sqlite3_step(insertStmt);       /* Execute the SQL Insert Statement (Destination Database)*/
        sqlite3_clear_bindings(insertStmt); /* Clear bindings */
        sqlite3_reset(insertStmt);      /* Reset VDBE */

        n++;

        /* Fetch next from from source database */
        result = sqlite3_step(selectStmt);

    }

    sqlite3_exec(destDB, "END TRANSACTION", NULL, NULL, &sErrMsg);

    printf("Transfered %d records in %4.2f seconds\n", n, (clock() - cStartClock) / (double)CLOCKS_PER_SEC);

    sqlite3_finalize(selectStmt);
    sqlite3_finalize(insertStmt);

    /* Close both databases */
    sqlite3_close(destDB);
    sqlite3_close(sourceDB);

    return 0;
}

On my Windows desktop machine this code copies 100k records from source.sqlite to dest.sqlite in 1.20 seconds. I don't know exactly what kind of performance you'll see on a mobile device with flash memory (but I am curious).

Kassini
The two databases are already on the same filesystem. The filesystem is a slower flash-memory and can be a bottleneck. I don't know how to compile/run the raw SQLite C code on my device. If you point me in the right direction I will give a try.
Pentium10
If you can use Visual Studio to build a Windows Mobile C++ Executable this should be simple. The SQLite3 Amalgamation contains all of the SQLite code in one `.c` file and one `.h` file.
Kassini