Afternoon everyone. I'm not that familiar with SQLite, so I haven't messed with all of the settings of the database. I'm rather familiar with SQL Server, Oracle, and even some Access and MySQL. Currently, I'm taking a file with 110,000+ records, reading it line by line, parsing the data, and running an insert statement into a table. Which table it goes to depends on the record type, which is the first field of each line. I'm loading it right now; it's been running for 12 minutes (as I write this) and has only imported 14,000 records. Doing the math, the whole file will take somewhere between 1 hour 15 minutes and 1 hour 30 minutes, depending on how the rest of my system behaves at the time. Because there are different record types, I couldn't do a bulk insert even if SQLite offered one (I'm not sure it does). This runs as a background worker. Below is the function that pulls and parses the data, and the functions that insert it into the DB. Keep in mind, this is a C# application in MVC format (it was like that when I took it over, and I don't have time to restructure it):

MainForm.cs Background Worker Functions

#region Background Worker Functions

    #region private void InitializeBackgroundWorker()
    /*************************************************************************************
    *************************************************************************************/
    private void InitializeBackgroundWorker()
    {
        backgroundWorker.DoWork +=
            new DoWorkEventHandler(backgroundWorker1_DoWork);
        backgroundWorker.RunWorkerCompleted +=
            new RunWorkerCompletedEventHandler(
        backgroundWorker1_RunWorkerCompleted);
        backgroundWorker.ProgressChanged +=
            new ProgressChangedEventHandler(
        backgroundWorker1_ProgressChanged);
    }
    #endregion

/*****************************************************************************************************************************************************************************************************/

    #region private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
    /*************************************************************************************
    *************************************************************************************/
    private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
    {
        // Get the BackgroundWorker that raised this event.
        BackgroundWorker worker = sender as BackgroundWorker;

        // Assign the result of the computation
        // to the Result property of the DoWorkEventArgs
        // object. This will be available to the
        // RunWorkerCompleted event handler.

        // Unpack the argument array passed to RunWorkerAsync;
        // the first element is the file name to import.
        object[] obj = (object[])e.Argument;
        string fileName = obj[0].ToString();
        DataController controller = new DataController(worker, e);
        controller.FileName = fileName;
        try
        {
            if (strProcess == "Import")
            {
                controller.Import();
            }
            else if (strProcess == "Export")
            {
                controller.ExportToExcel();
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message);
        }
    }
    #endregion

/*****************************************************************************************************************************************************************************************************/

    #region private void backgroundWorker1_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
    /*************************************************************************************
    *************************************************************************************/
    private void backgroundWorker1_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
    {
        if (e.Error != null)
        {
            MessageBox.Show(e.Error.StackTrace);
        }
        else
        {
            this.toolStripStatusLabel1.Text = "Import complete";
            generateReport();
            treeViewFigure.Nodes.Clear();
            BuildTree();
            treeViewFigure.TopNode.ExpandAll();
            labelIPBNumber.Text = controller.IPBNumber;
            this.Text += "IPB: " + labelIPBNumber.Text;

            cmbIndentureLevel.Items.Clear();
        }
    }
    #endregion

/*****************************************************************************************************************************************************************************************************/

    #region private void backgroundWorker1_ProgressChanged(object sender, ProgressChangedEventArgs e)
    /*************************************************************************************
    *************************************************************************************/
    private void backgroundWorker1_ProgressChanged(object sender, ProgressChangedEventArgs e)
    {
        string stat = e.UserState.ToString();
        this.toolStripStatusLabel1.Text = stat;
        this.toolStripProgressBar1.Value = e.ProgressPercentage;
    }
    #endregion

#endregion

Importer.cs Import Function

    #region public void Import(string fileName)
    /*************************************************************************************
    *************************************************************************************/
    public void Import(string fileName)
    {
        if (!File.Exists(fileName))
        {
            throw new FileNotFoundException();
        }

        StreamReader read = File.OpenText(fileName);
        List<RecordBase> List = new List<RecordBase>();
        DataFactory factory = DataFactory.BuildFactory();

        int nLines = 0;

        // First pass: count the lines so progress can be reported as a percentage.
        while (!read.EndOfStream)
        {
            read.ReadLine();
            nLines++;
        }

        read.Close();
        read = File.OpenText(fileName);

        factory.lstObservers = _observers;
        factory.ClearDB();

        int count = 1;

        while (!read.EndOfStream)
        {
            string[] fields = read.ReadLine().Split('|');

            // Drop the trailing empty field produced by the line's final '|' delimiter.
            List<string> lstStr = new List<string>(fields);
            lstStr.RemoveAt(lstStr.Count - 1);
            fields = lstStr.ToArray();

            // Build a single-quoted, comma-separated value list for the INSERT statement.
            string strValues = string.Join("','", fields);
            strValues = "'" + strValues + "'";
            // Record type 03: import the 39th field (the tagged record) separately,
            // then trim it and its leading separator off the value list.
            if (fields.Length >= 39 && fields[0] == "03")
            {
                factory.ImportTaggedRecord(fields[38], count);
                int nIndex = strValues.IndexOf(fields[38]);
                strValues = strValues.Substring(0, nIndex - 2);
            }

            factory.ImportIPB(strValues, fields[0], count);

            progress.ProgressComplete = (count * 100) / nLines;
            progress.Message = "Importing Record: " + count++.ToString();
            Notify();
        }
    }
    #endregion

DataFactory.cs ImportIPB function

    #region public void ImportIPB(string strValues, string strType, int nPosition)
    /*************************************************************************************
    *************************************************************************************/
    public void ImportIPB(string strValues, string strType, int nPosition)
    {
        string strCommand = string.Empty;

        switch (strType)
        {
            case "01":
                strCommand = Queries.strIPBInsert;
                break;
            case "02":
                strCommand = Queries.strFigureInsert;
                break;
            case "03":
                strCommand = Queries.strPartInsert;
                break;
        }

        ExecuteNonQuery(strCommand + strValues + ", " + nPosition.ToString() + ")");
    }
    #endregion

Database.cs ExecuteNonQuery method

    #region public void ExecuteNonQuery(string strSQL)
    /*************************************************************************************
    *************************************************************************************/
    public void ExecuteNonQuery(string strSQL)
    {
        // Dispose the command so its underlying prepared statement is released promptly.
        using (DbCommand dbCommand = _dbConnection.CreateCommand())
        {
            dbCommand.CommandText = strSQL;
            dbCommand.Prepare();
            dbCommand.ExecuteNonQuery();
        }
    }
    #endregion

Can anyone see anything in what I've provided that could be improved? Are there settings on the background worker that can be changed to make it work faster, and what are its defaults? What settings on the DB file could be changed (using SQLite Expert Personal) to make inserts faster? Or is it just the size of my file? Right now, as I finish writing this, it has just passed 22 minutes and has done 24,000 records. This isn't a time-sensitive matter, so take all the time you need. Thanks.

Update: Also, I think I should mention that one of the tables has an integer primary key (it acts as an identity field). Could that cause any performance issues?

+5  A: 

Use a single SQLiteTransaction around the entire insert. As it is now, SQLite forces a flush to the file after every insert to preserve ACID compliance. As with any DbConnection and DbTransaction, you call BeginTransaction, then Commit when you're done. The entire insert will succeed or fail as a unit, and it will perform far better.
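
For example, here's a minimal sketch of the pattern (the method name, connection, and statement list are illustrative placeholders, not your actual code):

    using System.Collections.Generic;
    using System.Data.Common;

    // Illustrative only: runs a batch of INSERT statements inside one
    // transaction, so SQLite flushes to disk once at Commit instead of per row.
    static void RunBatch(DbConnection connection, IEnumerable<string> inserts)
    {
        using (DbTransaction transaction = connection.BeginTransaction())
        {
            foreach (string sql in inserts)
            {
                using (DbCommand command = connection.CreateCommand())
                {
                    command.Transaction = transaction;
                    command.CommandText = sql;
                    command.ExecuteNonQuery();
                }
            }
            // One commit at the end; if an exception escapes first, disposing
            // the uncommitted transaction rolls everything back.
            transaction.Commit();
        }
    }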

Matthew Flaschen
Ok, I'll look into that.
XstreamINsanity
Now I have to figure out the best way to implement it, since the connection lives in one file and the inserts are built one by one in another. Would it be possible for me to add a BeginTransaction function to my Database file, call it before I start reading the file, then add a Commit function and call it after I finish reading the file? Or would that not be ideal?
XstreamINsanity
I think that's fine. `BeginTransaction` returns a `DbTransaction`, so you can just store that in a `Database` instance field, then call `Commit` when you're done.
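
Something like this minimal sketch, assuming the `_dbConnection` field from your `Database.cs` (the method names and the field here are hypothetical):

    // Hypothetical additions to the Database class: hold the transaction
    // in an instance field between BeginTransaction and Commit. Commands
    // executed on the same SQLite connection run inside this transaction.
    private DbTransaction _transaction;

    public void BeginTransaction()
    {
        _transaction = _dbConnection.BeginTransaction();
    }

    public void Commit()
    {
        _transaction.Commit();
        _transaction.Dispose();
        _transaction = null;
    }
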
Matthew Flaschen
Thanks a lot. Aside from a syntax error caused by one of the records in the file, it inserted 87,000+ records in about 15 seconds. I'll let you guys know when I get the entire file inserted.
XstreamINsanity
+1  A: 

The number one thing that will increase insert performance is wrapping all of the inserts in a single transaction. It will result in an orders-of-magnitude speedup for your inserts.

See the SQLite FAQ entry that describes this phenomenon.

Mark
+1  A: 

FWIW, the command-line client for SQLite has a built-in data-loading command (`.import`). But if you read the C code of that client, you'd see it doesn't do anything special: it just reads your data file line by line and executes an INSERT in a loop.

Other answers have suggested using explicit transactions so you avoid the overhead of flushing I/O after each row. I agree with that advice; it will certainly have a huge benefit.

You can also disable the rollback journal:

PRAGMA journal_mode = OFF

Or set writes to asynchronous, allowing the operating system to buffer I/O:

PRAGMA synchronous = OFF

These pragma changes should save significant I/O overhead. But without a rollback journal, the ROLLBACK command won't work, and if your application crashes during an in-progress transaction, your database might be corrupted. Without synchronous writes, an operating-system failure could also result in lost data.

Not trying to scare you, but you should know there's a tradeoff between performance and guaranteed I/O integrity. I recommend operating with the safety modes enabled most of the time, and disabling them only briefly when you need to do a big data load like the one you're doing -- then remember to re-enable them!
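
For instance, here's a sketch of toggling the pragmas around the load, reusing the `ExecuteNonQuery` method from your `Database.cs` (the `BulkLoad` method and its delegate parameter are illustrative, not an existing API):

    // Illustrative sketch: relax durability only for the bulk load, then
    // restore SQLite's defaults (DELETE journal, FULL synchronous).
    public void BulkLoad(Action doImport)
    {
        ExecuteNonQuery("PRAGMA journal_mode = OFF");
        ExecuteNonQuery("PRAGMA synchronous = OFF");
        try
        {
            doImport(); // run the big import here
        }
        finally
        {
            // Re-enable the safety modes even if the import throws.
            ExecuteNonQuery("PRAGMA journal_mode = DELETE");
            ExecuteNonQuery("PRAGMA synchronous = FULL");
        }
    }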

Bill Karwin
In this case, it looks like he can get satisfactory performance without using options that sacrifice integrity.
Matthew Flaschen
Yes, using standard transaction semantics is the better choice, and that's what I use 99% of the time. In rare cases, extra measures may be needed, so it's worth knowing they're available.
Bill Karwin
And I +1 you for the rare-case thinking, as I always try to avoid those measures, no matter how rare the need. Thanks.
XstreamINsanity