views: 1187

answers: 22

I'm trying to write a VB.Net program that saves 1-2 million 5-field records (plus an indexed ID) to an MSAccess table every day. The saving process currently takes 13-20 hours, which obviously can't be right.

It's a flat table with minimal indexing, currently only 156 MB. Except for one double field, the fields are small strings, dates, or longs. The disk is a 15,000 RPM SATA drive used only for this file, and neither the computer nor the program is doing anything else during the save routine. The save routine is a simple For-Next loop that issues a short, simple INSERT statement for each record in the dataset.

Anyone got any ideas on what I need to change to make this work better?

A: 

(Disclaimer: I know nothing much about Access)

What do any profiling tools say? (Task Manager will give you some clues: add more columns to the display to see I/Os, VM usage, etc.) Is it doing a lot of disk accesses, or is it all CPU? Is it consuming huge amounts of memory?

The mention of an index concerns me, since it potentially needs updating after every INSERT - can you turn the index off, do the inserts, then index the complete file afterwards?

Is it linear in time? (i.e. if you create a file 10% of the size does it take 10% of the time)?

Paul
A: 

Thanks for your questions, Paul.

Page File usage is 600MB, CPU is around 5% most of the time, with spikes to the 80% range every 20 seconds or so. Memory: 2G total, 1.3G available, system cache is 1G.

Yes, it appears to be linear; the first 15,000 records take 10 minutes.

As to the index, I have not tried that, but Access always complains if you don't index at least the ID field.

The I/O read count seems high, though: almost 6 million after 20 minutes of run time and only 25,000 records.

A: 

First, try using one of the many Import options within Access. (Where is the data coming from? Is it delimited or fixed-length? How do you parse it with VB?)

You should be able to create a table without an index by declining Access's offer to add one. But work through the Import first.

le dorfier
A: 

Thanks, Doofledorfer.

The data comes in off the internet from a real-time data vendor, in a proprietary "vector" format, which I parse into a dataset. I parse the entire file before I start the save routine. Yes, it would be nice if I could "import" the dataset directly into the database rather than save each record individually, but I do not know of a way to do that.

There's an Import wizard that walks you through it. Or post a couple of rows of data.
le dorfier
A: 

You could try writing the data out to a CSV file, and then the data can be imported into Access with a single command. This may speed up the process significantly.
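The CSV step can be sketched like this. This is only an illustration in Python (the thread's code is VB.NET/C#, but the idea is language-agnostic); the field names, sample rows, and the `ticks.csv` file name are assumptions for the sketch, not the poster's actual schema:

```python
import csv

# Hypothetical parsed dataset (field names are illustrative only)
records = [
    ("SPY", "11/28/2008", "09:30:00", 88.63, 200),
    ("SPY", "11/28/2008", "09:30:00", 88.62, 400),
]

# Write the whole dataset to a delimited file in one pass; Access's
# text import (or an append query against a [Text;...] source) can
# then load it in bulk instead of running one INSERT per record.
with open("ticks.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Symbol", "TickDate", "TickTime", "TickPrice", "TickVolume"])
    writer.writerows(records)
```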

Kibbee
A: 

How many connections do you have open to the MS Access database (mdb)? From what I've seen, single-user mode is substantially faster than multi-user mode. Any open MS Access program counts as a single connection (you can look inside the ldb file with the ldbview tool).

Do you use row-level or page-level locking? Starting from some Access version (2000?), it defaults to row-level locking; I'd presume that page-level locking would be faster.

Do you have any antivirus running? It can intercept file operations and slow down the entire process noticeably.

Arvo
A: 

There is only one connection open, and it is a single-user system.

Tried turning off my anti-virus, but it still only saves 800 records a second.

I am not familiar with "locking", but my OLEDB connection string contains "Locking mode=1"

Incidentally, if it's relevant, the VM size had grown to 157,000K after just a few minutes.

A: 

Here are a few notes on an append query for a delimited text file, VBScript, I'm afraid, but it may help.

Set cn = CreateObject("ADODB.Connection")
strFile = "C:\ltd.mdb"
strCon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" _
    & strFile & ";"

cn.Open strCon

' Append the whole delimited text file in one SQL statement:
' the IN clause points Jet at the text file as a linked source.
strSQL = "INSERT INTO tableX ( Name1, Name2 ) " _
    & "SELECT Name1, Name2 " _
    & "FROM [ltd.txt] IN '' [Text;Database=c:\docs\;HDR=YES;]"

cn.Execute strSQL
Remou
A: 

Try Locking mode=0 - this is page-level locking. 800 rps (records per second) would give 480,000 records per 10 minutes - did you mean 800 rpm?

Arvo
A: 

Mike, disabling the virus scanner improved things - but something still sounds out of whack... Agree with @Remou that a bulk insert would perform much better if you could do it.

Assuming you can't bulk insert, I just did a sample app (C#, sorry - but the VB.NET will be similar) to create a catalog and a table and perform some inserts. I skipped the PK constraint on ID for now.

This gave me around 1,000 rows in 550 ms, running in a VM with the virus scanner on, on a laptop drive with power saving enabled. You should be able to beat this easily with that fast drive. Where are the differences?

One thought: how are you inserting the ID? Is it autogenerated and omitted from the INSERT statement, or do you insert a value into a column marked PK? The latter would almost certainly trigger an index seek (your sizable read I/O?) to verify the uniqueness of the data against the table.

using System;
using System.Data;
using System.Data.OleDb;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            String jetConnection = "Provider=Microsoft.Jet.OLEDB.4.0; Data Source=C:\\jetsample.mdb;";

            ADOX.CatalogClass cat = new ADOX.CatalogClass();
            cat.Create(jetConnection);

            using(OleDbConnection conn = new OleDbConnection(jetConnection))
            {
                conn.Open();
                using(OleDbCommand cmd = new OleDbCommand("CREATE TABLE test ([ID] INTEGER, [TestDouble] DOUBLE, [TestText] TEXT, [TestDate] DATE, [TestInt] INTEGER)",conn))
                {
                    cmd.CommandType = CommandType.Text;
                    cmd.ExecuteNonQuery();
                }

                using (OleDbCommand cmd = new OleDbCommand("INSERT INTO [Test] VALUES (@id, @testDouble, @testText, @testDate, @testInt)", conn))
                {
                    OleDbParameter id = cmd.Parameters.Add("@id", OleDbType.Integer);
                    OleDbParameter testDouble = cmd.Parameters.Add("@testDouble", OleDbType.Double);
                    OleDbParameter testText = cmd.Parameters.Add("@testText", OleDbType.VarWChar);
                    OleDbParameter testDate = cmd.Parameters.Add("@testDate", OleDbType.Date);
                    OleDbParameter testInt = cmd.Parameters.Add("@testInt", OleDbType.Integer);

                    DateTime start = DateTime.Now;
                    for (int index = 1; index <= 2000000; index++)
                    {
                        if (index % 1000 == 0)
                        {
                            // TotalMilliseconds, not Milliseconds: the latter is only the 0-999 ms component
                            System.Diagnostics.Debug.WriteLine((DateTime.Now - start).TotalMilliseconds);
                            start = DateTime.Now;
                        }

                        id.Value = index;
                        testDouble.Value = index;
                        testText.Value = String.Format("{0} DBL", index);
                        testDate.Value = DateTime.Now.AddMilliseconds(index);
                        testInt.Value = index;

                        cmd.ExecuteNonQuery();
                    }
                }
            }

        }
    }
}
stephbu
A: 

ARVO: Turning off the virus scanner and going to Locking mode=0 helped... it's up to 1,300 records per minute now (yes, I meant minutes), but that's still pretty slow.

STEPHBU: my C# skills are minimal, but as far as I can see you are doing things pretty much the same as I am. Your "jetConnection" string is considerably simpler than mine... here's mine:

Me.connSPY.ConnectionString = _
    "Jet OLEDB:Global Partial Bulk Ops=2;" & _
    "Jet OLEDB:Registry Path=;" & _
    "Jet OLEDB:Database Locking Mode=0;" & _
    "Data Source=""E:\SPIRE.mdb"";" & _
    "Mode=Share Deny None;" & _
    "Jet OLEDB:Engine Type=5;" & _
    "Provider=""Microsoft.Jet.OLEDB.4.0"";" & _
    "Jet OLEDB:System database=;" & _
    "Jet OLEDB:SFP=False;" & _
    "persist security info=False;" & _
    "Extended Properties=;" & _
    "Jet OLEDB:Compact Without Replica Repair=False;" & _
    "Jet OLEDB:Encrypt Database=False;" & _
    "Jet OLEDB:Create System Database=False;" & _
    "Jet OLEDB:Don't Copy Locale on Compact=False;" & _
    "User ID=Admin;" & _
    "Jet OLEDB:Global Bulk Transactions=1"

If you do take the bulk import approach, make sure that Global Bulk Transactions is set to 2: http://msdn.microsoft.com/en-us/library/aa140022(office.10).aspx
stephbu
A: 

Oops, missed one of your questions, STEPHBU... I am allowing the indexed ID to auto-increment rather than trying to assign it in the INSERT statement. Good thought, though!

I modified my table declaration to add a PK constraint on the table - with the PK, my insert performance was 10% slower on an empty table and degraded 30% by the time I reached 200K rows. Skip the keys/constraints if you can. I'd still go with Remou's solution if possible.
stephbu
+1  A: 

You should really manage a bulk insert. Each insert carries a bunch of overhead, and by doing one row at a time in a For-Next loop you are wasting more than two-thirds of the computer's power. If the data comes in one line at a time, you will need to build a buffer to collect it before bulk-inserting into the database. Kibbee suggested writing the data to a CSV file and then dumping that into the database; if you need to write the data down anyway, this is a good method. I would recommend collecting the data in memory for a few minutes at a time.
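The buffering idea can be sketched as follows, in Python with sqlite3 standing in for the Access database (the batch size, table layout, and row values are assumptions for illustration, not the poster's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the Access .mdb
conn.execute("CREATE TABLE ticks (symbol TEXT, price REAL, volume INTEGER)")

BATCH_SIZE = 1000  # assumed batch size; tune against real hardware
buffer = []

def save_row(row):
    """Collect incoming rows in memory instead of inserting immediately."""
    buffer.append(row)
    if len(buffer) >= BATCH_SIZE:
        flush()

def flush():
    """One executemany + one commit per batch, not per row."""
    if buffer:
        conn.executemany("INSERT INTO ticks VALUES (?, ?, ?)", buffer)
        conn.commit()
        buffer.clear()

for i in range(2500):            # simulate the incoming feed
    save_row(("SPY", 88.62, i))
flush()                          # don't forget the final partial batch
```

The per-row overhead (statement handling, locking, and the physical write on commit) is paid once per batch instead of once per record.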

Karl
A: 

Yes, Karl, I will be trying that later on today... and thank you, KIBBEE, for the original suggestion; I had not ignored it, I just did not have time to try it right away. Got to take the little lady to lunch right now.

Thank you all for your input.

A: 

Can you post a few sample rows of input?

le dorfier
+2  A: 

A trick that can work on any DBMS to substantially speed up insertion is to temporarily disable the indexes, foreign keys, and constraints before bulk-inserting the data, then enable them again once your data is in the database.

Indexes especially can be performance killers for sequential insertion: it's faster by at least an order of magnitude (sometimes two!) to fill a table first and then create the index on the already-filled data than to insert with the index in place. In this case you might need to drop the index, then recreate it.
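The drop-then-recreate pattern looks like this; a minimal sketch in Python using sqlite3 as a stand-in for Access (the table and index names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (id INTEGER, price REAL)")
conn.execute("CREATE INDEX idx_ticks_id ON ticks (id)")

# 1. Drop the index before the bulk load.
conn.execute("DROP INDEX idx_ticks_id")

# 2. Insert into the unindexed table: each row no longer pays
#    for a B-tree update on the way in.
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?)",
    ((i, i * 0.01) for i in range(100_000)),
)

# 3. Rebuild the index once, over the complete data.
conn.execute("CREATE INDEX idx_ticks_id ON ticks (id)")
conn.commit()
```

Building the index once over sorted, complete data is much cheaper than maintaining it incrementally across a couple of million inserts.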

Then, as most other posters have already said, it's really a waste of time to insert rows one at a time if you can do it in batches. You'll get a minor speed improvement if you open the table with no locking at all, or only optimistic locking.

And then you might get another tiny increment by using DAO recordsets instead of ADO - I noticed this back when I developed in VB6; it's probably no longer the case with ADO.NET.

Joe Pineda
+1  A: 

Okay, back from a long lunch.

PAUL, PINEDA: your suggestions that the index on the PK was the problem were right. I got rid of the index and suddenly it stores 40,000 records per minute, fast enough to do an entire day's worth in under an hour. And it does not affect the speed of the applications that use the data at all.

The rest of you generous folks... I will plunk away at your suggestions for the rest of the day, and hopefully get it even better.

You have been extremely helpful. I owe you all a beer.

A: 

Doofledorfer: here are 5 rows of sample input, as you asked, though I do think I am on the right track now with the bulk insert suggestion and the non-indexed primary key.

INSERT INTO Ticks (Symbol, TickDate, TickTime, TickPRice, TickVolume) VALUES ('SPY', #11/28/2008#, #09:30:00#, 88.63, 200);
INSERT INTO Ticks (Symbol, TickDate, TickTime, TickPRice, TickVolume) VALUES ('SPY', #11/28/2008#, #09:30:00#, 88.62, 400);
INSERT INTO Ticks (Symbol, TickDate, TickTime, TickPRice, TickVolume) VALUES ('SPY', #11/28/2008#, #09:30:00#, 88.62, 100);
INSERT INTO Ticks (Symbol, TickDate, TickTime, TickPRice, TickVolume) VALUES ('SPY', #11/28/2008#, #09:30:00#, 88.62, 300);
INSERT INTO Ticks (Symbol, TickDate, TickTime, TickPRice, TickVolume) VALUES ('SPY', #11/28/2008#, #09:30:00#, 88.62, 127);

A: 

That beer that I owed all you guys.... I just had it and thought kindly of you while I enjoyed it.

I always pay my debts!

A: 

1-2 million 5-field records (plus an indexed ID) to an MSAccess table every day.

At the risk of stating what should be obvious. You're solving the wrong problem.

Dump MS-Access and push them onto a MS-SQL server instead. If you REALLY have to access the data later on through MS-Access, then just create a table link to the MS-SQL server's table.

Even at 5 bytes per record, you're going to exceed MS Access 2003's 2 GB table/DB size limit in less than a year (unfortunately, it's the same story for MS Access 2007).

CodeSlave
Really? Really? -1's? You think I'm wrong about it being a bad idea to dump 2 million records a day into MS Access? Even at 5 bytes per record you're going to exceed MS Access 2003's 2GB limit in less than a year (2007, same story).
CodeSlave
Someone had a downvote splurge. I got downvoted despite Mike Moyle saying I (and Pineda) had identified the issue correctly...
Paul
There... I nullified three of them.
CodeSlave
upvoted, though please include your comments about the 2GB size limit and the growth rate in your main answer - the way it's written sounds like a troll rant against Access/pro SQL Server. Probably that's why it was downvoted.
Joe Pineda
On a further comment, he's indeed solving the underlying problem: having indexes in place when doing a bulk insertion. That would've hit him on SQL Server as well.
Joe Pineda
Just a clarification: moving to SQL Server Express is the only free option, but it won't really solve the database size limit issue, as SQL Server Express is limited to 4GB.
Renaud Bompuis
+1  A: 

Did you have "auto-commit" enabled?

This can really slow you down, as each insert needs to be physically written to disk before the next one can be processed.

Try manually committing about every 1,000 inserts.
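The commit-batching idea, sketched in Python with sqlite3 standing in for Access (the 1,000-row interval follows this answer; the table and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (id INTEGER, price REAL)")

COMMIT_EVERY = 1000
for i in range(5000):
    conn.execute("INSERT INTO ticks VALUES (?, ?)", (i, 88.62))
    # With auto-commit, every INSERT is its own transaction and must be
    # flushed to disk; committing once per ~1000 rows means one physical
    # flush per batch instead of one per row.
    if (i + 1) % COMMIT_EVERY == 0:
        conn.commit()
conn.commit()  # pick up any trailing partial batch
```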

James Anderson
Forgot about this Access feature! Yes, disabling auto-commit will give him yet another speed boost!
Joe Pineda
A: 

CodeSlave... that's a valid point, and if this all works out I will have to spend the money for SQL Server or something similar, not to mention a couple more computers. For now, I do not want to invest the cash or climb the learning curve.

Mr Anderson... Haven't tried that yet, and I will. But for now, the other suggestions have my 10-20 hour save time down to 15 minutes, so I am a pretty happy camper.

You don't have to spend money at all. You can always use MSDE or SQL Express, the free (albeit limited) versions of SQL Server 2000 and 2005, respectively.
Joe Pineda