views: 142
answers: 4
Hi,

I'm dealing with chunks of data that are 50k rows each. I'm inserting them into a SQL Server database using LINQ to SQL:

for (int i = 0; i < 50000; i++)
{
    DB.TableName.InsertOnSubmit(new TableName
    {
        Value1 = Array[i, 0],
        Value2 = Array[i, 1]
    });
}
DB.SubmitChanges();

This takes about 6 minutes, and I want it to take much less if possible. Any suggestions?

+1  A: 

You may have to just live with a 6-minute insert. Much of that time is likely coming from reindexing rather than from LINQ being inefficient.

Things you can do to improve performance:

  • Use BULK INSERT. This is a T-SQL statement that loads data from a separate file that SQL Server can read. The heavy lifting happens on the server rather than in C#, although you can coordinate writing and transferring the file from C#.

  • Disable any additional indexes on the table (other than your PK/clustered index) that you don't need during the insert, and re-enable them when you're done; there's a sketch of this after the list. There are caveats; read up on the issues here.

  • Write your own insert script and don't use LINQ. I highly doubt this would actually help you; if the data is static enough that you don't have to dynamically generate the SQL query, you'd be better off with the Bulk Insert option.
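
Here's a minimal sketch of the index suggestion above, assuming a hypothetical non-clustered index name (IX_TableName_Value1) and a connectionString variable (both placeholders):

// Requires: using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))
using (var cmd = conn.CreateCommand())
{
    conn.Open();

    // Disable the non-clustered index before the load...
    cmd.CommandText = "ALTER INDEX IX_TableName_Value1 ON dbo.TableName DISABLE";
    cmd.ExecuteNonQuery();

    // ... run the 50K-row insert here ...

    // ...then rebuild it afterwards (REBUILD also re-enables it).
    cmd.CommandText = "ALTER INDEX IX_TableName_Value1 ON dbo.TableName REBUILD";
    cmd.ExecuteNonQuery();
}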

Your best plan of attack is at the indexes, but don't expect a great improvement unless you have some wonky indexes on that table. As I said at the beginning of this answer... you may just have to live with a 6-minute insert.

Edit: Per @Aaronaught's excellent suggestion, you could also use SqlBulkCopy. I have not used it myself, but this article implies it's quite fast.
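
A minimal sketch of the SqlBulkCopy approach, assuming the 2D array from the question, string columns, and a connectionString variable (all placeholders):

// Requires: using System.Data; using System.Data.SqlClient;
var table = new DataTable();
table.Columns.Add("Value1", typeof(string));
table.Columns.Add("Value2", typeof(string));

for (int i = 0; i < 50000; i++)
{
    table.Rows.Add(Array[i, 0], Array[i, 1]);
}

using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.TableName";
    bulk.ColumnMappings.Add("Value1", "Value1");
    bulk.ColumnMappings.Add("Value2", "Value2");
    bulk.BatchSize = 5000;  // optional; tune to taste
    bulk.WriteToServer(table);
}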

Randolpho
@Aaronaught: excellent point; I'd completely forgotten about it. Editing soon...
Randolpho
I removed the comment (the downvote is not mine, unfortunately). There's also another way that works great for row counts in the hundreds/thousands: Table-Valued Parameters. Not LINQ-to-SQL friendly, but still pure .NET.
Aaronaught
+9  A: 

If you are reading from a file, you'd be better off using BULK INSERT (Transact-SQL). If you are writing that much (50K rows) at one time from memory, you might be better off writing to a flat file first and then using BULK INSERT on that file.
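
A minimal sketch of the flat-file route, assuming a hypothetical path the SQL Server service account can read, simple values with no embedded commas, and a connectionString variable:

// Requires: using System.IO; using System.Data.SqlClient;
string path = @"C:\temp\tablename.csv";

using (var writer = new StreamWriter(path))
{
    for (int i = 0; i < 50000; i++)
    {
        writer.WriteLine("{0},{1}", Array[i, 0], Array[i, 1]);
    }
}

using (var conn = new SqlConnection(connectionString))
using (var cmd = conn.CreateCommand())
{
    cmd.CommandText =
        @"BULK INSERT dbo.TableName
          FROM 'C:\temp\tablename.csv'
          WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')";
    conn.Open();
    cmd.ExecuteNonQuery();
}

Note that BULK INSERT reads the file on the server, so either write it somewhere the server can see or use SqlBulkCopy (discussed below) to skip the file entirely.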

KM
I totally agree!
Stefan
Absolutely. The problem is 50K individual inserts versus one bulk insert; this is a task you simply should not consider using LINQ for. This is something that should be done as a set. BULK INSERT should whack this out in far less than a minute; I used to bulk insert 21 million records in 16 minutes on an old, slow server.
HLGEM
What format does the data file have to be in? Can I just comma-separate the values?
Soo
@Soo, YES. Also, it can handle just about anything; look at the examples in the link I provided in my answer.
KM
@Soo: As Ian's and now Randolpho's answers point out, when working in .NET, if your data is not *already* in a flat-file format (i.e. if it's just in memory) then it's preferable to use `SqlBulkCopy` as opposed to physically writing out a file and using a raw `BULK INSERT` command. They both have the same ultimate effect, but with `SqlBulkCopy` you're writing the data directly over the TDS and cutting out the middle man, so to speak; the whole process will take only a little more than half the amount of time you'd incur with a write-copy-insert batch.
Aaronaught
+1  A: 

As you are doing a simple insert and not gaining much from the use of LINQ to SQL, have a look at SqlBulkCopy; it will remove most of the round trips and reduce the overhead on the SQL Server side as well. You will have to make very few coding changes to use it.

Also look at pre-sorting your data by the column that the table is indexed on, as this will lead to better cache hits when SQL Server is updating the table.

Also consider whether you should upload the data to a temporary staging table that is not indexed, then use a stored proc to insert into the main table with a single SQL statement. This may let SQL Server spread the indexing work over all your CPUs. A sketch of that pattern follows.
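
A minimal sketch of the staging-table approach, assuming a hypothetical unindexed dbo.TableName_Staging table, the DataTable built for SqlBulkCopy earlier, and a connectionString variable:

// Requires: using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // Fast, set-based load into the unindexed staging table.
    using (var bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "dbo.TableName_Staging";
        bulk.WriteToServer(table);
    }

    // One set-based statement moves the rows into the real (indexed) table,
    // pre-sorted by the indexed column as suggested above.
    using (var cmd = conn.CreateCommand())
    {
        cmd.CommandText =
            @"INSERT INTO dbo.TableName (Value1, Value2)
              SELECT Value1, Value2
              FROM dbo.TableName_Staging
              ORDER BY Value1";
        cmd.ExecuteNonQuery();
    }
}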

Ian Ringrose
A: 

There are a lot of things you need to check/do.

  1. How much disk space is allocated to the database? Is there enough free space to do all of the inserts without the file auto-growing? If not, increase the database file size up front, because SQL Server has to pause the inserts every time it auto-grows the file.

  2. Do NOT do individual inserts. They take way too long. Instead, use either table-valued parameters (SQL 2008), SqlBulkCopy, or a single insert statement (in that order of preference); see the TVP sketch after this list.

  3. Drop any indexes on that table before the load and recreate them after it. With that many inserts, they are probably going to be fragmented to hell anyway.

  4. If you have any triggers, consider dropping them until the load is complete.

  5. Do you have enough RAM available on the database server? You need to check on the server itself to see if it's consuming ALL the available RAM. If so, you might consider doing a reboot prior to the load... SQL Server has a tendency to just consume and hold on to everything it can get its hands on.

  6. Along the RAM lines, we like to keep enough RAM in the server to hold the entire database in memory. I'm not sure if this is feasible for you or not.

  7. How is the server's disk speed? Is the queue depth pretty long? Other than hardware replacement, there's not much to be done here.
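
A minimal sketch of option 2 (table-valued parameters, SQL Server 2008+), assuming a hypothetical table type (dbo.TableNameType) and a stored procedure (dbo.InsertTableName) that does a single INSERT...SELECT from its @rows parameter already exist on the server, plus the DataTable and connectionString from the earlier sketches:

// Requires: using System.Data; using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.InsertTableName", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;

    // Pass the whole DataTable as one structured parameter (one round trip).
    var p = cmd.Parameters.AddWithValue("@rows", table);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.TableNameType";

    conn.Open();
    cmd.ExecuteNonQuery();
}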

Chris Lively