I'm inserting a large number of records using LINQ to SQL from C# into a SQL Server 2008 Express DB. The insertion is very slow. Here is the code snippet:

public void InsertData(int id)
{
  MyDataContext dc = new MyDataContext();

  List<Item> result = GetItems(id);

  foreach (var item in result)
  {
    DbItem dbItem = new DbItem() { ItemNo = item.No, ItemName = item.Name };
    dc.Items.InsertOnSubmit(dbItem); // queue each row for insertion
  }

  dc.SubmitChanges(); // all queued rows are sent here, one INSERT per row
}

Am I doing anything wrong? Or is using LINQ to insert a large number of records a bad choice?

Update: Thanks for all the answers. @p.campbell: Sorry about the record count; it was a typo. It is actually around 100,000, and it can range up to 200k.

Following the suggestions (plus a requirement change and design decision), I split this operation into parts: I now retrieve the data in small chunks and insert it into the database as it arrives. I've turned this InsertData() method into a thread operation and use SmartThreadPool to create a pool of 25 threads doing the same work, inserting only 100 records at a time. In this scenario, LINQ and a plain SQL query made no difference in the time taken.

Per my requirements, this operation is scheduled to run every hour and fetches records for around 4k-6k users. So now I'm pooling each user's data operation (retrieving and inserting into the DB) as one task assigned to one thread (see the sketch below). The entire process takes around 45 minutes for around 250k records.
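A minimal sketch of that per-user fan-out, illustrated here with the built-in ThreadPool rather than SmartThreadPool; GetUserIds() is a hypothetical helper, and InsertData(int) is the method from the question:

using System.Threading;

// Sketch only: queue one InsertData call per user onto the pool.
// Completion tracking and error handling are omitted for brevity.
foreach (int userId in GetUserIds())
{
    ThreadPool.QueueUserWorkItem(state => InsertData((int)state), userId);
}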

Is there a better way to do this kind of task? Can anyone suggest how I can improve this process?

+3  A: 

If you are inserting a large amount of data, you can try BULK INSERT.

As far as I know, there is no equivalent of BULK INSERT in LINQ to SQL.
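As a minimal sketch, here is one way to issue BULK INSERT from C# via a plain SqlCommand; the connection string, the dbo.Items table, and the C:\data\items.csv path are assumptions, and the file must be readable by the SQL Server service account:

using System.Data.SqlClient;

using (SqlConnection conn = new SqlConnection("Integrated Security=SSPI;Initial Catalog=TestDb;Data Source=.\\SQLEXPRESS;"))
{
    conn.Open();
    // Server-side bulk load: SQL Server reads the file directly,
    // avoiding one round-trip per row.
    string sql = @"BULK INSERT dbo.Items
                   FROM 'C:\data\items.csv'
                   WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')";
    using (SqlCommand cmd = new SqlCommand(sql, conn))
    {
        cmd.ExecuteNonQuery();
    }
}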

anishmarokey
+2  A: 

You've got the SubmitChanges() being called once, which is good. This means that only one connection and transaction are being used.

Consider refactoring your code to use InsertAllOnSubmit() instead.

List<DbItem> newItems = GetItems(id).Select(x => new DbItem { ItemNo = x.No,
                                                              ItemName = x.Name })
                                    .ToList();
dc.Items.InsertAllOnSubmit(newItems);
dc.SubmitChanges();

The INSERT statements are still sent one-by-one as before, but perhaps this is more readable?

Some other things to ask/consider:

  • What's the state of the indexes on the target table? Too many will slow down the writes.
  • Is the database in Simple or Full recovery model?
  • Capture the SQL statements going across the wire and replay them in an ad-hoc query against your SQL Server database. I realize you're using SQL Express and likely don't have SQL Profiler; use context.Log = Console.Out; to output your LINQ to SQL statements to the console instead (see the sketch after this list). Prefer SQL Profiler for convenience if you have it.
  • Do the captured SQL statements perform the same as your client code? If so, the perf problem is on the database side.
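A minimal sketch of that logging hook, reusing MyDataContext and newItems from above; DataContext.Log accepts any TextWriter:

using (MyDataContext dc = new MyDataContext())
{
    // Echo every generated SQL statement so it can be captured and replayed.
    dc.Log = Console.Out;

    dc.Items.InsertAllOnSubmit(newItems);
    dc.SubmitChanges();
}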
p.campbell
How does this work internally?
ck
Thanks for the input. Refactoring is done; not much improvement. No indexes other than the id field, which is the primary key and auto-generated.
JPReddy
@JPReddy: good stuff. Would be interested to see the perf of the captured + adhoc'd SQL statements.
p.campbell
@p.campbell: Using `InsertAllOnSubmit` will not make the code any faster, because it calls `InsertOnSubmit` internally for each element in the collection. Still +1 for the list of other things to consider.
Steven
@Steven: thanks for the confirmation. I thought that was the case and perhaps I wasn't terribly clear when I mentioned 'one-by-one as before', but it's worth clarifying. Thanks again.
p.campbell
@p.campbell BTW `InsertAllOnSubmit` takes an `IEnumerable<T>` so the `ToList()` can be dropped. Depending on the number of records returned that may save some time but it's only a secondary issue to the slowness of `InsertAllOnSubmit`.
Ahmad Mageed
+3  A: 

For inserting a massive amount of data into SQL in one go:

Neither LINQ nor SqlCommand is designed for bulk copying data into SQL.

You can use the SqlBulkCopy class, which provides managed-code functionality similar to the bcp utility, for bulk loading data into SQL from pretty much any data source.

The SqlBulkCopy class can be used to write data only to SQL Server tables. However, the data source is not limited to SQL Server; any data source can be used, as long as the data can be loaded into a DataTable instance or read with an IDataReader instance.

Performance comparison

SqlBulkCopy is by far the fastest, even when loading data from a simple CSV file.

LINQ will just generate a load of INSERT statements and send them to your SQL Server. This is no different from using ad-hoc queries with SqlCommand; the performance of SqlCommand vs. LINQ is virtually identical.

The Proof

(SQL Express 2008, .Net 4.0)

SqlBulkCopy

Using SqlBulkCopy to load 100,000 rows from a CSV file (timing includes loading the data):

using (SqlConnection conn = new SqlConnection("Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=TestDb;Data Source=.\\SQLEXPRESS;"))
{
    conn.Open();
    Stopwatch watch = Stopwatch.StartNew();

    // Load the CSV into a DataTable via the Jet OLE DB text driver.
    string csvConnString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\\data\\;Extended Properties='text;'";
    OleDbDataAdapter oleda = new OleDbDataAdapter("SELECT * FROM [test.csv]", csvConnString);
    DataTable dt = new DataTable();
    oleda.Fill(dt);

    using (SqlBulkCopy copy = new SqlBulkCopy(conn))
    {
        // Map source columns 0 and 1 to destination columns 1 and 2
        // (destination ordinal 0 is presumably an identity Id).
        copy.ColumnMappings.Add(0, 1);
        copy.ColumnMappings.Add(1, 2);
        copy.DestinationTableName = "dbo.Users";
        copy.WriteToServer(dt);
    }
    Console.WriteLine("SqlBulkCopy: {0}", watch.Elapsed);
}
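As the quoted docs note, the DataTable isn't required either; here is a sketch of the same load streaming through an IDataReader instead of buffering the whole file first (same hypothetical paths and table):

using (OleDbConnection csvConn = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\\data\\;Extended Properties='text;'"))
using (SqlConnection conn = new SqlConnection("Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=TestDb;Data Source=.\\SQLEXPRESS;"))
{
    csvConn.Open();
    conn.Open();

    OleDbCommand cmd = new OleDbCommand("SELECT * FROM [test.csv]", csvConn);
    using (OleDbDataReader reader = cmd.ExecuteReader())
    using (SqlBulkCopy copy = new SqlBulkCopy(conn))
    {
        copy.ColumnMappings.Add(0, 1);
        copy.ColumnMappings.Add(1, 2);
        copy.DestinationTableName = "dbo.Users";
        copy.WriteToServer(reader); // rows stream from the file straight to SQL Server
    }
}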

SqlCommand

using (SqlConnection conn = new SqlConnection("Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=TestDb;Data Source=.\\SQLEXPRESS;"))
{
    conn.Open();
    Stopwatch watch = Stopwatch.StartNew();
    SqlCommand comm = new SqlCommand("INSERT INTO Users (UserName, [Password]) VALUES ('Simon', 'Password')", conn);
    // One round-trip per row: the same statement is executed 100,000 times.
    for (int i = 0; i < 100000; i++)
    {
        comm.ExecuteNonQuery();
    }
    Console.WriteLine("SqlCommand: {0}", watch.Elapsed);
}

LinqToSql

using (SqlConnection conn = new SqlConnection("Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=TestDb;Data Source=.\\SQLEXPRESS;"))
{
    conn.Open();
    Stopwatch watch = Stopwatch.StartNew();
    EffectCatalogueDataContext db = new EffectCatalogueDataContext(conn);
    for (int i = 0; i < 100000; i++)
    {
        User u = new User();
        u.UserName = "Simon";
        u.Password = "Password";
        db.Users.InsertOnSubmit(u);
    }
    // SubmitChanges still issues one INSERT statement per queued row.
    db.SubmitChanges();
    Console.WriteLine("Linq: {0}", watch.Elapsed);
}

Results

SqlBulkCopy: 00:00:02.9070433
SqlCommand: 00:00:50.4230604
Linq: 00:00:48.7702995
badbod99
-1.... your doubt directly contradicts validated knowledge.
TomTom
Tom... what "validated knowledge" exactly? Please provide a reference for poor insert performance with LINQ.
badbod99
@badbod99: That performance is really superb. Thanks for the suggestion; I'm using that now.
JPReddy