I need to insert a large amount of data into SQL Server 2008. My project is based on LINQ to SQL.

I process a CSV file with 100,000 rows. Each row is mapped to an Order object; an Order also contains collections of Category and Code objects. I need to map each row to an object in order to validate it.

Then I need to insert all these objects into the database.

List<Order> orders = Import("test.csv");
db.Orders.InsertAllOnSubmit(orders);
db.SubmitChanges();

OR

foreach (Order order in orders)
    db.Orders.InsertOnSubmit(order);
db.SubmitChanges();

Both ways are slow. Is there any workaround? I may use an approach other than LINQ to SQL for this task.

I have read about the SqlBulkCopy class - would it handle inserting child entities as well?

A: 

Hey,

This CSV reader was really fast for me: http://www.codeproject.com/KB/database/CsvReader.aspx

But yes, a bulk copy operation that works directly against SQL Server will be faster, if you have that option.
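
Roughly, a SqlBulkCopy call looks like the sketch below. The destination table name, column names, and Order properties are placeholders for your real schema, and connectionString is assumed. Note that SqlBulkCopy writes one flat table per call, so the child Category and Code rows would each need their own DataTable and WriteToServer call, with the parent keys filled in on the client.

// Sketch only - needs System.Data and System.Data.SqlClient.
// Build a flat DataTable that mirrors the destination table;
// the column names and Order properties here are assumptions.
DataTable table = new DataTable();
table.Columns.Add("Number", typeof(string));
table.Columns.Add("Customer", typeof(string));

foreach (Order order in orders)
    table.Rows.Add(order.Number, order.Customer);

using (SqlConnection connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.Orders";  // placeholder table name
        bulkCopy.BatchSize = 5000;                     // send rows to the server in chunks
        bulkCopy.WriteToServer(table);
    }
}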

LINQ to SQL doesn't have a bulk update capability that I'm aware of... you have to iterate through.

HTH.

Brian
I already use CsvReader to read the CSV file and map it to objects. The main issue is how to insert 100,000 objects into the database fast.
jlp
SQL tools for bulk inserting are the best: use SqlBulkCopy, or use SSIS to import, which has a bulk utility too. Works really fast.
Brian
+1  A: 

As @Brian points out, LINQ to SQL does not do bulk insert, but this blog post talks about a way to get it to work.

The author seems to have added the code since I first read it (it's from 2008).

Mike Two
I tried it and it works fast, but it does not insert child entities.
jlp
A: 

I think it is better to insert the objects in groups of, for instance, 1,000 objects, and then dispose of the session (the DataContext).

Performance here is a trade-off between two extremes: excessive memory use from keeping all 100,000 objects in memory on one side, and the time spent creating a session and reconnecting to the database on the other.

By the way, there is no significant difference between session.InsertAllOnSubmit(data) and foreach(var i in data) session.InsertOnSubmit(i).
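
A rough sketch of that idea, assuming orders is a List<Order> and MyDataContext is a placeholder for your real LINQ to SQL context type:

// Sketch only - needs System.Linq for Skip/Take.
const int groupSize = 1000;

for (int i = 0; i < orders.Count; i += groupSize)
{
    // A fresh, short-lived context per group keeps the change tracker small.
    using (var db = new MyDataContext(connectionString))
    {
        db.Orders.InsertAllOnSubmit(orders.Skip(i).Take(groupSize));
        db.SubmitChanges();
    }
}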

STO
+1  A: 

Try using smaller transactions.

foreach(List<Order> orderbatch in orders.Batch(100))
{
  db.Orders.InsertAllOnSubmit(orderbatch);  // the batch is a List<Order>, so use InsertAllOnSubmit
  db.SubmitChanges();   
}


public static IEnumerable<List<T>> Batch<T>(this IEnumerable<T> source, int batchSize)
{
  List<T> result = new List<T>();
  foreach (T t in source)
  {
    result.Add(t);
    if (result.Count == batchSize)
    {
      // Hand out a full batch and start a new one.
      yield return result;
      result = new List<T>();
    }
  }
  // Don't lose the final, partially filled batch.
  if (result.Count > 0)
  {
    yield return result;
  }
}
David B
If you're going with this kind of solution, you might as well try to order the rows by the clustered index. If you don't, inserting each batch might force SQL Server to rearrange previously inserted rows. Or it might not, but it's something you need to watch for.
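For example, assuming the clustered key is the Order identity column (called Id here only for illustration):

foreach (List<Order> orderbatch in orders.OrderBy(o => o.Id).Batch(100))
{
  db.Orders.InsertAllOnSubmit(orderbatch);
  db.SubmitChanges();
}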
Mike Two
@Mike Two - Page Splits vs Hot Spots... as always - measure, measure, measure.
David B
I accepted this answer. However, Entity Framework does the same job a few times faster. Ideally I would use SqlBulkCopy, but it does not handle child entities.
jlp