We have a need to update several tables that have parent/child relationships based on an Identity primary key in the parent table, which is referred to by one or more child tables as a foreign key.

  • Due to the high volume of data, we would like to build these tables in memory, then use SqlBulkCopy from C# to update the database en masse from either the DataSet or the individual DataTables.
  • We would further like to do this in parallel, from multiple threads, processes, and possibly clients.

Our prototype in F# shows a lot of promise, with a 34x performance increase, but this code forces known Identity values in the parent table. When not forced, the Identity column does get correctly generated in the database when SqlBulkCopy inserts the rows, but the Identity values do NOT get updated in the in-memory DataTable. Further, even if they were, it is not clear if the DataSet would correctly fix-up the parent/child relationships, so that the child tables could subsequently be written with correct foreign key values.

Can anyone explain how to have SqlBulkCopy update Identity values, and further how to configure a DataSet so as to retain and update parent/child relationships, if this is not done automatically when a DataAdapter is used to FillSchema on the individual DataTables?

Answers that I'm not looking for:

  • Read the database to find the current highest Identity value, then manually increment it when creating each parent row. This does not work for multiple processes/clients, and as I understand it, failed transactions may cause some Identity values to be skipped, so this method could break the relationships.
  • Write the parent rows one at a time and ask for the Identity value back. This defeats at least some of the gains had by using SqlBulkCopy (yes, there are a lot more child rows than parent ones, but there are still a lot of parent rows).

Similar to the following unanswered question:

A: 

I guess the trade-off you face is the performance of the BulkInsert vs the reliability of the Identity.

Can you put the database into single-user mode temporarily to perform your insert?

I faced a very similar issue with my conversion project, where I am adding an Identity column to very large tables that have children. Fortunately I was able to set up the identity in the parent and child sources (I used a TextDataReader) to perform the BulkInsert, and I generated the parent and child files at the same time.

I also saw the performance gains you are talking about: OleDbDataReader source -> StreamWriter, and then TextDataReader -> SqlBulkCopy.

Paul Farry
+3  A: 

First of all: SqlBulkCopy cannot do what you want. As the name suggests, it's just a "one-way street". It moves data into SQL Server as quickly as possible. It's the .NET version of the old bulk copy command, which imports raw text files into tables. So there is no way to get the identity values back if you are using SqlBulkCopy.

I have done a lot of bulk data processing and have faced this problem several times. The solution depends on your architecture and data distribution. Here are some ideas:

  • Create one set of target tables for each thread, import into these tables, and at the end join these tables. Most of this can be implemented in a quite generic way, where you generate tables called TABLENAME_THREAD_ID automatically from tables called TABLENAME.

  • Move ID generation completely out of the database. For example, implement a central web service which generates the IDs. In that case you should not generate one ID per call, but rather generate ID ranges; otherwise network overhead usually becomes a bottleneck.

  • Try to generate IDs out of your data. If that's possible, your problem is gone. Don't say "it's not possible" too fast. Perhaps you can use string IDs which can be cleaned up in a post-processing step?
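The ID-range idea in the second bullet can be sketched as below. This is only a minimal in-process stand-in for a central ID service; the class and method names are hypothetical, and a real implementation would sit behind a web service and persist its high-water mark.

```csharp
using System.Threading;

// Sketch of the ID-range idea: clients reserve a whole block of IDs per
// call instead of one ID per row, so the allocator is not a per-row
// bottleneck. All names here are hypothetical.
public sealed class IdRangeAllocator
{
    private long _next;                 // next unissued ID
    private readonly int _blockSize;

    public IdRangeAllocator(long start, int blockSize)
    {
        _next = start;
        _blockSize = blockSize;
    }

    // Thread-safe: Interlocked.Add reserves an entire block atomically.
    public (long First, long Last) NextRange()
    {
        long last = Interlocked.Add(ref _next, _blockSize) - 1;
        return (last - _blockSize + 1, last);
    }
}
```

Each thread (or client, via a service call) would take one range per batch, assign IDs to its in-memory parent rows from that range, and could then fill in the child foreign keys before any bulk copy happens.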
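The last bullet, deriving IDs from the data itself, might look like the following sketch. It assumes the parent rows have a natural key and that a 64-bit hash collision is either acceptable or checked for in a post-processing step; the class and method names are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class DataDerivedIds
{
    // Deterministic 64-bit ID from a natural key: every thread, process,
    // and client computes the same ID for the same row, so parent/child
    // links can be built in memory with no database round trip.
    public static long IdFromNaturalKey(string naturalKey)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(naturalKey));
            return BitConverter.ToInt64(hash, 0);
        }
    }
}
```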

And one more remark: an increase of factor 34 when using BulkCopy sounds too small in my opinion. If you want to insert data fast, make sure that your database is configured correctly.

Achim
A: 

I tried a "SET IDENTITY_INSERT " + tableName + " ON", but it does not work with SqlBulkCopy. I will likely have to go back to a row-by-row copy mechanism.

Bernie
Bulk insert doesn't require that; it has its own flag, called KEEPIDENTITY, to control inserting of identity values.
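The SqlBulkCopy counterpart of that flag is SqlBulkCopyOptions.KeepIdentity: with it set, the identity values already present in the source rows are inserted as-is instead of being regenerated by the server. A minimal sketch, where the connection string and table name are assumptions:

```csharp
using System.Data;
using System.Data.SqlClient;

static class BulkLoader
{
    // KeepIdentity makes SqlBulkCopy preserve the identity values found
    // in the source DataTable -- the bulk-copy analogue of
    // SET IDENTITY_INSERT ON. Note it still only pushes values in; it
    // does not read server-generated identities back into the DataTable.
    public static void LoadParents(string connectionString, DataTable parents)
    {
        using (var bulk = new SqlBulkCopy(connectionString,
                                          SqlBulkCopyOptions.KeepIdentity))
        {
            bulk.DestinationTableName = "dbo.Parent"; // assumed table name
            bulk.WriteToServer(parents);
        }
    }
}
```

This pairs naturally with assigning the identity values yourself up front (e.g. from pre-reserved ID ranges), since the in-memory parent and child rows then already agree before anything is written.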
KeeperOfTheSoul
A: 

Hi,

Look here: http://daniel.wertheim.se/2010/10/24/c-batch-identity-inserts/

//Daniel

Daniel