When performing many inserts into a database I would usually have code like this:

using (var connection = new SqlConnection(connStr))
{
  connection.Open();
  foreach (var item in items)
  {
     var cmd = new SqlCommand("INSERT ...", connection);
     cmd.ExecuteNonQuery();
  }
}

I now want to shard the database, and therefore need to choose the connection string based on the item being inserted. That would make my code look more like this:

foreach (var item in items)
{
  connStr = GetConnectionString(item);
  using (var connection = new SqlConnection(connStr))
  {
    connection.Open();      
    var cmd = new SqlCommand("INSERT ...", connection);
    cmd.ExecuteNonQuery();
  }
}

This basically means creating a new connection to the database for each item. Will this work, or will recreating a connection for each insert cause terrible overhead?

+4  A: 

I assume you are talking about C#/.NET. In that case, connections are pooled by the framework, so the overhead of creating them this way isn't that high.
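Pooling is on by default and is controlled through the connection string itself, and each distinct connection string gets its own pool. A rough sketch (Pooling and Max Pool Size are standard ADO.NET connection string keywords; the rest of the string is made up):

// Each distinct connection string gets its own pool, so repeatedly
// opening and closing a SqlConnection with the same string reuses a
// pooled physical connection instead of establishing a new one.
var connStr = "Data Source=server;Initial Catalog=db;Integrated Security=true;"
            + "Pooling=true;Max Pool Size=100"; // pooling is on by default anyway

for (int i = 0; i < 10; i++)
{
    using (var connection = new SqlConnection(connStr))
    {
        connection.Open(); // cheap after the first open: taken from the pool
    }                      // Dispose/Close returns it to the pool
}

Note that because pooling is per connection string, a sharded setup will end up holding one pool per shard.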

Edit

As pointed out by @TomTom, transactions should also be taken into consideration. If you are doing inserts into different databases on the same server, you can use a normal SQL transaction. If the databases are on different servers, you need an MSDTC transaction to coordinate across the database servers. Either way, the best way to handle transactions is to wrap the relevant code in a TransactionScope. This doesn't conflict with opening and closing (in reality, reusing from a pool) database connections.

using (var scope = new TransactionScope())
{
    // Update code to various databases, opening and closing connections.

    scope.Complete(); // without this, the transaction rolls back on Dispose
}

With SQL Server 2005 or newer, the TransactionScope will start out as a lightweight SQL transaction and automatically escalate to an MSDTC transaction only if needed.
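Applied to the sharded loop from the question, that could look something like this (a sketch reusing your GetConnectionString helper; see TomTom's answer for the caveat about connection usage):

using (var scope = new TransactionScope())
{
    foreach (var item in items)
    {
        using (var connection = new SqlConnection(GetConnectionString(item)))
        {
            connection.Open(); // auto-enlists in the ambient transaction
            var cmd = new SqlCommand("INSERT ...", connection);
            cmd.ExecuteNonQuery();
        }
    }
    scope.Complete(); // commits everything; skipped if an exception unwinds
}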

Anders Abel
-1 for not seeing the underlying problem with transactional boundaries and the fact that this REALLY explodes.
TomTom
Yes, I'm using .NET.
Mr. Flibble
+2  A: 

Theoretically it is OK to create as many connections as you want. Recreating a connection is fast, unless you explicitly turn off connection pooling. By default, SQL connections are not actually closed but returned to a pool (for two minutes, IIRC) for reuse.

That said, if you open a new connection for every insert you run into a serious problem: your transaction boundaries. More complex updates need to fall under one transaction. While you CAN just wrap everything in a System.Transactions TransactionScope... it means that all the connections stay open until commit, using up a LOT of them, and it forces MSDTC (the Distributed Transaction Coordinator) to step in, with all the overhead that brings.

As such, reusing connections is a lot more advisable from an architectural point of view. You are basically at a dead end with your transactional behavior, not with the connection count.
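A minimal sketch of what that could look like here: group the items by shard first, then run each shard's inserts over a single connection and a local SQL transaction. GetConnectionString and the elided INSERT come from the question; note this gives you one transaction per shard, not one global transaction.

// Group items by target shard so each shard gets exactly one
// connection and one local transaction - no MSDTC needed.
// GroupBy requires a using directive for System.Linq.
foreach (var shard in items.GroupBy(item => GetConnectionString(item)))
{
    using (var connection = new SqlConnection(shard.Key))
    {
        connection.Open();
        using (var transaction = connection.BeginTransaction())
        {
            foreach (var item in shard)
            {
                var cmd = new SqlCommand("INSERT ...", connection, transaction);
                cmd.ExecuteNonQuery();
            }
            transaction.Commit(); // one commit per shard
        }
    }
}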

TomTom
OK. The SQL commands are simple inserts, and inserts generally only happen in a single thread, so I guess transaction boundaries aren't an issue for me.
Mr. Flibble
Actually they are, the moment you have more than one insert that should be part of a transaction. Like an insert of an Invoice followed by insert InvoiceDetail calls. All of those belong to one logical transaction. With your approach and System.Transactions, an invoice with 20 invoice items would keep open AND BLOCK 21 database connections. This is a "getting fired" level of architecture.
TomTom
+1  A: 

In addition to the points made by the other posters, the pattern you describe as your usual style reminds me of something I just read:

http://research.microsoft.com/apps/pubs/?id=76507

2.3 Identifying Opportunities for Bulk Operations

Consider a loop in the application code, shown below, inside which the application is inserting data into a table:

for (int i = 0; i < MAXVALUE; i++) {
    // execute SQL statement INSERT INTO T VALUES(...)
}

As written above, the code is inefficient since each time through the loop an INSERT statement is executed. A much more efficient way to achieve the same result is to use the bulk insert APIs of the data access layer, which take advantage of batching. Note that the ingredients for identifying this problem are having the application context that a particular SQL statement is executed repeatedly inside a loop, and the database context to know that each instance is in fact an INSERT statement on a table T. It is then possible to put these pieces of information together to suggest a mapping to the bulk insert APIs.

In ADO.NET 2.0, I think this means using SqlBulkCopy.
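Something along these lines, maybe (a sketch; the table name, columns, and item properties are made up for illustration):

// Buffer the rows locally in a DataTable, then push them to the
// server in one batch instead of N individual INSERT round trips.
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Name", typeof(string));
foreach (var item in items)
{
    table.Rows.Add(item.Id, item.Name); // hypothetical item properties
}

using (var connection = new SqlConnection(connStr))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.T"; // made-up target table
        bulkCopy.WriteToServer(table);           // single bulk insert
    }
}

For the sharded case you would presumably build and write one such batch per shard.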

Martin Smith
Thanks. Will look into it.
Mr. Flibble