We are writing a simple application:

  • build thousands of SQL select statements
  • run each select using BeginExecuteReader
  • put the results into another database

We've tried a few things that either leave connections in a SUSPENDED state (as verified by sp_who2) or take much longer to complete than the SQL queries themselves (maybe some kind of deadlocking?).

We are:

  • calling EndExecuteReader in the callback handler.
  • calling conn.Close() and conn.Dispose()
  • recursively starting another call

public static void StartQuery() {
  // build the SELECT statement for array[i] into 'query'
  // ...
  SqlConnection conn = new SqlConnection(AsyncConnectionString);
  conn.Open();
  SqlCommand cmd = new SqlCommand(query, conn);
  cmd.BeginExecuteReader(CallbackHandler, cmd);

  i++;
}



public static void CallbackHandler(IAsyncResult ar) {
  // unpack the cmd
  SqlCommand cmd = (SqlCommand)ar.AsyncState;
  SqlDataReader reader = cmd.EndExecuteReader(ar);

  // read some stuff into a DataTable...

  // SqlBulkCopy to another database (synchronously)

  cmd.Connection.Close();
  cmd.Connection.Dispose();

  StartQuery();
}

Does anyone have recommendations or links on a robust pattern to solve this type of problem?

Thanks!

+1  A: 

I assume you did set Asynchronous Processing=true on the connection string (see the snippet after this list). Thousands of BeginExecuteReader requests pending in the CLR is a recipe for disaster:

  • You'll quickly hit the max worker threads cap in SQL Server and start experiencing long connection Open times and frequent timeouts.
  • Running 1000 loads in parallel is guaranteed to be much slower than running those 1000 loads sequentially over N connections, where N is the number of cores on the server. Thousands of parallel requests simply create excessive contention on shared resources and slow each other down.
  • You have absolutely no reliability with thousands of requests queued up in the CLR. If the process crashes, you lose all the work without any trace.
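For reference, the connection string keyword looks like this (server and database names are placeholders):

  // Placeholder values; the important part is Asynchronous Processing=true,
  // which SqlClient requires before BeginExecuteReader will work.
  string AsyncConnectionString =
      "Data Source=myServer;Initial Catalog=mySourceDb;" +
      "Integrated Security=SSPI;Asynchronous Processing=true";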

A much better approach is to use a queue from which a pool of workers dequeue loads and execute them: a typical producer-consumer pattern. The number of workers (consumers) should be tuned to the SQL Server resources (CPU cores, memory, IO pattern of the loads), but a safe starting point is 2 times the number of server cores. Each worker uses a dedicated connection for its work. The role of the workers and of the queue is not to speed up the work; on the contrary, they act as a throttling mechanism that prevents you from swamping the server.
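A minimal sketch of that throttle, assuming .NET 4's BlockingCollection and a hypothetical LoadOneQuery helper that runs one SELECT and bulk-copies the result (the connection string and worker count are illustrative only):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Threading.Tasks;

class LoaderPool {
  static readonly BlockingCollection<string> queue = new BlockingCollection<string>();

  static void Main() {
    // ~2x cores as a starting point; ideally based on the *server's* core count.
    int workerCount = 2 * Environment.ProcessorCount;

    // Consumers: each worker owns one connection and drains the shared queue.
    var workers = new Task[workerCount];
    for (int w = 0; w < workerCount; w++) {
      workers[w] = Task.Factory.StartNew(() => {
        using (var conn = new SqlConnection("Data Source=.;Integrated Security=SSPI")) {
          conn.Open();
          foreach (string selectSql in queue.GetConsumingEnumerable()) {
            LoadOneQuery(conn, selectSql);   // run the SELECT, SqlBulkCopy to the target (not shown)
          }
        }
      }, TaskCreationOptions.LongRunning);
    }

    // Producer: enqueue the thousands of SELECT statements, then signal completion.
    foreach (string selectSql in BuildSelectStatements())
      queue.Add(selectSql);
    queue.CompleteAdding();

    Task.WaitAll(workers);   // all loads finished
  }

  static void LoadOneQuery(SqlConnection conn, string selectSql) { /* ... */ }
  static IEnumerable<string> BuildSelectStatements() { yield break; }
}

Because each worker reuses one open connection and processes its items sequentially, the server never sees more than workerCount concurrent requests, no matter how many statements are queued.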

An even better approach is to have the queue persisted in the database, as a means to recover from a crash. See Using Tables as Queues for the proper way of doing it, since table-based queuing is notoriously error prone.
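As a rough illustration of the destructive-dequeue shape that article describes (the dbo.LoadQueue table and its columns are made up here), each worker would pop its next load atomically with something like:

using System.Data.SqlClient;

static class TableQueue {
  // ROWLOCK + READPAST plus DELETE ... OUTPUT let concurrent workers take
  // different rows without blocking each other or double-processing a row.
  const string DequeueSql = @"
    WITH next AS (
      SELECT TOP (1) Id, SelectSql
      FROM dbo.LoadQueue WITH (ROWLOCK, READPAST)
      ORDER BY Id)
    DELETE FROM next
    OUTPUT deleted.SelectSql;";

  // Returns the next SELECT statement to load, or null when the queue is empty.
  public static string TryDequeue(SqlConnection conn) {
    using (var cmd = new SqlCommand(DequeueSql, conn))
    using (SqlDataReader rdr = cmd.ExecuteReader()) {
      return rdr.Read() ? rdr.GetString(0) : null;
    }
  }
}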

And finally, you can just let SQL Server handle everything (the queuing, the throttling and the processing itself) via Service Broker activation. See Asynchronous Procedure Execution and the follow-up article Passing Parameters to a Background Procedure.

Which one is the proper solution depends on many factors that you know about your problem but I don't, so I can't recommend which way you should go.

Remus Rusanu