I am working on an application that will need to read a large number of records (close to 500,000) from one table and insert them into another set of tables in the same database. I thought about using an SSIS package for this, but our DBAs don't want to use that. Now I am thinking of a multi-threaded approach. I am thinking that I can have a few threads started that will read, say, 500 records at a time, insert them, then come back and read more.

Now, say I spawn off 3 threads of this application. The first thread reads 500 rows and starts processing them. Can I lock the rows that were already read so that the next thread does not pick them up? I have been trying to find articles about this on the internet, but perhaps I am not searching Google for the right terms.

Any ideas, or links to articles that might be helpful?

A: 

If you have to insert thousands of records into SQL Server, take a look at bulk inserts; your select query shouldn't pose many problems. But this might all be overkill: if this is a one-time operation, copying 500,000 records shouldn't take long.
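For instance, a minimal SqlBulkCopy sketch in C# (the table names Source and Destination and the already-open SqlConnection conn are placeholders, not from the answer):

using System.Data.SqlClient;

static void CopyAll(SqlConnection conn)
{
    using (var cmd = new SqlCommand("SELECT * FROM Source", conn))
    using (var reader = cmd.ExecuteReader())
    //a second connection for the writer, since the first is busy streaming
    using (var bulkCopy = new SqlBulkCopy(conn.ConnectionString))
    {
        bulkCopy.DestinationTableName = "Destination";
        bulkCopy.BatchSize = 5000;      //send rows to the server in batches of 5,000
        bulkCopy.WriteToServer(reader); //streams the rows straight across
    }
}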

Carra
A: 

You could stick all of the record IDs into a queue and have each of your threads pull an ID from the queue, copy that record across, and repeat until the queue is empty. You'll need a thread-safe method to pull IDs from the queue, though.

Something like:

public class RecordCopier
{
    private readonly Queue<int> _queue;

    public RecordCopier(IEnumerable<int> recordIDs)
    {
        _queue = new Queue<int>(recordIDs);
    }

    public void InsertNextRecord()
    {
        while (true)
        {
            int recordID = this.PopRecordID();
            if (recordID == -1)
                return; //queue is empty: exit the thread
            //do whatever it is that you need to do to select the
            //record and re-insert it.
        }
    }

    public int PopRecordID()
    {
        lock (this._queue)
        {
            if (this._queue.Count == 0)
                return -1;
            return this._queue.Dequeue();
        }
    }
}

So create however many threads you want and have each of them execute the InsertNextRecord() method until the queue is empty.
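For example, a sketch of the startup code (recordIDs, the list of IDs to move, is a placeholder; requires System.Threading and System.Collections.Generic):

var copier = new RecordCopier(recordIDs);
var threads = new List<Thread>();
for (int i = 0; i < 3; i++) //three threads, as in the question
{
    var t = new Thread(copier.InsertNextRecord);
    t.Start();
    threads.Add(t);
}
foreach (var t in threads)
    t.Join(); //block until every worker has drained the queue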

MagicWishMonkey
+1  A: 

Personally, I would just use the bulk copy class (SqlBulkCopy). If I needed this to run in the background, I'd do it on a single additional thread rather than adding all that complexity. Multi-threading is hard enough to get right; unless it's truly necessary, I would limit it to one background thread rather than trying to manage a bunch of them and worrying about concurrency.
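Something in that shape, for instance (CopyRecords is a stand-in for whatever method wraps the SqlBulkCopy call; requires System.Threading):

//one extra thread, marked as background so it won't keep the process alive
var worker = new Thread(CopyRecords) { IsBackground = true };
worker.Start();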

David Stratton
+3  A: 

Do you really need an app to do this? The most efficient way would be to execute a single SQL statement on the server that transfers the data between the tables.

SqlBulkCopy should easily be fast enough with a single thread. For best performance, consider loading the data with a data reader and decorating it (decorator pattern) with a class that performs the required transformation. You then pass the decorated IDataReader to SqlBulkCopy to get a continuous stream of data between the tables, which keeps memory overhead low and completes the transfer in a matter of seconds.

Example: an input table A with a single column of type float, and an output table B with a single column of type float. We will read every number from table A and insert the square root of each non-negative number into table B.

class SqrtingDataDecorator : IDataReader
{
    private readonly IDataReader _decorated;
    private double _input;

    public SqrtingDataDecorator(IDataReader decorated)
    {
        _decorated = decorated;
    }

    public bool Read()
    {
        //advance the underlying reader, skipping negative values
        while (_decorated.Read())
        {
            _input = _decorated.GetDouble(0);
            if (_input >= 0)
                return true;
        }
        return false;
    }

    public object GetValue(int index)
    {
        //transform the current value on the way out
        return Math.Sqrt(_input);
    }

    public int FieldCount { get { return 1; } }

    //other IDataReader members just throw NotSupportedExceptions,
    //return null or do nothing. Omitted for clarity.
}

Here is the bit that does the work:

//get the input datareader ("select floatCol from A", or whatever)
IDataReader dr = selectCommand.ExecuteReader();
using (SqlTransaction tx = _connection.BeginTransaction())
{
    try
    {
        using (SqlBulkCopy sqlBulkCopy =
            new SqlBulkCopy(_connection, SqlBulkCopyOptions.Default, tx))
        {
            sqlBulkCopy.DestinationTableName = "B";
            SetColumnMappings(sqlBulkCopy.ColumnMappings);
            //above method omitted for clarity, easy to figure out

            //now wrap the input datareader in the decorator
            var sqrter = new SqrtingDataDecorator(dr);
            //the following line does the data transfer
            sqlBulkCopy.WriteToServer(sqrter);
            tx.Commit();
        }
    }
    catch
    {
        tx.Rollback();
        throw;
    }
}
Matt Howells
This app will need to make a decision about every record it reads before inserting, so it will need to loop over every record. I don't think bulk insert will work in my case.
My suggestion does allow you to examine every row that is read and apply whatever logic you wish to transform it into an output row. It will be a little more complicated if you want to insert into multiple tables, but the principle is sound.
Matt Howells
That is a great strategy. However, I am somewhat of a n00b when it comes to design patterns. Would it be possible for you to provide an example, especially one with the continuous stream of data? Thanks.
Nice use of decorator.
RichardOD
Awesome Matt ... Thanks ...
@Matt, you need to implement a little bit more of the data reader; see http://github.com/SamSaffron/So-Slow/blob/21328cb3b7f94776f0f57b450a2adc79fe6e0584/MinimalDataReader.cs for the full list. Feel free to add a link.
Sam Saffron
@Sam, amended comment about other members. Writing this code from memory! Cheers.
Matt Howells
A: 

Is there any way you can avoid doing this with round trips between an app and the DB? Can this all be done within DB code, in a stored proc or set of stored procs?

Joe
A: 

What makes you think multi-threading will make it faster? The bottleneck is probably the disk on your SQL Server, and multi-threading will make disk throughput lower, not higher: SQL Server will have to interleave requests from 3 threads to the disk.

If you have to make it multi-threaded, you can divide the work by row ID. For example, the first thread does rows 1-333, then 1000-1333, then 2000-2333, and so on.
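A sketch of that scheme (the thread count, block size, row count, and the per-range query are all placeholders):

int threadCount = 3, blockSize = 333, maxId = 500000;
for (int i = 0; i < threadCount; i++)
{
    int index = i; //capture a copy for the closure
    new Thread(() =>
    {
        //thread 0 takes 1-333, 1000-1332, ...; thread 1 takes 334-666, ...
        for (int start = index * blockSize + 1; start <= maxId;
             start += threadCount * blockSize)
        {
            int end = Math.Min(start + blockSize - 1, maxId);
            //SELECT the rows with id BETWEEN start AND end, apply the
            //per-record logic, and insert into the target tables
        }
    }).Start();
}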

Andomar
Well, what I meant was that the app could process the records faster, since it would be running the per-record logic on several threads. I like your suggestion about partitioning the rows, but how would the individual threads know what to read? I am thinking the initial count of unprocessed records would have to be captured by the parent thread, which would then tell each child thread which rows to access. Is that a proper method?
When starting a thread, you can pass parameters to its startup method. Use the parameters to specify the work for that specific thread.
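For instance (ProcessRows is a hypothetical per-range worker, and the bounds are placeholders):

var worker = new Thread(state =>
{
    var range = (int[])state;        //{ firstId, lastId }
    ProcessRows(range[0], range[1]); //hypothetical per-range worker
});
worker.Start(new[] { 1, 333 });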
Andomar