tags:

views:

760

answers:

3

Ok, so here's the problem I have to solve. I need to write a method in C# that will modify a table in SQL Server 2008. The table could potentially contain millions of records. The modifications include altering the table by adding a new column and then calculating and setting the value of the new field for every row in the table.

Adding the column is not a problem. It's setting the values efficiently that is the issue. I don't want to read in the whole table into a DataTable and then update and commit for obvious reasons. I'm thinking that I would like to use a cursor to iterate over the rows in the table and update them one by one. I haven't done a whole lot of ADO.NET development, but it is my understanding that only read-only server side (firehose) cursors are supported.

So what is the correct way to go about doing something like this (preferably with some sample code in C#)? Stored procedures or other such modifications to the DB are not allowed.

A: 

Here's a suggestion: You can read data using a DataReader , create a update command for current row and add it to a list of commands.Then run update commands in a transaction. something like this:

var commands=new List<SqlCommand>();
while(dr.Read())
{
var cmd=new SqlCommand();

cmd.CommandText="Add your command text here";
commands.Add(cmd);
}


using(var cnn=new SqlConnection("Connection String"))
{
IDbTransaction transaction;
try
{
cnn.Open();
transaction=cnn.BeginTransaction();
foreach(var cmd in commands)
{
cmd.Transaction=transaction;
cmd.ExecuteNonQuery();
cmd.Dispose(); 
}
transaction.Commit();
}
catch(SqlException)
{
if(transaction!=null)
transaction.Rollback();
throw;
}
}
Beatles1692
An interesting suggestion, but I'm concerned that I may have a list of millions of commands in memory which is just too much overhead.
+1  A: 

Your problem looks like something that you should be solving using T-SQL and not C#, unless there is some business rule that you are picking up dynamically and calculating the column values T-SQL should be the way to go. Just write a stored procedure or just open up Management studio and write the code to make your changes.
If this does not help then please elaborate on what exactly you want to do to the table, then we can help you figure out if this can be done via T-SQL or not.

[EDIT] you can do something like this

    string sql = " USE "  + paramDbName;
    sql+= " ALTER TABLE XYZ ADD COLUMN " + param1 + " datatype etc, then put semicolon to separate the commands as well"
    sql+= " UPDATE  XYZ SET Columnx = " + some logic here
    cmd.CommandText = sql;
    cmd.ExecuteNonQuery();

get this executed on the required instance of Sql Server 2008.
If you have too many lines of text then use StringBuilder.

Binoj Antony
+1  A: 

jpgoody,

Here is an example to chew on using the NerdDinner database and some SQLConnection, SQLCommand, and SQLDataReader objects. It adds one day to each of the Event Dates in the Dinners table.

using System;
using System.Data.SqlClient;

namespace NerdDinner
{
    public class Class1
    {
        public void Execute()
        {
            SqlConnection readerConnection = new SqlConnection(Properties.Settings.Default.ConnectionString);
            readerConnection.Open();

            SqlCommand cmd = new SqlCommand("SELECT DinnerID, EventDate FROM Dinners", readerConnection);
            SqlDataReader reader = cmd.ExecuteReader();

            SqlConnection writerConnection = new SqlConnection(Properties.Settings.Default.ConnectionString);
            writerConnection.Open();

            SqlCommand writerCommand = new SqlCommand("", writerConnection);

            while (reader.Read())
            {
                int DinnerID = reader.GetInt32(0);
                DateTime EventDate = reader.GetDateTime(1);

                writerCommand.CommandText = "UPDATE Dinners SET EventDate = '" + EventDate.AddDays(1).ToString() + "' WHERE DinnerID = " + DinnerID.ToString();
                writerCommand.ExecuteNonQuery();
            }
        }
    }
}
Robert Harvey
I like this idea. This processes the records one at a time and avoids the issue of loading the entire table into memory. How does it compare with my suggestion of using a DataAdaptor and specifying a starting record index and batch size to loop over the entire table in batches of records? Does my suggestion potentially offer performance improvements in that I can process the table in say batches of 100 records at a time, or does it amount to about the same thing? I guess the DataAdapter will update one record at a time, but there may be an improvement in reading in multiple records at a time.
I'm not sure about the batching. It depends on what the DataAdapter is doing under the hood. The SQL Server can execute these SELECT strings extremely fast, so if you are doing this over the network, my guess is that you will be I/O bound.
Robert Harvey
@Robert: Looking at the code example, it can be performed in 1 single UPDATE statement that will do it for all records. This kind of code will be required only when there is some decision making or the data comes from application.
shahkalpesh
@shahkalpesh - Yes, I know. But I assumed JP wanted to perform some processing on each record.
Robert Harvey