views: 409

answers: 4

Hi

I am wondering if there is a way to do batch updating. I am using MS SQL Server 2005.

I saw a way with the SqlDataAdapter, but it seems like you first have to run a select statement with it, then fill a dataset, and then make your changes to the dataset.

Right now I am using LINQ to SQL to do the select, and I want to keep it that way. However, it is too slow for massive updates. So is there a way I can keep LINQ to SQL for the select part but use something different to do the mass update?

Thanks

Edit

I am interested in this staging table approach, but I am not sure how to do it, and it is still not clear to me why it would be faster, since I don't understand how the update part works.

So can anyone show me how this would work and how to deal with concurrent connections?

Edit 2

This was my latest attempt at doing a mass update using XML, but it uses too many resources and my shared hosting does not allow it to go through. So I need a different way, which is why I am now looking into a staging table.

using (TestDataContext db = new TestDataContext())
{
    UserTable[] testRecords = new UserTable[2];
    for (int count = 0; count < 2; count++)
    {
        UserTable testRecord = new UserTable();

        if (count == 1)
        {
            testRecord.CreateDate = new DateTime(2050, 5, 10);
            testRecord.AnotherField = true;
        }
        else
        {
            testRecord.CreateDate = new DateTime(2015, 5, 10);
            testRecord.AnotherField = false;
        }

        testRecords[count] = testRecord;
    }

    // Serialize all the records into a single XML document so they can
    // be passed to the stored procedure in one parameter.
    StringBuilder sBuilder = new StringBuilder();
    System.IO.StringWriter sWriter = new System.IO.StringWriter(sBuilder);
    XmlSerializer serializer = new XmlSerializer(typeof(UserTable[]));
    serializer.Serialize(sWriter, testRecords);

    using (SqlConnection con = new SqlConnection(connectionString))
    {
        string sprocName = "spTEST_UpdateTEST_TEST";

        using (SqlCommand cmd = new SqlCommand(sprocName, con))
        {
            cmd.CommandType = CommandType.StoredProcedure;

            // Remove(0, 41) strips the XML declaration from the front of
            // the serialized payload before sending it to the proc.
            SqlParameter param1 = new SqlParameter("@UpdatedProdData", SqlDbType.VarChar, int.MaxValue);
            param1.Value = sBuilder.Remove(0, 41).ToString();
            cmd.Parameters.Add(param1);

            con.Open();
            int result = cmd.ExecuteNonQuery();
            con.Close();
        }
    }
}

@Fredrik Johansson: I am not sure what you're saying will work. It seems to me you want me to make an update statement for each record. I can't do that, since I will need to update 1 to 50,000+ records and I won't know how many until that point.

Edit 3

So this is my SP now. I think it should be able to handle concurrent connections, but I wanted to make sure.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[sp_MassUpdate]
    @BatchNumber uniqueidentifier
AS
BEGIN
    -- Update through the alias so the join is actually applied;
    -- only rows staged under this batch number are touched.
    UPDATE prod
    SET ProductQty = 50 -- placeholder value; in practice this would come from the staged columns
    FROM Product prod
    JOIN StagingTbl stage ON prod.ProductId = stage.ProductId
    WHERE stage.BatchNumber = @BatchNumber

    -- Remove this batch's rows so the staging table stays small.
    DELETE FROM StagingTbl
    WHERE BatchNumber = @BatchNumber
END
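
For reference, this is roughly how I plan to fill the staging table and kick off the update from C#. This is an untested sketch, and the StagingTbl layout (a ProductId column plus the BatchNumber column) is assumed:

Guid batchNumber = Guid.NewGuid();

// Build a DataTable whose columns are in the same order as StagingTbl,
// since SqlBulkCopy maps by ordinal unless mappings are given.
DataTable staged = new DataTable();
staged.Columns.Add("ProductId", typeof(int));
staged.Columns.Add("BatchNumber", typeof(Guid));

foreach (Product prod in productsToUpdate) // the LINQ to SQL query results
{
    staged.Rows.Add(prod.ProductId, batchNumber);
}

using (SqlConnection con = new SqlConnection(connectionString))
{
    con.Open();

    // Bulk copy the staged rows, then run the update proc for just this batch.
    using (SqlBulkCopy bulk = new SqlBulkCopy(con))
    {
        bulk.DestinationTableName = "StagingTbl";
        bulk.WriteToServer(staged);
    }

    using (SqlCommand cmd = new SqlCommand("sp_MassUpdate", con))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@BatchNumber", SqlDbType.UniqueIdentifier).Value = batchNumber;
        cmd.ExecuteNonQuery();
    }
}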
A: 

You have to work with the expression trees directly, but it's doable. In fact, it's already been done for you; you just have to download the source:

Batch Updates and Deletes with LINQ to SQL

The alternative is to just use stored procedures or ad-hoc SQL queries using the ExecuteMethodCall and ExecuteCommand methods of the DataContext.
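
For instance, if the change can be expressed as a single set-based statement, ExecuteCommand can run it directly; the {0}-style placeholders are turned into SQL parameters. A rough sketch, using the Product table from the question:

using (TestDataContext db = new TestDataContext())
{
    // One round trip, no per-row change tracking.
    int rowsAffected = db.ExecuteCommand(
        "UPDATE Product SET ProductQty = {0} WHERE ProductQty < {1}",
        50, 10);
}

This only helps when every affected row gets the same treatment, of course.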

Aaronaught
Can you explain this alternative? I don't care if the update part is in an SP or ADO.NET; all I care about is that, up to that point, I'd rather have the select and manipulation in LINQ to SQL so I can deal with objects. After I'm done manipulating them, I don't care how they get updated in the database.
chobo2
@chobo2: I'm not sure what there is to explain - if you can write your update logic entirely on the server, as a stored procedure, then just drag that SP onto the LINQ to SQL designer surface and run it. It won't help you record-by-record, though, if you're trying to make different updates to a thousand records; the only way to speed up that process is to use a Table-Valued Parameter, or bulk insert into a staging table, neither of which LINQ to SQL supports.
Aaronaught
how would this staging table work?
chobo2
@chobo2: Create a permanent table that has a similar structure to the one you want to update, `BULK INSERT` the updated records using `SqlBulkCopy` or the `bcp` tool, then run a stored procedure that performs a `MERGE` from the staging table to production (or, pre-SQL-2008, just an `UPDATE FROM...INNER JOIN`).
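
On 2008, that MERGE would look roughly like this (sketch; the staged column names are assumed):

MERGE Product AS prod
USING StagingTbl AS stage
    ON prod.ProductId = stage.ProductId
WHEN MATCHED THEN
    UPDATE SET prod.ProductQty = stage.ProductQty;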
Aaronaught
Do you have an example/tutorial of this update-from-inner-join? So this staging table is an exact copy of the table then? After you've done an insert, do you clear the staging table? If so, what happens if someone else adds records to the staging table while you're doing the update - won't they be lost? I am using 2005, btw.
chobo2
@chobo2: The staging table doesn't have to be an exact copy, just enough to hold whatever information you plan to update (including the ID column). There are different ways of handling concurrent staged updates; the easiest is to use a "session ID" column for each bulk copy and filter on that ID. And yes, you delete those rows from the staging table afterward - that's generally the last part of the update proc.
Aaronaught
So do you have any examples/tutorials that I can reference that show these situations, like concurrent staged updates and this session ID column?
chobo2
@chobo2: Sorry, I never saw your last comment... just to clarify, you have no real interest in the batch update/delete extensions, you just want to see how this would be done with a staging table?
Aaronaught
Yeah, I really have no interest in these extensions. I can use LINQ all the way up to the point of mass inserting, and then I think the SP for the update would not be that hard (probably a lot easier than making these extension methods). I came up with an SP, but I am not sure if it can handle concurrent connections.
chobo2
+1  A: 

Use SqlBulkCopy, which is lightning-fast. You'll need a custom IDataReader implementation which enumerates over your LINQ query results. Look at http://code.msdn.microsoft.com/LinqEntityDataReader for more info and some potentially suitable IDataReader code.
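
Stripped to its essentials, such a reader looks something like the untested sketch below. A Product entity with int ProductId/ProductQty columns is assumed; only the members SqlBulkCopy is generally reported to call do real work, and the rest throw:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

class ProductReader : IDataReader
{
    private static readonly string[] columns = { "ProductId", "ProductQty" };
    private readonly IEnumerator<Product> rows;

    public ProductReader(IEnumerable<Product> source) { rows = source.GetEnumerator(); }

    // The members SqlBulkCopy actually uses while streaming rows:
    public bool Read() { return rows.MoveNext(); }
    public int FieldCount { get { return columns.Length; } }
    public string GetName(int i) { return columns[i]; }
    public int GetOrdinal(string name) { return Array.IndexOf(columns, name); }
    public object GetValue(int i)
    {
        return i == 0 ? (object)rows.Current.ProductId : (object)rows.Current.ProductQty;
    }
    public bool IsDBNull(int i) { return GetValue(i) == null; }

    public void Dispose() { rows.Dispose(); }
    public void Close() { }
    public bool IsClosed { get { return false; } }
    public bool NextResult() { return false; }
    public int Depth { get { return 0; } }
    public int RecordsAffected { get { return -1; } }

    // Everything below is required by the interface but not needed here.
    public DataTable GetSchemaTable() { throw new NotSupportedException(); }
    public object this[int i] { get { return GetValue(i); } }
    public object this[string name] { get { return GetValue(GetOrdinal(name)); } }
    public bool GetBoolean(int i) { throw new NotSupportedException(); }
    public byte GetByte(int i) { throw new NotSupportedException(); }
    public long GetBytes(int i, long fo, byte[] buf, int off, int len) { throw new NotSupportedException(); }
    public char GetChar(int i) { throw new NotSupportedException(); }
    public long GetChars(int i, long fo, char[] buf, int off, int len) { throw new NotSupportedException(); }
    public IDataReader GetData(int i) { throw new NotSupportedException(); }
    public string GetDataTypeName(int i) { throw new NotSupportedException(); }
    public DateTime GetDateTime(int i) { throw new NotSupportedException(); }
    public decimal GetDecimal(int i) { throw new NotSupportedException(); }
    public double GetDouble(int i) { throw new NotSupportedException(); }
    public Type GetFieldType(int i) { throw new NotSupportedException(); }
    public float GetFloat(int i) { throw new NotSupportedException(); }
    public Guid GetGuid(int i) { throw new NotSupportedException(); }
    public short GetInt16(int i) { throw new NotSupportedException(); }
    public int GetInt32(int i) { throw new NotSupportedException(); }
    public long GetInt64(int i) { throw new NotSupportedException(); }
    public string GetString(int i) { throw new NotSupportedException(); }
    public int GetValues(object[] values) { throw new NotSupportedException(); }
}

Usage is then just (productsToUpdate and connectionString assumed):

using (SqlBulkCopy bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "StagingTbl";
    bulk.WriteToServer(new ProductReader(productsToUpdate));
}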

allonym
+1  A: 

You can use the SqlDataAdapter to do a batch update. It doesn't matter how you fill your dataset - L2SQL or whatever - you can use different methods to do the update. Just define the query to run using the data in your datatable.

The key here is the UpdateBatchSize. The data adapter will send the updates in batches of whatever size you define. You need to experiment with this value to see what number works best, but typically values of 500-1000 do best. SQL Server can then optimize the update and execute a little faster. Note that when doing batch updates, you cannot update the row source of the datatable.

I use this method to do updates of 10-100K rows and it usually runs in under 2 minutes. It will depend on what you are updating, though.

Sorry, this is in VB…

Using da As New SqlDataAdapter()
      da.UpdateCommand = conn.CreateCommand()
      da.UpdateCommand.CommandTimeout = 300

      da.AcceptChangesDuringUpdate = False
      da.ContinueUpdateOnError = False
      da.UpdateBatchSize = 1000 'Experiment for best performance
      da.UpdateCommand.UpdatedRowSource = UpdateRowSource.None 'Needed if UpdateBatchSize > 1

      sql = "UPDATE YourTable"
      sql += " SET YourField = @YourField"
      sql += " WHERE ID = @ID"
      da.UpdateCommand.CommandText = sql
      da.UpdateCommand.Parameters.Clear()
      da.UpdateCommand.Parameters.Add("@YourField", SqlDbType.SmallDateTime).SourceColumn = "YourField"
      da.UpdateCommand.Parameters.Add("@ID", SqlDbType.Int).SourceColumn = "ID" 'Int assumed; match your ID column's type

      da.Update(ds.Tables("YourTable"))
End Using

Another option is to bulk copy to a temp table and then run a query to update the main table from it. This may be faster.

Bremer
+1  A: 

As allonym said, use SqlBulkCopy, which is very fast (I found speed improvements of over 200x - from 1500 secs to 6 s). However, you can also use the DataTable and DataRow classes to provide the data to SqlBulkCopy, which seems easier. Using SqlBulkCopy this way has the added advantage of working on .NET 2.0/3.0 as well (LINQ was only added in 3.5).
Check out http://msdn.microsoft.com/en-us/library/ex21zs8x%28v=VS.100%29.aspx for some sample code.
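
The DataTable route looks roughly like this (untested sketch; the table and column names are made up to match the question):

// Build an in-memory table whose columns match the destination.
DataTable table = new DataTable();
table.Columns.Add("ProductId", typeof(int));
table.Columns.Add("ProductQty", typeof(int));
table.Rows.Add(1, 50);
table.Rows.Add(2, 75);

using (SqlBulkCopy bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "StagingTbl";
    bulk.BatchSize = 1000;      // tune, much like UpdateBatchSize above
    bulk.BulkCopyTimeout = 300; // seconds
    // Map by name so the DataTable's column order doesn't have to match.
    bulk.ColumnMappings.Add("ProductId", "ProductId");
    bulk.ColumnMappings.Add("ProductQty", "ProductQty");
    bulk.WriteToServer(table);
}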

apoorv020