views:

349

answers:

3

My goal is to maximise performance. The basics of the scenario are:

  • I read some data from SQL Server 2005 into a DataTable (1000 records x 10 columns)
  • I do some processing of the data in .NET; every record has at least 1 field changed in the DataTable, but potentially all 10 fields could be changed
  • I also add some new records into the DataTable
  • I do a SqlDataAdapter.Update(myDataTable.GetChanges()) to persist the updates (and inserts) back to the db, using an InsertCommand and UpdateCommand I defined at the start (a minimal sketch of this setup follows the list)
  • Assume the table being updated contains tens of millions of records
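
For reference, a minimal sketch of that setup (the table, column, and connection names here are placeholders, not my real schema):

    using System.Data;
    using System.Data.SqlClient;

    // Minimal sketch of the setup above; "dbo.MyTable", "ID", "Col1", "Col2" and
    // connectionString are placeholder names, not the real schema.
    public static void PersistChanges(DataTable myDataTable, string connectionString)
    {
        SqlDataAdapter adapter = new SqlDataAdapter(
            "SELECT ID, Col1, Col2 FROM dbo.MyTable", connectionString);

        // The UpdateCommand always sets every column, even when only one of them
        // actually changed in a given row - that is the issue described below.
        adapter.UpdateCommand = new SqlCommand(
            "UPDATE dbo.MyTable SET Col1 = @Col1, Col2 = @Col2 WHERE ID = @ID",
            adapter.SelectCommand.Connection);
        adapter.UpdateCommand.Parameters.Add("@Col1", SqlDbType.Int, 4, "Col1");
        adapter.UpdateCommand.Parameters.Add("@Col2", SqlDbType.NVarChar, 50, "Col2");
        adapter.UpdateCommand.Parameters.Add("@ID", SqlDbType.Int, 4, "ID");

        adapter.InsertCommand = new SqlCommand(
            "INSERT INTO dbo.MyTable (Col1, Col2) VALUES (@Col1, @Col2)",
            adapter.SelectCommand.Connection);
        adapter.InsertCommand.Parameters.Add("@Col1", SqlDbType.Int, 4, "Col1");
        adapter.InsertCommand.Parameters.Add("@Col2", SqlDbType.NVarChar, 50, "Col2");

        // Send only the added/modified rows back to the database.
        DataTable changes = myDataTable.GetChanges();
        if (changes != null)
            adapter.Update(changes);
    }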

This is fine. However, if a row has changed in the DataTable then ALL columns for that record are updated in the database even if only 1 out of 9 columns has actually changed value. This means unnecessary work, particularly if indexes are involved. I don't believe SQL Server optimises this scenario?

I think that if I were able to update only the columns that had actually changed for any given record, I should see a noticeable performance improvement (especially as, cumulatively, I will be dealing with millions of rows).

I found this article: http://netcode.ru/dotnet/?lang=&katID=30&skatID=253&artID=6635 but I don't like the idea of doing multiple UPDATEs within the sproc. Short of creating an individual UPDATE statement for each changed DataRow and then firing them off somehow in a batch, I'm looking for other people's experiences/suggestions.

(Please assume I can't use triggers)

Thanks in advance

Edit: Is there any way to get SqlDataAdapter to send an UPDATE statement specific to each changed DataRow (updating only the columns that actually changed in that row), rather than supplying a general UpdateCommand that updates all columns?

A: 

One thing you might try is to create an XML representation of your changed dataset, pass it as a parameter to a sproc, and then do a single UPDATE, using the SQL nodes() function to translate the XML into tabular form.
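
Roughly, as a sketch only (the sproc, table, and column names are made up for illustration, and the XML shape assumes the output of DataTable.WriteXml):

    // Sketch only. Server side (created once) could look roughly like this:
    //
    //   CREATE PROCEDURE dbo.ApplyChanges @changes xml AS
    //   UPDATE t
    //   SET    t.Col1 = x.r.value('(Col1)[1]', 'int'),
    //          t.Col2 = x.r.value('(Col2)[1]', 'nvarchar(50)')
    //   FROM   dbo.MyTable t
    //   JOIN   @changes.nodes('/*/MyTable') AS x(r)
    //          ON t.ID = x.r.value('(ID)[1]', 'int');
    //
    // Client side: serialize just the changed rows to XML and make a single call.
    using System.Data;
    using System.Data.SqlClient;
    using System.IO;

    public static void ApplyChangesViaXml(DataTable myDataTable, string connectionString)
    {
        DataTable changes = myDataTable.GetChanges();
        if (changes == null)
            return;

        string xml;
        using (StringWriter writer = new StringWriter())
        {
            changes.WriteXml(writer);          // the table needs a TableName set
            xml = writer.ToString();
        }

        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("dbo.ApplyChanges", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@changes", SqlDbType.Xml).Value = xml;
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }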

You should never update columns that are part of a clustered index; if you find yourself doing that, it's time to rethink your db schema.

Mladen Prajdic
+1  A: 

Isn't it possible to implement your own IDataAdapter, in which you implement this functionality?

Of course, the DataAdapter only fires the appropriate SqlCommand, which is determined by the RowState of each DataRow. So this means that you would have to generate the SQL command to be executed for each situation ...
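
A rough sketch of the idea (entirely illustrative - the table name "dbo.MyTable", the key column "ID", and the choice to diff each row's Original and Current versions are assumptions):

    using System.Data;
    using System.Data.SqlClient;
    using System.Text;

    // Sketch only: hook the adapter so that each modified row gets its own UPDATE,
    // containing just the columns whose Original and Current values actually differ.
    // "dbo.MyTable" and key column "ID" are placeholders.
    public static void UseRowSpecificUpdates(SqlDataAdapter adapter)
    {
        adapter.RowUpdating += delegate(object sender, SqlRowUpdatingEventArgs e)
        {
            if (e.StatementType != StatementType.Update)
                return;

            StringBuilder sql = new StringBuilder("UPDATE dbo.MyTable SET ");
            SqlCommand cmd = new SqlCommand();
            cmd.Connection = e.Command.Connection;
            cmd.Transaction = e.Command.Transaction;

            bool first = true;
            foreach (DataColumn col in e.Row.Table.Columns)
            {
                object original = e.Row[col, DataRowVersion.Original];
                object current = e.Row[col, DataRowVersion.Current];
                if (object.Equals(original, current))
                    continue;                              // unchanged - leave it out

                if (!first) sql.Append(", ");
                sql.Append("[").Append(col.ColumnName).Append("] = @").Append(col.ColumnName);
                cmd.Parameters.AddWithValue("@" + col.ColumnName, current);
                first = false;
            }

            if (first)
            {
                e.Status = UpdateStatus.SkipCurrentRow;    // row marked Modified but nothing changed
                return;
            }

            sql.Append(" WHERE [ID] = @pk");
            cmd.Parameters.AddWithValue("@pk", e.Row["ID", DataRowVersion.Original]);
            cmd.CommandText = sql.ToString();
            e.Command = cmd;                               // sent instead of the shared UpdateCommand
        };
    }

The same diff logic could equally be run outside the adapter, executing one such SqlCommand per modified row inside a single transaction.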

But I wonder whether it is worth the effort. How much performance will you gain?
I think that, if it is really necessary, I would disable all my indexes and constraints, do the update using the regular SqlDataAdapter, and afterwards re-enable the indexes and constraints.
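
A rough sketch of that (index and table names are placeholders; note that only nonclustered indexes can sensibly be disabled, since disabling the clustered index takes the whole table offline):

    using System.Data;
    using System.Data.SqlClient;

    // Sketch only: disable a nonclustered index, run the bulk update, then rebuild.
    // "IX_MyTable_Col1" and "dbo.MyTable" are placeholder names.
    public static void UpdateWithIndexDisabled(SqlDataAdapter adapter, DataTable myDataTable,
                                               string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();

            using (SqlCommand disable = new SqlCommand(
                "ALTER INDEX IX_MyTable_Col1 ON dbo.MyTable DISABLE", conn))
            {
                disable.ExecuteNonQuery();
            }

            DataTable changes = myDataTable.GetChanges();
            if (changes != null)
                adapter.Update(changes);               // the regular full-column update

            using (SqlCommand rebuild = new SqlCommand(
                "ALTER INDEX IX_MyTable_Col1 ON dbo.MyTable REBUILD", conn))
            {
                rebuild.ExecuteNonQuery();
            }
        }
    }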

Frederik Gheysels
I hadn't thought about having my own IDataAdapter - I like that suggestion. I have the same doubts about whether it will actually make much difference, but it seems that for bulk updating there is no real way to optimise towards only updating changed columns. I'm at the point where I think I will try this out as a trial when I have time. I was really hoping there might have been something already built into ADO.NET. Disabling indexes isn't an option - the db is in constant use and that would cause a lot of pain.
AdaTheDev
A: 

I would VERY much suggest that you do this with a stored procedure. Let's say that you have 10 million records to update, and let's say that each record is 100 bytes (for 10 columns this could be too small, but let's be conservative). That amounts to roughly 1 GB of data that must be transferred from the database (network traffic), held in memory, and then sent back to the database in the form of UPDATE or INSERT statements, which are much more verbose to transfer.

I expect that an SP would perform much better.

Then again, you could divide your work into smaller SPs (called from the main SP) that update just the necessary fields, and gain additional performance that way.

Disabling indexes/constraints is also an option.

EDIT:
Another thing you must consider is the potential number of different UPDATE statements. With 10 fields per row, each field can either stay the same or change, so if you construct your UPDATE statement to reflect only the changed columns you could potentially end up with 2^10 = 1024 differently shaped UPDATE statements, and each of those must be parsed by SQL Server, have an execution plan calculated, and have the parsed statement cached. There is a price for this.

Petar Repac
I can't use SPs - I don't know until runtime what fields I'm actually dealing with, so it's all dynamic. It just seems like the kind of thing SQL Server, or ADO.NET, should be able to optimise for. E.g. "UPDATE YourTable SET MyField = MyField WHERE ID = 1" - assuming no triggers etc., this doesn't actually change anything, but there is still a cost involved in executing it.
AdaTheDev
Does this C# app work against only one DB or multiple DBs? Why can something be done in C# but not in an SP? Must you use SqlDataAdapter (why not just use SqlCommand? You'd gain some flexibility)?
Petar Repac
In the case of an UPDATE with all fields updated, I don't think SQL Server can optimise much. To optimise, the server would have to know the concrete values in a row, and only at that point could it skip the row IO operations; IMO this would not be visible in the execution plan.
Petar Repac
I'm merging in data from an external source, so that has to be done in C#. I need to use SqlDataAdapter to batch my updates into the DB - and like you say, having n different UPDATE statements, each unique to a changed row, has its downsides too. Maybe it's just me who thinks this scenario should be optimised - I don't think it's generally a problem; it's just that when trying to achieve very high performance with bulk dynamic updates, this seems to be an area where there could be room for optimisation, albeit from a low-level perspective within SQL Server.
AdaTheDev