I have created a Data Access Layer using .NET. Everywhere I update a database record, I have used SQL along the lines of:

UPDATE Customer SET FirstName=:FirstName, LastName=:LastName, Address1=:Address1, Address2=:Address2,....etc

This means that every field in the record is updated, even though only one field may have changed. A colleague has taken issue with this, saying that we should only update a field if it has changed, citing bandwidth as the issue: if the record has 160 fields, we pass the data for all 160 even when only one has changed. I guess I could save bandwidth between the web server and the database server if I checked whether a value had changed and generated SQL based purely on the values that were actually changed.

Between the web server and the client, I now need to pass both old and new values, so potentially I increase bandwidth there (but then ASP.NET does this already anyway, and I'm not sure we can switch that bit off, so that's probably not an issue).

So what is best practice? Should I be worried about updating all fields in a database record? How would I go about updating only the fields that have changed?

Edit added 29 October: Does anyone know what NHibernate does? Perhaps this is an argument to invest time learning how to do it that way.

+6  A: 

Premature optimization is the root of all evil.

Is there justification for this? Do you actually have a bandwidth problem?

In my experience, tracking changes can be a lot of work (i.e. a lot of code to maintain) depending on how it is implemented. There are some elegant ways, but I think what you are doing is fine unless there is some evidence to support a change.

If you were planning to implement some custom form of change tracking I would certainly not 'chat' with the database about it. Make a copy of your object locally when you first pull it from the DB and compare against that.
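
For illustration, a minimal sketch of that snapshot-and-compare idea; the Customer class and its properties here are hypothetical stand-ins for whatever your DAL returns:

using System;
using System.Collections.Generic;

// Hypothetical DTO; in practice this is whatever your DAL hands back.
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Address1 { get; set; }
}

public static class ChangeTracker
{
    // Compare the current object against the snapshot taken when the
    // record was loaded and return only the properties that differ.
    public static Dictionary<string, object> Diff(Customer snapshot, Customer current)
    {
        var changed = new Dictionary<string, object>();
        foreach (var prop in typeof(Customer).GetProperties())
        {
            object oldValue = prop.GetValue(snapshot, null);
            object newValue = prop.GetValue(current, null);
            if (!object.Equals(oldValue, newValue))
                changed[prop.Name] = newValue;
        }
        return changed;
    }
}

The dictionary this returns is exactly what you would feed into a generated UPDATE statement, so no round trip to the database is needed to work out what changed.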

keithwarren7
I have mostly used TableAdapters, and I haven't investigated whether they might implement change tracking already, but I think I'm moving towards using my own custom DataTransfer objects, so if there is an elegant way to implement it, I would be very interested. Can you point me in that direction?
Colin
+2  A: 

Assuming you have a gigabit network connection between the web server and the database server, you'd have to be updating somewhere in the region of 100 megabytes of data per second for this to even get close to being a problem. At 86,400 seconds in a day, that works out to around 8.6 terabytes per day. My guess is that you aren't updating this much data, so it's probably not a problem.

Greg Beech
+1  A: 

Network bandwidth continues to improve at all levels. Unless you legitimately have a bottleneck somewhere, don't worry about it. Even then, my hunch is that the gain is small enough that you'll spend more time and effort (and risk) trying to optimize your update than you will finding performance improvements in other places.

Tenner
+1  A: 

Well, you either do it the way you have done (which works, and passes every field every time); or you spend time generating bespoke SQL for each field and pass a separate statement for every changed field; or you generate a big bit of SQL on the fly containing just the updated fields (with a risk of SQL injection if you make a mistake). You'd probably get better bandwidth savings by doing your updates inside parameterised stored procedures and passing only the field values across, rather than the names as well.

Unless bandwidth is really a problem, I don't think you have a problem. And the SP route is a better option anyway.
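
To illustrate the SP route, a minimal ADO.NET sketch, assuming SQL Server and a hypothetical UpdateCustomer procedure; only the procedure name and the parameter data cross the wire, not a full UPDATE statement listing every column:

using System.Data;
using System.Data.SqlClient;

public static class CustomerDal
{
    public static void UpdateCustomer(string connectionString, int customerId,
                                      string firstName, string lastName)
    {
        using (var conn = new SqlConnection(connectionString))
        // "UpdateCustomer" is a hypothetical stored procedure; the UPDATE
        // text lives in the procedure on the server, so the client sends
        // only the procedure name plus the parameter data.
        using (var cmd = new SqlCommand("UpdateCustomer", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@CustomerId", customerId);
            cmd.Parameters.AddWithValue("@FirstName", firstName);
            cmd.Parameters.AddWithValue("@LastName", lastName);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}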

CodeByMoonlight
What would I put inside my SP? Bespoke SQL for every field? Generated SQL? Wouldn't I be better off doing that in my code? I could still use parameterised queries to avoid SQL injection.
Colin
In the code, you have to do the full update statement (Field1 = Value1, Field2 = Value2...) and then pass it. Using an SP, you just pass the SP name and the parameter info.
CodeByMoonlight
Aaah! So you are saying that if I have 100 fields and each field name is 10 characters long I could save a bit more than 2000 characters of bandwidth. An advantage of SP's I hadn't really considered before.
Colin
+4  A: 

As well as bandwidth, you may also want to consider multi-user contention. Depending on how your app is designed, the update-all-fields approach may lead to problems.

Suppose you have two users who simultaneously load record number 1. User A changes the LastName from "Smith" to "Smythe" and saves. User B then changes Address1 from "The Street" to "Any Street" and saves.

Under your approach, this second save will also revert the LastName back to "Smith" again (because that was what it was when User B loaded it).

Following this through to its logical conclusion, you will also need to do other checking upon saving (perhaps some sort of timestamp) to ensure that the values in the database are what you think they are.
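
For example, a minimal sketch of timestamp-style optimistic concurrency, assuming SQL Server and a hypothetical rowversion column named RowVersion on the Customer table:

using System.Data;
using System.Data.SqlClient;

public static class CustomerDal
{
    public static void SaveLastName(SqlConnection conn, int customerId,
                                    string lastName, byte[] rowVersion)
    {
        // The row is only updated if it still carries the RowVersion value
        // we read at load time; zero rows affected means someone saved first.
        const string sql =
            "UPDATE Customer SET LastName = @LastName " +
            "WHERE CustomerId = @CustomerId AND RowVersion = @RowVersion";

        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@LastName", lastName);
            cmd.Parameters.AddWithValue("@CustomerId", customerId);
            cmd.Parameters.Add("@RowVersion", SqlDbType.Timestamp).Value = rowVersion;

            if (cmd.ExecuteNonQuery() == 0)
                throw new DBConcurrencyException(
                    "Record was changed by another user since it was loaded.");
        }
    }
}

Because SQL Server stamps a rowversion column automatically on every write, the client only ever reads the value and passes it back, which keeps the extra bandwidth to eight bytes per row.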

kevinw
I think this is an either/or situation. You need a timestamp or version number to implement optimistic concurrency when updating all fields. Using generated SQL would considerably reduce the risk of two users updating the same field, but not eliminate it. To eliminate it you need the old value: update table set field1=newvalue where field1=oldvalue. And then we run back into the bandwidth argument, because we are passing old and new values to the database, not just new values...
Colin
What is more important: data integrity or bandwidth? To me, the answer to that question is not controversial, although there are exceptions.
Michael Maddox
So you would argue that we should always implement concurrency checking. In which case we are less likely to save much bandwidth using generated SQL, because we would need to pass old values to the database as well as new values. I think the bandwidth argument is losing ground...
Colin
+1  A: 

Isn't one of the fields the PK of the table? Are you really updating the PK? What about foreign keys? You're updating those too?

The issue isn't bandwidth. At least isolate the non-relational elements of the record and only update that set (excluding the PK and foreign keys).

OK. So what is the issue if a foreign key or a primary key is updated to the same value? I don't think I ever update primary keys, but a foreign key could change. What would you recommend in that case?
Colin
A PK change will not work with UPDATE, as you will get an error. One would expect that changing foreign key relations is fairly distinct, in terms of application/domain semantics, from merely updating descriptive values. So the issue is obscuring the intent and potentially introducing subtle bugs.
alphazero
I've just tested updating the primary key. I don't get an error in Oracle, and I'm sure I've done it in SQL Server too. (Not that I particularly want to do it!) I guess the database would have to rejig an index, which may not be desirable, but no worse than a delete followed by an insert? I'm afraid I don't understand what you mean about the foreign keys. I think most of the foreign keys we are talking about in main tables are effectively just look-up tables. So if a customer moves to another country and I store countries as a lookup, why not just update the foreign key along with Address1?
Colin
@alphazero: What makes you think you can't update PRIMARY KEYs with an UPDATE statement? There are plenty of applications that use meaningful PRIMARY KEYs for at least some tables; how could those be maintained? And why would SQL specify ON UPDATE CASCADE behavior if PKs could never be updated?
Larry Lustig
+1  A: 

I've found myself in the same situation. I'd written my own lightweight ORM/DAL to interface to a non-optimally designed database, i.e. tables with 200+ fields.

I found that there was indeed an overhead associated with attempting to update all fields for a row, rather than just those that had changed.

In my case I was able to resolve the problem by using intelligent 'setters' for the ORM properties which would record if an attempt was made to modify a particular field. This meant that when I came to commit the object back to the database, I could emit a SQL UPDATE statement which only referred to those fields marked as modified.

N.B. For SELECT statements, best practice dictates that you should explicitly reference the fields you want returned rather than using 'SELECT * FROM'. The usual justification is performance/bandwidth, so the same reasoning would apply to UPDATEs.
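
For illustration, a minimal sketch of the intelligent-setter idea; the names are illustrative, and a real ORM/DAL would do considerably more (clearing the dirty set after a commit, handling nulls, and so on):

using System.Collections.Generic;
using System.Linq;

public class Customer
{
    // Illustrative subset of fields.
    private readonly HashSet<string> _dirty = new HashSet<string>();
    private string _firstName;
    private string _lastName;

    public string FirstName
    {
        get { return _firstName; }
        set { if (value != _firstName) { _firstName = value; _dirty.Add("FirstName"); } }
    }

    public string LastName
    {
        get { return _lastName; }
        set { if (value != _lastName) { _lastName = value; _dirty.Add("LastName"); } }
    }

    // Emit an UPDATE that mentions only the columns whose setters actually
    // changed a value; the values themselves still go through parameter binding.
    public string BuildUpdateSql()
    {
        var assignments = _dirty.Select(col => col + "=:" + col);
        return "UPDATE Customer SET " + string.Join(", ", assignments.ToArray()) +
               " WHERE CustomerId=:CustomerId";
    }
}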

madman1969
Our average table contains about 8 columns, our biggest table is 33 columns. Is there a level where you wouldn't worry about it or do you always implement intelligent setters now?
Colin
I always implement intelligent setters, as they are now part of my homebrewed ORM/DAL, so I effectively get them for free. In your particular case, given your average column count, I wouldn't suggest you implement intelligent setters, as you probably will not see a measurable difference. Unless of course the majority of your updates are likely to be against the 33-column table.
madman1969