views: 112
answers: 2
+6  A: 

If you have an index on "col", then running your first query will update millions of rows regardless; your second query would potentially only update a few and find those quickly if there's an index available. If you don't have an index on that column, the effect will be marginal since a full table or index scan must occur to check all rows in your table (you'll just have fewer actual updates, but that's it).
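To make that difference concrete, here is a minimal sketch; the table and column names (`dbo.BigTable`, `col`, `status`) are invented for illustration:

```sql
-- Hypothetical table and columns, for illustration only.

-- Blanket update: every one of the millions of rows is touched,
-- whether it needs the change or not:
UPDATE dbo.BigTable
SET    status = 'done';

-- Restricted update: with an index on "col", SQL Server can seek
-- straight to the matching rows and write only those:
UPDATE dbo.BigTable
SET    status = 'done'
WHERE  col = 3;
```

Without an index on `col`, the second form still has to scan the table to find the matching rows, which is the "marginal effect" case described above.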

The whole point of restricting your queries using WHERE clauses is to reduce the scope of your query, i.e. the number of rows SQL Server has to look at. Processing less data is always faster than doing the work on all of those millions of rows.

In response to your update: the main goal of using a WHERE clause is to reduce the number of rows you need to inspect / touch. If you have a means (typically an index) to reduce that number from 100% to a few percent, then it's definitely worth it. That's the whole point of having indices (mostly for SELECTs, but applies to other operations, too, of course).

If you have a suitable index, and thus you can pluck out a few hundred rows to check against a criterion instead of having to inspect millions of rows, you'll always be faster. If a bookstore has a good index that guides you straight to the two shelves holding the books that interest you, you'll find what you're looking for more quickly than if you have to criss-cross the whole store because no index is available.

There obviously is a point where yet another criterion or index doesn't help anymore. When that's the case, an additional WHERE clause typically won't help much - or at all. But the SQL query optimizer will find those cases and filter them out (possibly just ignoring them when deciding on the best query execution plan).

marc_s
+1 - much better to restrict the update to the minimal number of records to update, than just doing a blanket update on far more rows. The excessive updates would result in more writes.
AdaTheDev
One caveat: updating a column that has an index on it is presumably slower than updating a column without one. So if the scope is only reduced by a small amount, the restricted query may not be faster. (In this case, @marc_s and I guessed that only a few rows actually need to change.)
Kathy Van Stone
@marc_s So are you saying it's always best to add as much to the where clause as possible? Using the last example I gave, if "col" was indexed and "createDate" was not and only 5 records had a col value of "3", wouldn't adding createDate slow it down?
adam0101
The journal becomes very big if you update millions of rows too.
Luc M
@adam: no, it wouldn't - SQL query optimizer would be smart enough to filter out those few rows with col <> 3, and then compare only those few rows against createdDate. Whether that second criteria really makes a big difference would be debatable (or you need to measure it!) - but if you can reduce the scope of your search from millions to a few or few dozen rows, that's almost always worth the effort.
marc_s
@marc_s In response to your response to my update :) I know using an index is faster. My question is under what conditions will using a NON-indexed column actually slow performance - in addition to using the index? Or is this even a possibility?
adam0101
@Adam: never say never, but I'd say in the vast majority (>95%) of the cases, SQL query optimizer will be smart enough to first use the index **IF** that really does reduce your search area significantly, and only in a second step compare the resulting smaller set of rows against any further criteria. Sure - any comparison also incurs a bit of a cost, but so do unnecessary updates.
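A sketch of that two-step behaviour (index seek first, residual predicate second); the schema is hypothetical, assuming an index on `col` but none on `createDate`:

```sql
-- Hypothetical schema: index on "col" only.
UPDATE dbo.BigTable
SET    status = 'done'
WHERE  col = 3                      -- index seek narrows this to a few rows
  AND  createDate < '2010-01-01';  -- residual predicate, checked only
                                   -- against those few rows
```

Whether the extra `createDate` check speeds things up or costs a little depends on how many rows the seek returns, which is why measuring is the only reliable answer.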
marc_s
Thanks for all your help!
adam0101
If I recall correctly, SQL Server doesn't actually perform writes on updates that don't change the data. This isn't to say there's no penalty to having extraneous rows included in an update, but that it may not be as big a hit as some fear, especially if the number of extra rows is not significant.
Emtucifor
+2  A: 

This really comes down to index usage and query optimization. I would suggest looking at the query plan before making any decisions.

Adding indexed fields to the WHERE clause will often improve query time; however, adding non-indexed fields can result in table scans, which will slow your query.

My suggestion: write a query that works, look at the execution time, and bring it down to an acceptable level by examining the query plan. Don't over-optimize; go for the acceptable solution.
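For example, SQL Server can show you the estimated plan without executing the statement (the UPDATE below is a hypothetical stand-in for your own query):

```sql
-- Return the estimated XML execution plan instead of running the statement:
SET SHOWPLAN_XML ON;
GO
UPDATE dbo.BigTable         -- hypothetical table/columns
SET    status = 'done'
WHERE  col = 3;
GO
SET SHOWPLAN_XML OFF;
GO
```

In Management Studio, "Include Actual Execution Plan" (Ctrl+M) gives you the actual plan alongside the results, which is usually what you want when tuning.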

Chris Kannon
If no indexed columns are available, is a table scan inevitable? And if it is, does it matter anymore whether additional columns are added to the WHERE clause?
adam0101
If no index is suitable and a table scan is chosen, I don't believe that adding more columns to the WHERE clause can harm the query much. Even without an index, I suspect that while table-scanning the server uses column statistics to decide the order in which it checks columns. It's better to test a highly selective column first, so that further checks are unnecessary when that test fails: for a predicate like title LIKE 'the%' AND author = 'Zarzack', it's better to check the author first.
Emtucifor