views:

259

answers:

5

In other words, is the following "cursoring" approach guaranteed to work:

  1. retrieve rows from DB
  2. save the largest ID from the returned records for later, e.g. in LastMax
  3. later, "SELECT * FROM MyTable WHERE Id > {0}", LastMax

In order for that to work, I have to be sure that every row I didn't get in step 1 has an Id greater than LastMax. Is this guaranteed, or can I run into weird race conditions?

+1  A: 

The only time records might get inserted that you wouldn't get would be if someone turns the identity insert on and manually inserts a record to a skipped id (or in some cases to a negative number). This is a fairly rare occurance and generally would only be done by a system admin. Might be done to reinsert an accidentally deleted record for instance.

HLGEM
+1  A: 

Identities will will always follow the increment that defines the identity:

IDENTITY [(seed ,increment)] http://msdn.microsoft.com/en-us/library/aa933196(SQL.80).aspx

which can be positive or negative (you can have it increment forward or backwards). If you set your identity to increment forward, your identity values will always be larger than the previous, but you may miss some, if you rollback an INSERT.

Yes, if you set your identity increment to a positive value your loop logic will work.

KM
+3  A: 

Guaranteed as in absolutely under no circumstances whatsoever could you possibly get a value that might be less than or equal to the current maximum value? No, there is no such guarantee. That said, the circumstances under which that scenario could happen are limited:

  1. Someone disables identity insert and inserts a value.
  2. Someone reseeds the identity column.
  3. Someone changes the sign of the increment value (i.e. instead of +1 it is changed to -1)

Assuming none of these circumstances, you are safe from race conditions creating a situation where the next value is lower than an existing value. That said, there is no guarantee that the rows will be committed in the order that of their identity values. For example:

  1. Open a transaction, insert into your table with an identity column. Let's say it gets the value 42.
  2. Insert and commit into the same table another value. Let's say it gets value 43.

Until the first transaction is committed, 43 exists but 42 does not. The identity column is simply reserving a value, it is not dictating the order of commits.

Thomas
+1 the second part is exactly my question -- and thus your answer is a "no", because if I do my first retrieval between the times when 43 gets comitted and 42 gets commited, I'll never know about row 42
balpha
@balpha - Correct. The identity column simply reserves values that are auto incremented. It provides no guarantees as to sequence of values or sequence of commits. It is *probably* the case that the values were committed in the order of their identity values but that is very different than saying they are *guaranteed* to be in that order.
Thomas
+1 The transaction possibility is absolutely correct, and I just didn't consider *two* other transactions, the 2nd being committed first. I'm erasing my answer. The good news is I get my disciplined badge finally! Yay!
Patrick Karcher
A: 

The only thing that SQL Server guarantees is that your IDENTITY column will always be incremented.

Things to consider though:

  1. If a fail INSERT occurs, the IDENTITY column will get incremented anyhow;
  2. If a rollback occurs, the IDENTITY column will not return to its previous value;

Which explains why SQL Server doesn't guarantee sequential INDENTITY.

There is a way to reset an IDENTITY column like so using the DBCC command. But before doing so, please consider the following:

  1. Ensure your IDENTITY column is not referenced by any other table, as your foreign keys could be not updated with it, so big troubles ahead;
  2. You might use the SET IDENTITY_INSERT ON/OFF instruction so that you may manually specify the IDENTITY while INSERTing a row (never forget to turn it on afterward).

An IDENTITY column is one of the most important element never to be changed in DBRMs.

Here is a link that should help you: Understanding IDENTITY columns

EDIT: What you seem to do shall work as the IDENTITY column from LastMax will always increment for each INSERTed row. So:

  1. Selecting rows from data table;
  2. Saving LastMax state;
  3. Selecting rows where Id > LastMax.

3) will only select rows where the IDENTITY column will be greater than LastMax, so inserted since LastMax has been saved.

Will Marcouiller
+2  A: 

I think this can go wrong depending on the duration of transactions Consider the following sequence of events:

  1. Transaction A starts
  2. Transaction A performs insert - This creates a new entry in the identity column
  3. Transaction B starts
  4. Transaction B performs insert - This creates a new entry in the identity column
  5. Transaction B commits
  6. Your code performs its select and sees the identity value from the 2nd transaction
  7. Transaction A commits -

The row inserted by Transaction A will never be found by your code. It was not already committed when step 6 was performed. And when the next query is performed it will not be found, because it has a lower value in the identity column than the query is looking for.

It could work if you perform the query with a read-uncommitted isolation mode

Wolfgang
yep, that's exactly my worry
balpha
+1 The transaction possibility is absolutely correct, and I just didn't consider *two* other transactions, the 2nd being committed first. I'm erasing my answer.
Patrick Karcher