views:

304

answers:

6

Hello everyone,

I have a stored procedure that does, among other stuff, some inserts in different table inside a loop. See the example below for clearer understanding:

INSERT INTO T1 VALUES ('something')

SET @MyID = Scope_Identity()

... some stuff go here

INSERT INTO T2 VALUES (@MyID, 'something else')

... The rest of the procedure

These two tables (T1 and T2) have an IDENTITY(1, 1) column in each one of them, let's call them ID1 and ID2; however, after running the procedure in our production database (very busy database) and having more than 6250 records in each table, I have noticed one incident where ID1 does not match ID2! Although normally for each record inserted in T1, there is record inserted in T2 and the identity column in both is incremented consistently.

The "wrong" records were something like that:

ID1     Col1
----    ---------
4709    data-4709
4710    data-4710

ID2     ID1     Col1
----    ----    ---------
4709    4710    data-4710
4710    4709    data-4709

Note the "inverted", ID1 in the second table.

Knowing not that much about SQL Server underneath operations, I have put the following "theory", maybe someone can correct me on this.

What I think is that because the loop is faster than physically writing to the table, and/or maybe some other thing delayed the writing process, the records were buffered. When it comes the time to write them, they were wrote in no particular order.

Is that even possible if no, how to explain the above mentioned scenario?

If yes, then I have another question to rise. What if the first insert (from the code above) got delayed? Doesn't that mean I won't get the correct IDENTITY to insert into the second table? If the answer of this is also yes, what can I do to insure the insertion in the two tables will happen in sequence with the correct IDENTITY?

I appreciate any comment and information that help me understand this.

Thanks in advance.

+2  A: 

There is no way you can rely on IDENTITY to solve this for your second table. If you care about the generated primary key value for that row, you should generate itself.

IDENTITY is a way of saying "I don't want the hassle of generating a key myself, just do it for me, and I'll ask for the generated value if and when I need it".

What could be happening here is that two threads are inserting the rows at the same time, none of them have committed yet, so you get this scenario:

Thread 1                      Thread 2
get id for table 1 = 4709
                              get id for table 1 = 4710
insert row for table 1
                              insert row for table 1
                              get id for table 2 = 4709
get id for table 2 = 4710
                              insert row for table 2
insert row for table 1

You have two ways to solve your problem:

  1. Remove IDENTITY for the primary key in the second table
  2. Use SET IDENTITY_INSERT ON to allow you to provide a key for it, while keeping the IDENTITY setting

In this case, however, I would use method nbr. 1. Method nbr. 2 is usually used when importing data into an empty table. You don't want the risk of the database auto-generating an ID you later on want to use yourself (since it comes from the first table), and so you should disable IDENTITY setting on the primary key of the second table.

Or you could try to avoid relying on the key for that table at all, since you have a foreign key reference, do you really need the key values to be the same?

Lasse V. Karlsen
Very informative... thanks.I totally agree with you. The problem is that I'm working with a system that has a very high number of trasactions (millions a day), and this system is relying a lot on the IDENTITY columns to identify a row and link tables togother. It's something I can't change and have to live with... I just want to make sure that I live in the safest side, or the less risky side for that matter :)
TheBlueSky
+1  A: 

Regading writing:

  • Whenever you do something that changes data, this is written to the database LOGS that moment, and you dont get a transaction confirm until this has happened. That is the D in ACID conditions (database theory).
  • Dirty database pages are written to disk "in the background". If too many are dirty, a checkpoint is triggered and they are all dumped out.

So far to the writing part.

Waht you probably run into is simlpy the fact that while individual statements are atomic, a busy atabase has possibly more than one thread running along it. So, basically, a thread switch happened between the statements. One thread got Id1, another one prioerity, id1, id2, then the first one id2.

Nothing specific here ;) Typical normal database behavior when multiple threads run along. Nothing to do with writing per se.

Basically, between SET @MyID = Scope_Identity() and the next statement, another thread can get priority ;)

TomTom
I remember something like that we were taught in university during DB class ;)
TheBlueSky
Dont expect people these days to know the basics of the technology they use ;) This is really really basic stuff, but you wong find it in "SQL in 21 days" and people dont even read that, just look into how-to websites these days ;)
TomTom
This is an unfortunate, but I can't blame everyone for not knowing the basics. Sometimes when your boss is chasing you to finish that piece of code, you are less likely to ask about the basics and more likely to go directly to the bottom line... although the real answer can be in the basics :)
TheBlueSky
Depends. How can a boss let anyone touch SQL who obviously does not know databases ;) That being said, the amount of "i get by with simple statements" people is astonishing those days ;)
TomTom
Totally agree. Before I started this topic, I had an argument with a colleague about using IDENTITY columns to identify rows and linking table (foreign keys) and I was on the "against" side. Not that we both can change anything at this stage, as we both "inherited" the system. My idea now is, if you can't fix it, don't make things worse, and play as safe as possible :)
TheBlueSky
I acutally like identity columns, but they should be identity columns PER TABLE ;)
TomTom
+2  A: 

Of course your above scenario is possible - and quite likely, too.

If you have two separate, independent tables, both being used for queries and inserts, both with a separate IDENTITY(1,1) field, there's absolutely no guarantee that an insert into one table and then into the second will be executed in the same order!

If you do need to establish a link between the two, insert the first table's ID into the second table as a foreign key. You cannot rely on the ID's generated from IDENTITY's to be the same in both tables!

marc_s
Thanks for the answer... it really made me feel relieved :)However, I'm actually inserting thee ID from the first table as a foreign key in the other, but the problem is that my ID is the IDENTITY column, which is causing the problem :)
TheBlueSky
A: 

do not rely on the actual values of identity columns for business/application logic you can only assume that they will be unique!

KM
A: 

This is a known bug in SQL Server.

The problem is when it generates the query plan the parallelisation causes scope identity to be incorrect.

Move that part into its own procedure, so pass in the params and return the scope identity - Now it should be correct.

If I remember rightly, this only manifests on tables with around a million rows or more.

Aha, here's the KB: http://support.microsoft.com/default.aspx?scid=kb;en-us;2019779&sd=rss&spid=2855

Meff
I read about this bug a while ago; I'm not sure if it's related to my question here though. In my case, the ID is generated correctly in the right expected order for each of the two tables, but the order of executing the second insert statement in different loop iterations wasn't as expected.
TheBlueSky
unrelated to the bug. Thi is a simple threading issue.
TomTom
A: 

You should be able to avoid this issue by using a SQL 2005 feature, the OUTPUT clause. Link below.

http://msdn.microsoft.com/en-us/library/ms177564.aspx

Jason Cumberland
No, I don't think so. I'll face the same problem if the second INSERT statement got executed before the first one for any reason... i.e. I won't get the the value of the IDENTITY value of the record just inserted.
TheBlueSky
I don't think the OUTPUT clause uses the same construct as scope_identity(), it is actually part of the insert statement and outputs the just inserted records into a memory table. As long as you pass in a uniqueidentifier to the memory table as well you shouldn't have any contention problems.
Jason Cumberland
That I understood. My concern is what if the two INSERT statements were in different threads and the first one got suspended/delayed for whatever reason? Then the second INSERT will be executed first and before I actually get the IDENTITY from the OUTPUT clause. This can happen as per my understanding from all the discussion above.
TheBlueSky