ansaurus

Question

SQL Server INSERT, Scope_Identity() and physical writing to disc

Answer 1

+2 A:

There is no way you can rely on IDENTITY to solve this for your second table. If you care about the generated primary key value for that row, you should generate itself.

IDENTITY is a way of saying "I don't want the hassle of generating a key myself, just do it for me, and I'll ask for the generated value if and when I need it".

What could be happening here is that two threads are inserting the rows at the same time, none of them have committed yet, so you get this scenario:

Thread 1                      Thread 2
get id for table 1 = 4709
                              get id for table 1 = 4710
insert row for table 1
                              insert row for table 1
                              get id for table 2 = 4709
get id for table 2 = 4710
                              insert row for table 2
insert row for table 1

You have two ways to solve your problem:

Remove IDENTITY for the primary key in the second table
Use SET IDENTITY_INSERT ON to allow you to provide a key for it, while keeping the IDENTITY setting

In this case, however, I would use method nbr. 1. Method nbr. 2 is usually used when importing data into an empty table. You don't want the risk of the database auto-generating an ID you later on want to use yourself (since it comes from the first table), and so you should disable IDENTITY setting on the primary key of the second table.

Or you could try to avoid relying on the key for that table at all, since you have a foreign key reference, do you really need the key values to be the same?

Lasse V. Karlsen 2010-03-15 10:26:31

Very informative... thanks.I totally agree with you. The problem is that I'm working with a system that has a very high number of trasactions (millions a day), and this system is relying a lot on the IDENTITY columns to identify a row and link tables togother. It's something I can't change and have to live with... I just want to make sure that I live in the safest side, or the less risky side for that matter :)

TheBlueSky 2010-03-16 12:53:22

Answer 2

+1 A:

Regading writing:

Whenever you do something that changes data, this is written to the database LOGS that moment, and you dont get a transaction confirm until this has happened. That is the D in ACID conditions (database theory).
Dirty database pages are written to disk "in the background". If too many are dirty, a checkpoint is triggered and they are all dumped out.

So far to the writing part.

Waht you probably run into is simlpy the fact that while individual statements are atomic, a busy atabase has possibly more than one thread running along it. So, basically, a thread switch happened between the statements. One thread got Id1, another one prioerity, id1, id2, then the first one id2.

Nothing specific here ;) Typical normal database behavior when multiple threads run along. Nothing to do with writing per se.

Basically, between SET @MyID = Scope_Identity() and the next statement, another thread can get priority ;)

TomTom 2010-03-15 10:26:36

I remember something like that we were taught in university during DB class ;)

TheBlueSky 2010-03-16 13:06:09

Dont expect people these days to know the basics of the technology they use ;) This is really really basic stuff, but you wong find it in "SQL in 21 days" and people dont even read that, just look into how-to websites these days ;)

TomTom 2010-03-16 13:52:49

This is an unfortunate, but I can't blame everyone for not knowing the basics. Sometimes when your boss is chasing you to finish that piece of code, you are less likely to ask about the basics and more likely to go directly to the bottom line... although the real answer can be in the basics :)

TheBlueSky 2010-03-17 06:13:30

Depends. How can a boss let anyone touch SQL who obviously does not know databases ;) That being said, the amount of "i get by with simple statements" people is astonishing those days ;)

TomTom 2010-03-17 06:44:49

Totally agree. Before I started this topic, I had an argument with a colleague about using IDENTITY columns to identify rows and linking table (foreign keys) and I was on the "against" side. Not that we both can change anything at this stage, as we both "inherited" the system. My idea now is, if you can't fix it, don't make things worse, and play as safe as possible :)

TheBlueSky 2010-03-17 07:44:39

I acutally like identity columns, but they should be identity columns PER TABLE ;)

TomTom 2010-03-17 07:49:45

Answer 3

+2 A:

Of course your above scenario is possible - and quite likely, too.

If you have two separate, independent tables, both being used for queries and inserts, both with a separate IDENTITY(1,1) field, there's absolutely no guarantee that an insert into one table and then into the second will be executed in the same order!

If you do need to establish a link between the two, insert the first table's ID into the second table as a foreign key. You cannot rely on the ID's generated from IDENTITY's to be the same in both tables!

marc_s 2010-03-15 10:28:34

Thanks for the answer... it really made me feel relieved :)However, I'm actually inserting thee ID from the first table as a foreign key in the other, but the problem is that my ID is the IDENTITY column, which is causing the problem :)

TheBlueSky 2010-03-16 12:44:35

Answer 4

A:

do not rely on the actual values of identity columns for business/application logic you can only assume that they will be unique!

KM 2010-03-15 13:23:59

Answer 5

A:

This is a known bug in SQL Server.

The problem is when it generates the query plan the parallelisation causes scope identity to be incorrect.

Move that part into its own procedure, so pass in the params and return the scope identity - Now it should be correct.

If I remember rightly, this only manifests on tables with around a million rows or more.

Aha, here's the KB: http://support.microsoft.com/default.aspx?scid=kb;en-us;2019779&sd=rss&spid=2855

Meff 2010-03-15 13:44:05

I read about this bug a while ago; I'm not sure if it's related to my question here though. In my case, the ID is generated correctly in the right expected order for each of the two tables, but the order of executing the second insert statement in different loop iterations wasn't as expected.

TheBlueSky 2010-03-16 12:59:26

unrelated to the bug. Thi is a simple threading issue.

TomTom 2010-03-16 13:53:48

Answer 6

A:

You should be able to avoid this issue by using a SQL 2005 feature, the OUTPUT clause. Link below.

http://msdn.microsoft.com/en-us/library/ms177564.aspx

Jason Cumberland 2010-03-15 17:19:04

No, I don't think so. I'll face the same problem if the second INSERT statement got executed before the first one for any reason... i.e. I won't get the the value of the IDENTITY value of the record just inserted.

TheBlueSky 2010-03-16 13:11:22

I don't think the OUTPUT clause uses the same construct as scope_identity(), it is actually part of the insert statement and outputs the just inserted records into a memory table. As long as you pass in a uniqueidentifier to the memory table as well you shouldn't have any contention problems.

Jason Cumberland 2010-03-16 17:08:03

That I understood. My concern is what if the two INSERT statements were in different threads and the first one got suspended/delayed for whatever reason? Then the second INSERT will be executed first and before I actually get the IDENTITY from the OUTPUT clause. This can happen as per my understanding from all the discussion above.

TheBlueSky 2010-03-17 06:18:51

ansaurus

tags:

views:

answers:

SQL Server INSERT, Scope_Identity() and physical writing to disc

related questions