ansaurus

Question

What's the right pattern for unique data in columns?

Answer 1

A:

Using EXISTS might clean things up a little.

if exists (select * from table_name where column_name = @param)
begin
  -- use existing file name
end
else
begin
  -- use new file name
end

Miyagi Coder 2009-01-20 22:44:31

Answer 2

+1 A:

First, create a unique index on the Name column. Then, from your client code, check whether the Name exists by selecting the FileID with the Name in the where clause. If it does, use that FileID. If not, insert a new row.
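As a rough sketch of that check-then-insert flow, here is a small Python example. It uses the standard-library sqlite3 module as a stand-in for the real database, which is an assumption made only so the example runs anywhere. The table name `file` and the columns `FileID` and `Name` follow the thread; the index name `ux_file_name` and the function names are hypothetical.

```python
import sqlite3


def open_db():
    """Create an in-memory table with a unique index on Name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file (FileID INTEGER PRIMARY KEY, Name TEXT NOT NULL)"
    )
    # The unique index is what makes the database reject a second row
    # with the same Name. The index name is hypothetical.
    conn.execute("CREATE UNIQUE INDEX ux_file_name ON file (Name)")
    return conn


def get_or_create_file_id(conn, name):
    """Return the FileID for name, inserting a new row if none exists."""
    row = conn.execute(
        "SELECT FileID FROM file WHERE Name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]  # Name already present: reuse its FileID
    cur = conn.execute("INSERT INTO file (Name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid
```

Calling `get_or_create_file_id(conn, "report.txt")` twice should return the same FileID both times. The sketch does not yet handle two clients inserting the same new name at the same moment.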

Otávio Décio 2009-01-20 22:45:29

Answer 3

+1 A:

If you are searching heavily on the Name field, you will probably want it indexed (as unique, and maybe even clustered if this is the primary search field). As you don't use the @FileID from the first select, I would just select count(*) from file where Name = @Name and see if it is greater than zero (this will prevent SQL from retaining any locks on the table from the search phase, as no columns are selected).

You are on the right course with the SERIALIZABLE level, as your action will affect the success or failure of subsequent queries that depend on the Name being present. The version without that level causes duplicates because two selects ran concurrently, both found no record, and both went ahead with the inserts (which creates the duplicate).
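The interleaving described here can be reproduced deterministically. The sketch below is only an illustration: it uses Python's sqlite3 module and a single connection, with two hypothetical "clients" simulated by ordering the steps by hand, and the table deliberately has no unique index on Name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No unique index on Name, so nothing stops a duplicate row.
conn.execute("CREATE TABLE file (FileID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")


def name_exists(name):
    """The 'search phase': report whether a row with this Name exists."""
    count = conn.execute(
        "SELECT COUNT(*) FROM file WHERE Name = ?", (name,)
    ).fetchone()[0]
    return count > 0


def insert_name(name):
    conn.execute("INSERT INTO file (Name) VALUES (?)", (name,))


# Both simulated clients run their check before either one inserts.
client_a_found = name_exists("report.txt")
client_b_found = name_exists("report.txt")

# Each client acts on its own, now stale, answer.
if not client_a_found:
    insert_name("report.txt")
if not client_b_found:
    insert_name("report.txt")

duplicates = conn.execute(
    "SELECT COUNT(*) FROM file WHERE Name = ?", ("report.txt",)
).fetchone()[0]
print(duplicates)  # prints 2: both inserts went through
```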

The deadlock with the prior version is most likely due to the lack of an index making the search process take a long time. When you load the server down in a SERIALIZABLE transaction, everything else will have to wait for the operation to complete. The index should make the operation fast, but only testing will indicate if it is fast enough. Note that you can respond to the failed transaction by resubmitting: in real-world situations, the load will hopefully be transient.

EDIT: By making your table indexed, but not using SERIALIZABLE, you end up with three cases:

1. Name is found, ID is captured and used. (Common)
2. Name is not found, inserts as expected. (Common)
3. Name is not found, insert fails because another exact match was posted within milliseconds of the first. (Very rare)

I would expect this last case to be truly exceptional, so using an exception to capture this very rare case would be preferable to engaging SERIALIZABLE, which has serious performance consequences.
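A minimal sketch of handling that rare failure on the client follows. It assumes Python with the standard-library sqlite3 module standing in for the real database, where a unique-index violation surfaces as `sqlite3.IntegrityError`. Other drivers raise their own exception types. The helper names and the index name `ux_file_name` are hypothetical.

```python
import sqlite3


def open_db():
    """In-memory table with a unique index on Name (index name is hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file (FileID INTEGER PRIMARY KEY, Name TEXT NOT NULL)"
    )
    conn.execute("CREATE UNIQUE INDEX ux_file_name ON file (Name)")
    return conn


def find_file_id(conn, name):
    """Return the FileID for name, or None if there is no such row."""
    row = conn.execute(
        "SELECT FileID FROM file WHERE Name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def insert_file(conn, name):
    """Insert name; if another writer got there first, reuse its row."""
    try:
        with conn:  # commits on success, rolls back if the insert raises
            cur = conn.execute("INSERT INTO file (Name) VALUES (?)", (name,))
            return cur.lastrowid
    except sqlite3.IntegrityError:
        # The very rare case: the unique index rejected the insert,
        # so the row now exists and can simply be looked up.
        return find_file_id(conn, name)


def get_file_id(conn, name):
    """Cover all three cases: found, inserted, or lost the race."""
    existing = find_file_id(conn, name)
    if existing is not None:
        return existing
    return insert_file(conn, name)
```

Calling `insert_file` directly for a name that is already present imitates losing the race: the check said "not found", yet the row exists by the time the insert runs, and the existing FileID comes back.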

If you do really have an expectation that it will be common to have posts within milliseconds of one another of the same new name, then use a SERIALIZABLE transaction in conjunction with the index. It will be slower in the general case, but faster when these posts are found.

Godeke 2009-01-20 22:53:03

Adding an index is a good thing, but it doesn't solve the problem; it just mutates it. Now on the client, instead of getting a deadlock exception, I get an exception about violating the index. Since it seems that I need to handle the exception on the client regardless, why use the transaction?

Marc 2009-01-21 01:19:30

I have edited the answer to explain why I think that a very rare exception that is handled is better than operating under SERIALIZABLE. Note that if I am incorrect and the posting of the same *new* name at the same time is common, SERIALIZABLE is preferable.

Godeke 2009-01-21 15:00:34

Thanks, it's true that it'll be a rare case. I imagine this is a common pattern in real-world distributed systems. I'm also new to this web site; is there something I should do to mark your answer as the one I'm going to use?

Marc 2009-01-21 19:33:52

Just click the check mark next to the answer; that lets others know that the answer worked for you and removes it from the "unanswered" queue.

Godeke 2009-01-22 02:53:17
