views:

92

answers:

3

I've a table [File] that has the following schema

CREATE TABLE [dbo].[File]
(
    [FileID] [int] IDENTITY(1,1) NOT NULL,
    [Name] [varchar](256) NOT NULL,
 CONSTRAINT [PK_File] PRIMARY KEY CLUSTERED 
(
    [FileID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

The idea is that the FileID is used as the key for the table and the Name is the fully qualified path that represents a file.

What I've been trying to do is create a Stored Procedure that will check to see if the Name is already in use if so then use that record else create a new record.

But when I stress test the code with many threads executing the stored procedure at once I get different errors.

This version of the code will create a deadlock and throw a deadlock exception on the client.

CREATE PROCEDURE [dbo].[File_Create]
    @Name varchar(256)
AS
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
    BEGIN TRANSACTION xact_File_Create
    SET XACT_ABORT ON

    SET NOCOUNT ON 
    DECLARE @FileID int
    SELECT @FileID = [FileID] FROM [dbo].[File] WHERE [Name] = @Name
    IF @@ROWCOUNT=0
    BEGIN
     INSERT INTO [dbo].[File]([Name])
     VALUES (@Name)
     SELECT @FileID = [FileID] FROM [dbo].[File] WHERE [Name] = @Name
    END

    SELECT * FROM [dbo].[File]
    WHERE [FileID] = @FileID

    COMMIT TRANSACTION xact_File_Create
GO

This version of the code I end up getting rows with the same data in the Name column.

CREATE PROCEDURE [dbo].[File_Create]
    @Name varchar(256)
AS
    BEGIN TRANSACTION xact_File_Create

    SET NOCOUNT ON 
    DECLARE @FileID int
    SELECT @FileID = [FileID] FROM [dbo].[File] WHERE [Name] = @Name
    IF @@ROWCOUNT=0
    BEGIN
     INSERT INTO [dbo].[File]([Name])
     VALUES (@Name)
     SELECT @FileID = [FileID] FROM [dbo].[File] WHERE [Name] = @Name
    END

    SELECT * FROM [dbo].[File]
    WHERE [FileID] = @FileID

    COMMIT TRANSACTION xact_File_Create
GO

I'm wondering what the right way to do this type of action is? In general this is a pattern I'd like to use where the column data is unique in either a single column or multiple columns and another column is used as the key.

Thanks

A: 

Using the Exists function might clean things up a little.

if (Exists(select * from table_name where column_name = @param)
begin
  //use existing file name
end
else
  //use new file name
Miyagi Coder
+1  A: 

First, create a unique index on the Name column. Then from your client code first check if the Name exists by selecting the FileID and putting the Name in the where clause - if it does, use the FileID. If not, insert a new one.

Otávio Décio
+1  A: 

If you are searching heavily on the Name field, you will probably want it indexed (as unique, and maybe even clustered if this is the primary search field). As you don't use the @FileID from the first select, I would just select count(*) from file where Name = @Name and see if it is greater than zero (this will prevent SQL from retaining any locks on the table from the search phase, as no columns are selected).

You are on the right course with the SERIALIZABLE level, as your action will impact subsequent queries success or failure with the Name being present. The reason the version without that set causes duplicates is that two selects ran concurrently and found there was no record, so both went ahead with the inserts (which creates the duplicate).

The deadlock with the prior version is most likely due to the lack of an index making the search process take a long time. When you load the server down in a SERIALIZABLE transaction, everything else will have to wait for the operation to complete. The index should make the operation fast, but only testing will indicate if it is fast enough. Note that you can respond to the failed transaction by resubmitting: in real world situations hopefully the load will be transient.

EDIT: By making your table indexed, but not using SERIALIZABLE, you end up with three cases:

  • Name is found, ID is captured and used. Common
  • Name is not found, inserts as expected. Common
  • Name is not found, insert fails because another exact match was posted within milliseconds of the first. Very Rare

I would expect this last case to be truly exceptional, so using an exception to capture this very rare case would be preferable to engaging SERIALIZABLE, which has serious performance consequences.

If you do really have an expectation that it will be common to have posts within milliseconds of one another of the same new name, then use a SERIALIZABLE transaction in conjunction with the index. It will be slower in the general case, but faster when these posts are found.

Godeke
Adding an index is a good thing but it doesn't solve the problem it just mutates it. Now on the client instead of getting a deadlock exception i get an exception about violating the index.Since it seems that i need to handle the exception on the client regardless why use the transaction?
Marc
I have edited the answer to explain why I think that a very rare exception that is handled is better than operating under SERIALIZABLE. Note that if I am incorrect and the posting of the same *new* name at the same time is common, SERIALIZABLE is preferable.
Godeke
Thanks, it's true that it'll be a rare case. I imagine in a real world distributed system this is a common pattern.I'm also new to this web site is there something I should do to mark your answer as the one I'm going to use?
Marc
Just click the check mark next to the answer, that let's others know that the answer worked for you, and removes it from the "unanswered" queue.
Godeke