views:

5312

answers:

6

Hi Folks,

I'm trying to figure out the best way to insert a record into a single table but only if the item doesn't already exist. The KEY in this case is an NVARCHAR(400) field. For this example, lets pretend it's the name of a word in the Oxford English Dictionary / insert your fav dictionary here. Also, i'm guessing i will need to make the Word field a primary key. (the table will also have a unique identifier PK also).

So .. i might get these words that i need to add to the table...

eg.

  • Cat
  • Dog
  • Foo
  • Bar
  • PewPew
  • etc...

So traditionally, i would try the following (pseudo code)

SELECT WordID FROM Words WHERE Word = @Word
IF WordID IS NULL OR WordID <= 0
    INSERT INTO Words VALUES (@Word)

ie. If the word doesn't exist, then insert it.

Now .. the problem i'm worried about is that we're getting LOTS of hits .. so is it possible that the word could be inserted from another process in between the SELECT and the INSERT .. which would then throw a constraint error? (ie. a Race Condition).

I then thought that i might be able to do the following ...

INSERT INTO Words (Word)
SELECT @Word
WHERE NOT EXISTS (SELECT WordID FROM Words WHERE Word = @Word)

basically, insert a word when it doesn't exist.

Bad syntax aside, i'm not sure if this is bad or good because of how it locks down the table (if it does) and is not that performant on a table that it getting massive reads and plenty of writes.

So - what do you Sql gurus think / do?

I was hoping to have a simple insert and 'catch' that for any errors thrown.

+3  A: 

Your solution:

INSERT INTO Words (Word)
    SELECT @Word
WHERE NOT EXISTS (SELECT WordID FROM Words WHERE Word = @Word)

...is about as good as it gets. You could simplify it to this:

INSERT INTO Words (Word)
    VALUES(@Word)
WHERE NOT EXISTS (SELECT * FROM Words WHERE Word = @Word)

...because EXISTS doesn't actually need to return any records, so the query optimiser won't bother looking at which fields you asked for.

As you mention, however, this isn't particularly performant, because it'll lock the whole table during the INSERT. Except that, if you add a unique index (it doesn't need to be the primary key) to Word, then it'll only need to lock the relevant pages.

Your best option is to simulate the expected load and look at the performance with SQL Server Profiler. As with any other field, premature optimisation is a bad thing. Define acceptable performance metrics, and then measure before doing anything else.

If that's still not giving you adequate performance, then there's a bunch of techniques from the data warehousing field that could help.

Roger Lipscombe
Is there a performance difference between EXISTS (SELECT * FROM ...) and EXISTS (SELECT 1 FROM ...)? I always tend to use the latter, to avoid the star - is that beneficial or is it all the same?
Tomalak
I suspect that it's all the same, but you'd have to fire it up in SQL Server Profiler to be sure.
Roger Lipscombe
I too never to a * in my selects, but always specify the return fields/value.
Pure.Krome
@ Roger: also, this is not premature optimisation (which i completely agree with your statement, on that). This is optimising my existing app, which is getting timeouts and deadlocks and it's relating to this sql statement :(
Pure.Krome
Just to continue this discussion ... can u do your sql code as LINQ 2 SQL ?
Pure.Krome
@Pure.Krome: No I can't -- I don't use much Linq to SQL.
Roger Lipscombe
+2  A: 

If you are using MS SQL Server, you can create a unique index on your table's columns that need to be unique (documented here):

CREATE UNIQUE [ CLUSTERED | NONCLUSTERED ] INDEX <index_name>
    ON Words ( word [ ASC | DESC ])

Specify Clustered or NonClustered, depending on your case. Also, if you want it sorted (to enable faster seeking), specify ASC or DESC for the sort order.

See here, if you want to learn more about indexes architecture.

Otherwise, you could use UNIQUE CONSTRAINTS like documented here:

ALTER TABLE Words
ADD CONSTRAINT UniqueWord
UNIQUE (Word);
Bogdan Maxim
A: 

I can't speak to the particulars of MS SQL, but one point of a primary key in SQL is to ensure uniqueness. So by definition in generic SQL terms, a primary key is one or more fields that is unique to a table. While there are different ways to enforce this behavior (replace the old entry with the new one vs. reject the new one) I would be surprised if MS SQL both didn't have a mechanism for enforcing this behavior and that it wasn't to reject the new entry. Just make sure you set the primary key to the Word field and it should work.

Once again though, I disclaim this is all from my knowledge from MySQL programming and my databases class, so apologies if I'm off on the intricacies of MS SQL.

Dan Fego
A: 

while unique constraint is certaily one way to go you can also use this for your insert logic: http://www.sqlteam.com/article/application-locks-or-mutexes-in-sql-server-2005

basicaly you don't put any locks on the table below thus not worrying about the reads while your existance checks will be performed ok.

it's a mutex in sql code.

Mladen Prajdic
A: 
declare @Error int

begin transaction
  INSERT INTO Words (Word) values(@word)
  set @Error = @@ERROR
  if @Error <> 0 --if error is raised
  begin
      goto LogError
  end
commit transaction
goto ProcEnd

LogError:
rollback transaction
Dmitry Khalatov
Dude - i said in my opening post that i didn't want to have to a dumb insert and then check for errors. that's not very smart IMO.
Pure.Krome
+1  A: 

I had similar problem and this is how I solved it

insert into Words
( selectWord , Fixword)
SELECT word,'theFixword'
FROM   OldWordsTable
WHERE 
(
    (word LIKE 'junk%') OR
     (word LIKE 'orSomthing') 

)
and word not in 
    (
     SELECT selectWord FROM words WHERE selectWord = word
    )
Pbearne