I have to insert data into my SQL Server database periodically, but the feeds I read from repeat some data that was already inserted. When I use LINQ to SQL to insert into the database, either some data is duplicated or a primary key violation exception is raised, depending on the primary key.

How can I insert the data without duplicates and without exceptions? I don't want to swallow the exception with a try-catch, because once the exception is raised the rest of the data isn't inserted.

Update: I also found my own solution. I wrote a stored procedure that deletes duplicate entries, which is run right after InsertAllOnSubmit + SubmitChanges.

+2  A: 

All you have to do is create a new instance of your class and then call InsertOnSubmit() on the table:

var foo = new MyFoo { Name = "foo1" };
var dc = new MyDataContext();
dc.Foos.InsertOnSubmit(foo);
dc.SubmitChanges();

The other thing you need to be sure of is how you're incrementing your ID column. In general, I always make sure to use the IDENTITY(1,1) setting on my ID columns. This is declared on your LINQ entity's id column like so:

[Column(AutoSync = AutoSync.OnInsert, IsPrimaryKey = true, IsDbGenerated = true)]
public Int32 Id { get; set; }

To avoid duplicates, what you really need is what we call in my shop an "append" functionality. IMHO, this is most easily accomplished with a stored procedure - we even have a template we use for it:

USE [<Database_Name, sysobject, Database_Name>]
GO

CREATE PROCEDURE [<Schema, sysobject, dbo>].[<Table_Name, sysobject, Table_Name>__append]
(
    @id INT OUTPUT,
    @<Key_Param, sysobject, Key_Param> <Key_Param_Type, sysobject, VARCHAR(50)>
)
AS
BEGIN
    -- Table variable that receives the id generated by the INSERT below.
    DECLARE @inserted_ids TABLE ([id] INT);

    SELECT @id = [id] FROM [<Schema, sysobject, dbo>].[<Table_Name, sysobject, Table_Name>s] (NOLOCK) WHERE [<Key_Param, sysobject, Key_Param>] = @<Key_Param, sysobject, Key_Param>

    IF @id IS NULL
    BEGIN
        INSERT INTO [<Schema, sysobject, dbo>].[<Table_Name, sysobject, Table_Name>s] ([<Key_Param, sysobject, Key_Param>])
        OUTPUT INSERTED.[id] INTO @inserted_ids
        VALUES (@<Key_Param, sysobject, Key_Param>)

        SELECT TOP 1 @id = [id] FROM @inserted_ids;
    END
    ELSE
    BEGIN
        UPDATE [<Schema, sysobject, dbo>].[<Table_Name, sysobject, Table_Name>s]
        SET
            [<Key_Param, sysobject, Key_Param>] = @<Key_Param, sysobject, Key_Param>
        WHERE [id] = @id
    END
END
GO

It is possible to do it in LINQ, though: just query for a list of existing IDs (or whatever column you're keying off of) and filter the new items against it:

var dc = new MyDataContext();
var existingFoos = dc.Foos.ToList();
var newFoos = new List<Foo>();
foreach (var bar in whateverYoureIterating) {
    // logic to add to newFoos
}
// ToList() runs the filter once, so the insert and the AddRange below see the same items.
var foosToInsert = newFoos.Where(newFoo => !existingFoos.Any(existingFoo => newFoo.Id == existingFoo.Id)).ToList();

dc.Foos.InsertAllOnSubmit(foosToInsert);
dc.SubmitChanges();
// Use the next line if you plan on re-using existingFoos. If that's the case, I'd wrap dc.SubmitChanges() in a try-catch as well.
existingFoos.AddRange(foosToInsert);
Daniel Schaffer
waiting for your edit...
Jader Dias
The LINQ solution seems pretty slow. Querying a giant table and loading its contents into memory seems infeasible for me. Maybe I should stick with the SP approach.
Jader Dias
That's what I figured... Though, you could speed the LINQ solution up by only selecting the properties you really need... for example, just the ID column. Also, if you can re-use the "existing" query for multiple runs, that'll help as well.
Daniel Schaffer
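
A rough sketch of the ID-only variant suggested in the comment above, assuming the table is keyed on an integer Id. FeedItem and NewItemFilter are hypothetical names used only for illustration. With LINQ to SQL, the existing keys would come from something like dc.Foos.Select(f => f.Id), which pulls one column instead of whole entities; here an in-memory array stands in for that query so the example runs without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the entity read from the feed.
public class FeedItem
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class NewItemFilter
{
    // Returns the incoming items whose key is neither in existingKeys
    // nor repeated earlier in the same batch.
    public static List<T> OnlyNew<T, TKey>(
        IEnumerable<T> incoming,
        IEnumerable<TKey> existingKeys,
        Func<T, TKey> keySelector)
    {
        // A HashSet gives constant-time lookups instead of a list scan per item.
        var seen = new HashSet<TKey>(existingKeys);
        var result = new List<T>();
        foreach (var item in incoming)
        {
            // Add returns false when the key is already present.
            if (seen.Add(keySelector(item)))
            {
                result.Add(item);
            }
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumption: with LINQ to SQL this would be dc.Foos.Select(f => f.Id).
        var existingIds = new[] { 1, 2, 3 };
        var feed = new[]
        {
            new FeedItem { Id = 2, Name = "already stored" },
            new FeedItem { Id = 4, Name = "new" },
            new FeedItem { Id = 4, Name = "repeated in feed" },
        };

        var toInsert = NewItemFilter.OnlyNew(feed, existingIds, f => f.Id);
        foreach (var item in toInsert)
        {
            Console.WriteLine(item.Id + " " + item.Name); // prints "4 new"
        }
    }
}
```

The resulting list can be passed to InsertAllOnSubmit as in the answer above. Because the set also tracks keys seen within the batch, a feed that repeats an item in the same run does not trigger a key violation either.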
+1  A: 

Unfortunately, there's no way around it, as LINQ to SQL does not check the database before it performs the insert. The only way to do this is to query the database first to determine whether the duplicate record exists, and then add the record only if it does not.
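
One way the "query first" step might look, as a minimal sketch: instead of checking one record at a time, ask the database which of the incoming keys already exist. ExistingKeyLookup and the chunk size are hypothetical; an in-memory AsQueryable() source stands in for a query such as dc.Foos.Select(f => f.Id) so the example runs on its own. The keys are sent in chunks on the assumption that each key becomes a SQL parameter, and SQL Server limits a request to 2100 parameters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExistingKeyLookup
{
    // Returns the keys from candidateKeys that are already stored.
    public static HashSet<int> FindExisting(
        IQueryable<int> storedKeys,
        IList<int> candidateKeys,
        int chunkSize)
    {
        var found = new HashSet<int>();
        for (int start = 0; start < candidateKeys.Count; start += chunkSize)
        {
            int[] chunk = candidateKeys.Skip(start).Take(chunkSize).ToArray();
            // Contains on a local array is the usual way to get "WHERE key IN (...)".
            foreach (int key in storedKeys.Where(k => chunk.Contains(k)))
            {
                found.Add(key);
            }
        }
        return found;
    }
}

public static class LookupDemo
{
    public static void Main()
    {
        // Assumption: in real code this is a LINQ to SQL query over the key column.
        IQueryable<int> storedKeys = new[] { 1, 2, 3 }.AsQueryable();
        var feedKeys = new List<int> { 2, 4, 5 };

        HashSet<int> existing = ExistingKeyLookup.FindExisting(storedKeys, feedKeys, 1000);
        foreach (int key in feedKeys.Where(k => !existing.Contains(k)))
        {
            Console.WriteLine(key); // prints 4, then 5
        }
    }
}
```

Note that a check followed by an insert is not atomic: if another process writes the same key between the two steps, the insert can still fail, so this fits best when a single job owns the table.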

Ideally, LINQ to SQL would support SQL Server's "ignore duplicate keys" behavior (the IGNORE_DUP_KEY option, which is set on a unique index rather than on a column). But unfortunately it does not at the moment.

Keltex