views:

143

answers:

1

Hi.
I'm looking for a very fast way to insert data into an MS SQL database. The data is a (relatively big) construct of objects with relations. For security reasons I want to use stored procedures instead of direct table access.

Let's say I have a structure like this:

  • Document
    • MetaData
      • User
      • Device
    • Content
      • ContentItem[0]
        • SubItem[0]
        • SubItem[1]
        • SubItem[2]
      • ContentItem[1]
        • ...
      • ContentItem[2]
        • ...

Right now I am thinking of building one big query that does something like this (just pseudo-code):

EXEC @DeviceID = CreateDevice ...;
EXEC @UserID = CreateUser ...;
EXEC @DocID = CreateDocument @DeviceID, @UserID, ...;

EXEC @ItemID = CreateItem @DocID, ...
EXEC CreateSubItem @ItemID, ...
EXEC CreateSubItem @ItemID, ...
EXEC CreateSubItem @ItemID, ...
...

But is this the best solution for performance? If not, what would be better? Split it into more queries? Pass all the data to one big stored procedure to reduce the number of round trips? Any other performance tips?

I also thought of passing multiple items to one stored procedure, but I don't think it's possible to pass a non-static number of items to a stored procedure. Since 'INSERT INTO A VALUES (B,C),(C,D),(E,F)' performs better than three single inserts, I hoped to gain some performance there.
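(For what it's worth, a non-static number of rows *can* be passed to a stored procedure on SQL Server 2008 and later via a table-valued parameter. A minimal sketch, with made-up table, type, and column names:)

```sql
-- Hypothetical table type holding the rows to insert (SQL Server 2008+).
CREATE TYPE dbo.SubItemList AS TABLE (
    ItemID  int            NOT NULL,
    Payload nvarchar(200)  NOT NULL
);
GO

-- The procedure receives any number of rows in a single call.
CREATE PROCEDURE dbo.CreateSubItems
    @SubItems dbo.SubItemList READONLY
AS
BEGIN
    INSERT INTO SubItems (ItemID, Payload)
    SELECT ItemID, Payload FROM @SubItems;
END
GO

-- Caller side: fill a table variable and pass it as one parameter.
DECLARE @rows dbo.SubItemList;
INSERT INTO @rows VALUES (1, N'first'), (1, N'second'), (1, N'third');
EXEC dbo.CreateSubItems @SubItems = @rows;
```

From ADO.NET the parameter can be supplied as a DataTable with SqlDbType.Structured, so the client never has to build the VALUES list by hand.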

Thanks for any hints, Marks

+1  A: 

One stored procedure, so far as possible:

INSERT INTO MyTable(field1,field2)
SELECT 'firstValue', 'secondValue'
UNION ALL
SELECT 'anotherFirstValue', 'anotherSecondValue'
UNION ALL
...

If you aren't sure how many items you're inserting, you can construct the SQL query within the sproc and then execute it. Here's a procedure I wrote to take a CSV list of groups and add their relationship to a user entity:

ALTER PROCEDURE [dbo].[UpdateUserADGroups]
    @username varchar(100),
    @groups   varchar(5000)
AS
BEGIN
    DECLARE @pos int,
            @previous_pos int,
            @value varchar(50),
            @sql varchar(8000)

    SET @pos = 1
    SET @previous_pos = 0
    SET @sql = 'INSERT INTO UserADGroups(UserID, RoleName) '

    DECLARE @userID int
    SET @userID = (SELECT TOP 1 UserID FROM Users WHERE Username = @username)

    -- Walk the comma-separated list and append one SELECT per value.
    WHILE @pos > 0
    BEGIN
        SET @pos = CHARINDEX(',', @groups, @previous_pos + 1)
        IF @pos > 0
        BEGIN
            SET @value = SUBSTRING(@groups, @previous_pos + 1, @pos - @previous_pos - 1)
            SET @sql = @sql + 'SELECT ' + cast(@userID as varchar(10)) + ',''' + @value + ''' UNION ALL '
            SET @previous_pos = @pos
        END
    END

    -- The last value has no trailing comma.
    IF @previous_pos < LEN(@groups)
    BEGIN
        SET @value = SUBSTRING(@groups, @previous_pos + 1, LEN(@groups))
        SET @sql = @sql + 'SELECT ' + cast(@userID as varchar(10)) + ',''' + @value + ''''
    END

    print @sql
    exec (@sql)
END

This is far faster than individual INSERTs.

Also, make sure you have just a single clustered index on the primary key; additional indexes will slow the INSERT down because they also have to be updated.

However, the more complex your dataset is, the less likely it is that you'll be able to do the above, so you will simply have to make logical compromises. I actually ended up calling the above routine around 8000 times.
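For more deeply nested data, one alternative (a sketch, assuming SQL Server 2005 or later; the table and element names here are made up) is to pass the whole item list as a single XML parameter and shred it inside the procedure:

```sql
CREATE PROCEDURE dbo.CreateSubItemsFromXml
    @items xml  -- e.g. <items><item id="1"><sub>a</sub><sub>b</sub></item></items>
AS
BEGIN
    -- nodes() turns each XML element into a row; CROSS APPLY walks the children.
    INSERT INTO SubItems (ItemID, Payload)
    SELECT i.n.value('@id', 'int'),
           s.n.value('.', 'nvarchar(200)')
    FROM @items.nodes('/items/item') AS i(n)
    CROSS APPLY i.n.nodes('sub') AS s(n);
END
```

One call then inserts an arbitrary number of items and sub-items, and the plan for the procedure can be cached regardless of the row count.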

David Neale
Are you really sure that building a query is faster than individual inserts (did you benchmark it)? I think SQL Server will not be able to use cached execution plans for on-the-fly generated queries. But I will give it a try. Any idea on the problem with IDs? I think it would be faster to generate all parent elements first and then all child elements. But how can I set the parent's ID in the child elements' foreign keys?
Marks
The only way to tell is to profile; I found that it made my entire routine much faster (can't remember the times). I don't know if it will cache the execution plan, but it doesn't really need to as it'll be so basic. My example will only work with one level of nesting (one record with many basic children); maybe take a look at serialising your items to XML and parsing them from it - http://weblogs.asp.net/jgalloway/archive/2007/02/16/passing-lists-to-sql-server-2005-with-xml-parameters.aspx
David Neale
I did some profiling with a CreateUser SP of mine. Case 1: EXEC CreateUser @Name=..., ... for 1000 rows. Case 2: INSERT INTO Users (...) VALUES (...),(...),... for 1000 rows. Case 1: first run 220 ms, repeated ~220 ms. Case 2: first run 2960 ms, repeated ~20 ms. Even when not building the INSERT INTO on the fly, it is much slower on every first run with a different number of users, because the cached execution plan is fixed for the number of rows submitted. I will look for an XML solution and hope to get better results. But thanks.
Marks
That makes sense; I'll bear it in mind for the future. Be sure to post an answer to this question if you find another solution. I'll be especially interested to hear if you do something with XML.
David Neale
Is there any way to tell how many execution plans can be stored in the cache? Or whether they are deleted after a while? As the numbers of items inserted at once do not differ that much, I am thinking of using the INSERT INTO A (...) VALUES (...),(...) solution and priming the execution plans for the common row counts at installation time. It would need approximately 100 different statements. I think that's something the cache should be able to handle, but I can't find any information on this.
Marks
I'm not sure, to be honest. A cache is not designed for persistent data storage, and I don't think there's any way to manually add query plans to it.
David Neale
But you can fire the queries with dummy data for the different numbers of items, so the execution plans will be cached. Still, I think there is some kind of cache eviction if a plan is not used for a while.
Marks