ansaurus

Question

Answer 1

+2 A:

If I understand what's happening, you're querying the database in step 1 to see if the data is already there. I'd use a DB call to a stored procedure that inserts the data if it's not there. So just compute the results and pass them to the SP.
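To make the "insert if it's not there" idea concrete, here is a minimal sketch. It uses Python with SQLite purely as a runnable stand-in for SQL Server, and the single-column `TableParameter` schema and the `get_or_insert` helper are assumptions for illustration, not the poster's actual stored procedure.

```python
import sqlite3

# Assumed, simplified schema: the real TableParameter has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableParameter (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL)")

def get_or_insert(conn, name):
    """Insert the name only if it is absent, then return its Id (hypothetical helper)."""
    conn.execute(
        "INSERT INTO TableParameter (Name) "
        "SELECT ? WHERE NOT EXISTS (SELECT 1 FROM TableParameter WHERE Name = ?)",
        (name, name),
    )
    row = conn.execute(
        "SELECT Id FROM TableParameter WHERE Name = ?", (name,)
    ).fetchone()
    return row[0]

first = get_or_insert(conn, "pressure")   # inserts the row
second = get_or_insert(conn, "pressure")  # finds the existing row
count = conn.execute("SELECT COUNT(*) FROM TableParameter").fetchone()[0]
```

In a real stored procedure the same check-and-insert would live on the server, so the client makes one round trip per row instead of two.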

Can you compute the results first, and then insert in batches?

Does the compute function take data from the database? If so, can you turn the operation into a set-based operation and perform it on the server itself? Or maybe part of it?

Remember that SQL Server is designed for operations on large datasets.

Edit (reflecting comments): Since the code is slow on the data inserts, and you suspect that's because each insert has to search back before it can be done, I'd suggest placing SQL indexes on the columns you search on to improve search speed.

However I have another idea.

Why don't you just insert the data without the check and then later when you read the data remove the duplicates in that query?
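A rough sketch of what that could look like, again with SQLite standing in for SQL Server. The `TableValue` columns used here are assumed for the example and are not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed columns, for illustration only.
conn.execute("CREATE TABLE TableValue (ParamName TEXT, Value REAL)")

# Insert everything without checking for existing rows first.
rows = [("pressure", 1.0), ("pressure", 1.0), ("flow", 2.5)]
conn.executemany("INSERT INTO TableValue VALUES (?, ?)", rows)

# Remove the duplicates at read time instead.
distinct_rows = conn.execute(
    "SELECT DISTINCT ParamName, Value FROM TableValue ORDER BY ParamName"
).fetchall()
```

The cost of deduplication does not disappear this way; it moves from every insert to every read, so it only pays off when reads are much rarer than writes.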

Preet Sangha 2010-10-12 10:49:57

OK, I am going to test it. But in this case, I cannot use SqlBulkCopy to improve the performance of my insertions (step 2)?

Patrice Pezillier 2010-10-12 10:54:43

I need to get the id of my parameter from the database before inserting in batches. The Compute() function doesn't take any data from the database. I can't perform insertions during Compute() because I only have the results at the end of the Compute() function.

Patrice Pezillier 2010-10-12 10:58:32

No, I mean pass the data to the SP and let it work out whether the data exists.

Preet Sangha 2010-10-12 11:50:51

@Preet Sangha: I've just tested with SP. Performance is the same.

Patrice Pezillier 2010-10-12 14:07:35

Just a thought: is it the performance of the insert call (with the sub proc call) that is slow? Or is it the overall compute call as well? If you remove the FillTheResults call, is the performance of the compute calls OK? Can you put some numbers in to show the comparison?

Preet Sangha 2010-10-12 20:35:29

@Preet Sangha: Without the FillTheResults call, performance is very, very good. Performance is bad with FillTheResults because I have to look up each idTableParameter before inserting into TableValue.

Patrice Pezillier 2010-10-13 11:09:06

Ok then have you looked at the execution plan? Do you need indexes on the columns?

Preet Sangha 2010-10-13 11:19:09

@Preet Sangha: "Why don't you just insert the data without the check and then later when you read the data remove the duplicates in that query?" => The trouble is just postponed... With indexes I win a little during the 'select where' but I lose time during insertion. Overall, it is nearly the same performance.

Patrice Pezillier 2010-10-13 12:52:25

Answer 2

A:

Given the fact that name2 - name3 can be null, would it be possible to restructure the parameter table:

TableParameter
  Id         (int, PRIMARY KEY, IDENTITY)
  Name       (string)
  Dimension  (int)

Now you can index it and simplify the query. (WHERE Name = 'TheNameIWant' AND Dimension = 2)

(And speaking of indices, you do have an index on the name columns in the parameter table?)
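As a sketch of the restructured table with an index on the lookup columns, the following uses SQLite as a runnable stand-in for SQL Server. The index name `IX_TableParameter_Name_Dimension` and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableParameter ("
    "  Id INTEGER PRIMARY KEY,"   # plays the role of the IDENTITY column
    "  Name TEXT NOT NULL,"
    "  Dimension INTEGER NOT NULL)"
)
# Hypothetical index name; it covers both columns used in the WHERE clause.
conn.execute(
    "CREATE INDEX IX_TableParameter_Name_Dimension "
    "ON TableParameter (Name, Dimension)"
)
conn.executemany(
    "INSERT INTO TableParameter (Name, Dimension) VALUES (?, ?)",
    [("TheNameIWant", 1), ("TheNameIWant", 2), ("Other", 2)],
)

query = "SELECT Id FROM TableParameter WHERE Name = ? AND Dimension = ?"
param_id = conn.execute(query, ("TheNameIWant", 2)).fetchone()[0]

# Ask the engine how it would run the lookup; the last column is the plan text.
plan = " ".join(
    row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("TheNameIWant", 2))
)
```

On SQL Server the equivalent check is the execution plan, where the lookup should show up as an index seek rather than a table scan.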

Where do you do your commits on the inserts? If you commit after every single statement, group multiple inserts into one transaction.
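Grouping commits might look roughly like this. The sketch uses SQLite in place of SQL Server, and the `insert_batched` helper, the batch size, and the `TableValue` columns are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed columns, for illustration only.
conn.execute("CREATE TABLE TableValue (IdTableParameter INTEGER, Value REAL)")

def insert_batched(conn, rows, batch_size=1000):
    """Insert rows, committing once per batch rather than once per row."""
    commits = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany("INSERT INTO TableValue VALUES (?, ?)", batch)
        commits += 1
    return commits

rows = [(i % 10, float(i)) for i in range(2500)]
commits = insert_batched(conn, rows)
total = conn.execute("SELECT COUNT(*) FROM TableValue").fetchone()[0]
```

From C# the same idea would be one SqlTransaction wrapped around many inserts; the right batch size depends on the workload and is worth measuring.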

If you are the only one inserting values, and speed is really of the essence, load all values from the database into memory and check there.
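A minimal sketch of the in-memory check, assuming a single writer as stated above. SQLite stands in for SQL Server, and the one-column lookup key and the `get_id` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableParameter (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO TableParameter (Name) VALUES (?)", [("pressure",), ("flow",)]
)

# One query loads every known parameter into a dictionary: name -> Id.
cache = {name: id_ for id_, name in conn.execute("SELECT Id, Name FROM TableParameter")}

def get_id(conn, cache, name):
    """Return the Id from memory, inserting into the database only on a miss."""
    if name not in cache:
        cur = conn.execute("INSERT INTO TableParameter (Name) VALUES (?)", (name,))
        cache[name] = cur.lastrowid
    return cache[name]

ids = [get_id(conn, cache, n) for n in ["flow", "temperature", "temperature"]]
count = conn.execute("SELECT COUNT(*) FROM TableParameter").fetchone()[0]
```

This is only safe while nobody else writes to the table, because the cache never learns about rows inserted by other clients.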

just some ideas

hth

Mario

Mario The Spoon 2010-10-12 11:12:19

Answer 3

A:

I must admit that I'm struggling to grasp the business process that you are trying to achieve here.

On initial review, it appears that you are performing a data comparison within your application tier. I would advise against this and suggest that you let the Database Engine do what it is designed to do: manage and implement your data access.

As another poster has mentioned, I concur that you should look to create a Stored Procedure to handle your record insertion logic. The procedure can perform a simple check to see if your records already exist.

You should also consider:

Enforcing the insertion logic/rule by creating a Unique Constraint across the four name columns.
Creating a covering non-clustered index incorporating the four name columns.
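As an illustration of letting a unique constraint enforce the rule, here is a sketch with SQLite standing in for SQL Server. The column names `Name1` to `Name4`, the constraint name, and the `try_insert` helper are assumptions, and the columns are declared NOT NULL here to keep the example simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE TableParameter (
        Id    INTEGER PRIMARY KEY,
        Name1 TEXT NOT NULL,
        Name2 TEXT NOT NULL,
        Name3 TEXT NOT NULL,
        Name4 TEXT NOT NULL,
        -- The unique constraint is backed by an index on the same columns.
        CONSTRAINT UQ_TableParameter_Names UNIQUE (Name1, Name2, Name3, Name4)
    )
""")

def try_insert(conn, names):
    """Return True if the row was inserted, False if the constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO TableParameter (Name1, Name2, Name3, Name4) "
            "VALUES (?, ?, ?, ?)",
            names,
        )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(conn, ("a", "b", "c", "d")),
    try_insert(conn, ("a", "b", "c", "d")),  # exact duplicate
    try_insert(conn, ("a", "b", "c", "e")),
]
```

If some name columns are nullable, check how the engine compares NULLs: SQLite treats NULLs as distinct in a unique constraint, whereas SQL Server treats them as equal, so behaviour with NULLs will differ from this sketch.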

With regard to performance of your inserts, perhaps you can provide some metrics to qualify what it is that you are seeing and how you are measuring it?

To give you a yardstick the current ETL insertion record for SQL Server is approx 16 million rows per second. What sort of numbers are you expecting and wanting to see?

John Sansom 2010-10-12 13:30:30

@John Sansom: Thanks for your advice. Can you take a look at my SP (edited post)? I will add a Unique Constraint across the four name columns and the index. 10 000 rows take more than 60 seconds: 166 rows/s... I'd like to add 100 000 rows in less than 1 minute: 1666 rows/s would be good :-)

Patrice Pezillier 2010-10-12 14:19:16

Looking at your procedures, it would seem that you are processing record by record. As another poster mentioned, you will see significantly improved performance if you design a set-based solution. For example, create a temporary table containing all records to be processed and index it on the join predicate, then LEFT OUTER JOIN it to the TableParameter table; records that do not return an existing ID can then be inserted.
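The set-based approach described here can be sketched as follows, using SQLite as a runnable stand-in for SQL Server. The `Staging` table, its index name, and the single `Name` column are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableParameter (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE)")
conn.executemany(
    "INSERT INTO TableParameter (Name) VALUES (?)", [("pressure",), ("flow",)]
)

incoming = ["flow", "temperature", "density", "temperature"]

# 1. Load every record to be processed into a temporary table.
conn.execute("CREATE TEMP TABLE Staging (Name TEXT NOT NULL)")
conn.executemany("INSERT INTO Staging (Name) VALUES (?)", [(n,) for n in incoming])

# 2. Index the temporary table on the join predicate.
conn.execute("CREATE INDEX IX_Staging_Name ON Staging (Name)")

# 3. Insert, in one statement, only the records with no existing Id.
conn.execute("""
    INSERT INTO TableParameter (Name)
    SELECT DISTINCT s.Name
    FROM Staging AS s
    LEFT OUTER JOIN TableParameter AS p ON p.Name = s.Name
    WHERE p.Id IS NULL
""")

names = [r[0] for r in conn.execute("SELECT Name FROM TableParameter ORDER BY Name")]
```

The whole batch is resolved by one join instead of one lookup per row, which is where the speed-up over record-by-record processing comes from.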

John Sansom 2010-10-12 16:46:58

@John Sansom: Thanks! I've coded some procedures with a temporary table and a LEFT OUTER JOIN as you suggested: run from SQL Server, performance is very, very good! Now I will try to call these procedures from my C# program. It is annoying that I can't pass a list of objects to a stored procedure...

Patrice Pezillier 2010-10-27 13:40:59

@Patrice Pezillier: Good stuff.

John Sansom 2010-10-28 08:07:18

Answer 4

A:

The fastest way (that I know of so far) is a bulk insert, but not just lines of INSERT. Try INSERT + SELECT + UNION. It works pretty fast.

insert into myTable
select a1, b1, c1, ...
union select a2, b2, c2, ...
union select a3, b3, c3, ...

888 2010-10-12 13:44:12


C# code and SQL Server performance
