Hi all.

I have one stored procedure which inserts data into 3 tables (it does UPSERTs) and has some rudimentary logic (IF-THEN-ELSE).

I need to execute this sproc millions of times (from a C# app) using different parameters, and I need it to be FAST.

What is the best way to do so?

Does anybody know an open-source (or not) off-the-shelf document indexer besides Lucene or SQL Server FTS?

I am trying to build a document word index. For each word in the document, I insert into the DB the word, the docID, and the word position.

This happens 100,000 times for 100 documents, for example.

The sproc: there are 3 tables to insert into, and for each one I do an UPSERT.

The C# app:

using (SqlConnection con = new SqlConnection(_connectionString))
{
    con.Open();
    SqlTransaction trans = con.BeginTransaction();
    SqlCommand command = new SqlCommand("add_word", con, trans);
    command.CommandType = System.Data.CommandType.StoredProcedure;
    string[] TextArray;
    for (int i = 0; i < Document.NumberOfFields; i++)
    {
        ...
        // Updates the command's parameters with new values and calls ExecuteNonQuery.
        Addword(..., command);
    }
    trans.Commit();
}

I forgot to mention: this code produces deadlocks in SQL Server, and I have no idea why.

+1  A: 

This is probably too generic a requirement - for the procedure itself to be fast, we need to see it and have some knowledge of your db-schema.

On the other hand, if you want to know the best way to execute the same (optimized or not) procedure as fast as possible, usually the best way to go is to do some sort of caching on the client and call the procedure as few times as possible, batching your operations.

If this is in a loop, what people usually do - instead of calling the procedure on each iteration - is build/populate some caching data structure that calls down to the stored procedure when the loop exits (or after any given number of iterations, if you need it to happen more often), batching the operations you cached. For example, you can pass an XML string down to your sp, which then parses it, puts the contents into temp tables and goes from there - you can save a whole lot of overhead this way.
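
A minimal sketch of that idea in C#, assuming a hypothetical add_words_batch procedure that takes a single XML parameter (neither the procedure name nor the XML shape comes from the question):

    // Buffer words client-side and flush a whole batch in one call,
    // instead of one sproc call per word.
    const int BatchSize = 1000;
    List<string> buffer = new List<string>();

    void Add(string word, SqlConnection con)
    {
        // SecurityElement.Escape XML-escapes the word text.
        buffer.Add("<word>" + System.Security.SecurityElement.Escape(word) + "</word>");
        if (buffer.Count >= BatchSize)
            Flush(con);
    }

    void Flush(SqlConnection con)
    {
        if (buffer.Count == 0) return;
        using (SqlCommand cmd = new SqlCommand("add_words_batch", con))
        {
            cmd.CommandType = System.Data.CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@XMLDoc",
                "<words>" + string.Join("", buffer.ToArray()) + "</words>");
            cmd.ExecuteNonQuery();   // one round trip per batch
        }
        buffer.Clear();
    }

Remember to call Flush once more after the loop ends so the last partial batch is not lost.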

Another common solution for this is to use SQL Server bulk operations.

To go back to the stored procedure - bear in mind that optimizing your T-SQL and db-schema (with indexes etc.) can have a dramatic impact on your performance.

JohnIdol
+2  A: 

If you want to quickly bulk INSERT data from C#, check out the SqlBulkCopy class (.NET 2.0 onwards).
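
For illustration, a minimal sketch, assuming a staging table named WordStaging with matching columns (both are hypothetical, not from the question):

    DataTable table = new DataTable();
    table.Columns.Add("Word", typeof(string));
    table.Columns.Add("DocId", typeof(int));
    table.Columns.Add("Position", typeof(int));
    // ... add one row per word ...

    using (SqlConnection con = new SqlConnection(_connectionString))
    {
        con.Open();
        using (SqlBulkCopy bulk = new SqlBulkCopy(con))
        {
            bulk.DestinationTableName = "WordStaging";
            bulk.BatchSize = 10000;      // rows sent per round trip
            bulk.WriteToServer(table);
        }
    }

Note that SqlBulkCopy only inserts; for the UPSERT semantics in the question you would bulk-load a staging table and then merge from it into the real tables in one set-based statement.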

Mitch Wheat
Would the downvoter please leave a comment? Thanks.
Mitch Wheat
A: 

--Edited since the question was edited.

The biggest issue is to make sure the stored proc is correctly tuned. Your C# code is about as fast as you are going to get it.

Chris Brandsma
+3  A: 
  1. Drop all the indexes on the table(s) you are loading, then add them back in once the load is complete. This will prevent a lot of thrashing / reindexing for each change.

  2. Make sure the database has allocated enough physical file space prior to the load, so it doesn't have to spend time constantly grabbing it from the file system as you load. Usually databases are set to grow by something like 10% when full, at which point SQL Server blocks queries until more space is allocated. When loading the amount of data you are talking about, SQL Server would have to do a lot of blocking.

  3. Look into bulk load / bulk copy if possible.

  4. Do all of your IF-THEN-ELSE logic in code. Just send the actual values you want stored to the s'proc when they're ready. You might even run two threads: one to evaluate the data and queue it up, the other to write the queue to the DB server.

  5. Look into off-the-shelf programs that do exactly what you are talking about with indexing documents. Most likely they've already solved these problems.

  6. Get rid of the Transaction requirements if possible. Try to keep the s'proc calls as simple as possible.

  7. See if you can limit the words you are storing. For example, if you don't care about the words "it", "as", "I", etc., then filter them out BEFORE calling the s'proc.
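
A sketch of point 7 in C# - filtering stop words client-side before any s'proc call (the stop-word list here is illustrative, not exhaustive):

    static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "it", "as", "i", "the", "a", "an", "and", "of", "to" },
        StringComparer.OrdinalIgnoreCase);

    // Yields only the words worth indexing.
    static IEnumerable<string> FilterStopWords(IEnumerable<string> words)
    {
        foreach (string w in words)
            if (!StopWords.Contains(w))
                yield return w;
    }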

Chris Lively
My IF-THEN-ELSE logic relies on checking whether rows appear in tables, so moving it to the C# code means more sproc executions in the loop...
Roey
Hmm.. Then I guess you can't do that part. The others should help though.
Chris Lively
A: 

If you're trying to optimize for speed, consider simply upgrading your SQL Server hardware. Putting more RAM and a blazingly fast RAID in your server may be the most cost-effective long-term way to speed up your queries. Hardware is relatively cheap compared to developer time.

Heed the words of Jeff Atwood:

Coding Horror: Hardware is Cheap, Programmers are Expensive

SurroundedByFish
Too bad many managers/accountants look at hardware as an additional expense, but your salary is the same regardless of how long it takes to optimize a query.
Jeff O
+1  A: 

This might seem like a rudimentary approach, but it should work and it should be fast. You can simply generate a huge text file with a list of SQL statements and then run it from the command line. If I'm not mistaken, it should be possible to batch commands using the GO statement. Alternatively, you can do it directly from your application, concatenating several SQL commands as strings and executing them in batches. It seems that what you are trying to do is a one-time task and that the data does not come directly from user input, so you should be able to handle escaping yourself.

I'm sure there are more sophisticated ways to do that (SqlBulkCopy looks like a good start), so please consider this as just a suggestion. I would spend some time investigating whether there are more elegant ways first.

Also, I would make sure that the logic in the stored procedure is as simple as possible and that the table does not have any indexes during the load; they should be added later.
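
A sketch of the in-application variant, assuming a hypothetical WordIndex(word, docId, pos) table; the statements are separated by semicolons, so no GO is needed:

    StringBuilder sb = new StringBuilder();
    foreach (Word w in words)            // "words" and "Word" are assumed types
    {
        sb.AppendFormat(
            "INSERT INTO WordIndex(word, docId, pos) VALUES('{0}', {1}, {2});",
            w.Text.Replace("'", "''"),   // minimal escaping, as the answer notes
            w.DocId, w.Position);
    }

    using (SqlCommand cmd = new SqlCommand(sb.ToString(), con))
    {
        cmd.ExecuteNonQuery();           // runs the whole batch in one call
    }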

Jan Zich
+1, I would do it this way if my data was already in C#. Be aware that some indexes may be helpful/important (since this is an UPSERT and not an INSERT).
David B
The GO keyword is meaningful ONLY to SQL Server utilities like Query Analyzer (in the SQL 2000 realm); it is NOT recognized by the .NET SqlClient classes. Therefore you should remove it and - if necessary - execute the statements one at a time. FYI, see my answer to this post: http://stackoverflow.com/questions/784953/using-table-just-after-creating-it-object-does-not-exist/785096#785096
Turro
A: 

The communication with the database will likely be the bottleneck in this case, especially if the DB is on another machine. I suggest sending the entire document to the database and writing a sproc that splits it into words, or using SQL Server-hosted managed code (SQLCLR).

Per Erik Stendahl
+1  A: 

Hey,

Try using XML to do that.

You will only need to execute the procedure once:

CREATE PROCEDURE add_words
(
    @XMLDoc XML
)
AS

DECLARE @handle INT

EXEC sp_xml_preparedocument @handle OUTPUT, @XMLDoc

INSERT INTO TestTable
SELECT * FROM OPENXML (@handle, '/words/word', 2) WITH
  (
    word varchar(100) '.'
  )

EXEC sp_xml_removedocument @handle

Example:

DECLARE @XMLDoc XML
SET @XMLDoc = '<words><word>test</word><word>test2</word></words>'

EXEC add_words @XMLDoc
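
Calling that procedure from C# might look like this (a sketch; it assumes only the procedure and parameter names shown above):

    string xml = "<words><word>test</word><word>test2</word></words>";
    using (SqlCommand cmd = new SqlCommand("add_words", con))
    {
        cmd.CommandType = System.Data.CommandType.StoredProcedure;
        cmd.Parameters.Add("@XMLDoc", System.Data.SqlDbType.Xml).Value = xml;
        cmd.ExecuteNonQuery();   // one round trip for the whole word list
    }
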
Zanoni
A: 

Assuming this is an app where there would not be contention between multiple users, try this approach instead:

  • Insert your parameters into a table set up for that purpose
  • Change your SP to loop through that table and perform its work on each row
  • Call the SP once
  • Have the SP truncate the table of inputs when it is complete

This will eliminate the overhead of calling the SP millions of times, and the inserts of the parameters into the table can be concatenated ("INSERT INTO foo(v) VALUES('bar'); INSERT INTO foo(v) VALUES('bar2'); INSERT INTO foo(v) VALUES('bar3');").

Disadvantage: the SP is going to take a long time to execute, and there won't be any feedback on progress, which isn't terribly user-friendly.

richardtallent
A: 

To move a lot of data over to the server, use either SqlBulkCopy or a table-valued parameter if you are on SQL Server 2008. If you need speed, do not execute a stored procedure once per row; develop a set-based one that processes all rows (or a large batch of them).
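
A sketch of the table-valued parameter route on SQL Server 2008; the type dbo.WordList and procedure add_words_set are assumptions, not from the answer:

    // Assumes, on the server:
    //   CREATE TYPE dbo.WordList AS TABLE (word varchar(100), docId int, pos int)
    //   CREATE PROCEDURE add_words_set @words dbo.WordList READONLY AS ...
    DataTable table = new DataTable();
    table.Columns.Add("word", typeof(string));
    table.Columns.Add("docId", typeof(int));
    table.Columns.Add("pos", typeof(int));
    // ... fill table, one row per word ...

    using (SqlCommand cmd = new SqlCommand("add_words_set", con))
    {
        cmd.CommandType = System.Data.CommandType.StoredProcedure;
        SqlParameter p = cmd.Parameters.AddWithValue("@words", table);
        p.SqlDbType = System.Data.SqlDbType.Structured;
        p.TypeName = "dbo.WordList";
        cmd.ExecuteNonQuery();   // the procedure processes all rows set-based
    }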

AlexKuznetsov