Does the length of the CommandText of an SqlCommand make a difference? I'm not talking about thousands of characters either. Here's what I have:

SqlCommand cmd = new SqlCommand();
cmd.Connection = conn;
cmd.CommandText = sql;

for (int i=0; i<1000; ++i)
{
    string name = i.ToString() + "Bob" + i.ToString();
    string email = i.ToString() + "Jim" + i.ToString();
    string country = "us";
    // etc...

    cmd.Parameters.Clear();
    cmd.Parameters.Add(new SqlParameter("@name", name));
    cmd.Parameters.Add(new SqlParameter("@email", email));
    cmd.Parameters.Add(new SqlParameter("@country", country));

    DateTime cmdStart = DateTime.Now;
    cmd.ExecuteNonQuery();
    DateTime cmdEnd = DateTime.Now;
    TimeSpan len = cmdEnd - cmdStart;
}

If I use the following sql, the first iteration takes .5 seconds. The second takes 1.1 seconds. The third takes 3.3 seconds. And so on, until it eventually throws a timeout exception.

string sql =
    "INSERT INTO Test " +
    "           ([name] " +
    "           ,[email] " +
    "           ,[country] " +
    "           ,[comment] " +
    "           ,[date] " +
    "           ,[key_v0] " +
    "           ,[key_v1] " +
    "           ,[expires_v1] " +
    "           ,[expires_v2] " +
    "           ) " +
    "     VALUES " +
    "           (@name " +
    "           ,@email " +
    "           ,@country " +
    "           ,' ' " +
    "           ,@date " +
    "           ,@key_v0 " +
    "           ,@key_v1 " +
    "           ,@expires_v1 " +
    "           ,@expires_v2 " +
    "           )";

However, if I use the following sql, the entire loop is executed in under a second.

string sql =
    "INSERT INTO Test " +
    "([name] " +
    ",[email] " +
    ",[country] " +
    ",[comment] " +
    ",[date] " +
    ",[key_v0] " +
    ",[key_v1] " +
    ",[expires_v1] " +
    ",[expires_v2] " +
    ") " +
    "VALUES " +
    "(@name " +
    ",@email " +
    ",@country " +
    ",' ' " +
    ",@date " +
    ",@key_v0 " +
    ",@key_v1 " +
    ",@expires_v1 " +
    ",@expires_v2 " +
    ")";

The only difference is the whitespace. Removing the whitespace brought the total character count from 428 to 203. I have been unable to locate anything referencing the length of CommandText except references to 4k and 8k limits. I'm nowhere near that.

I have run both versions with the profiler running, and the duration is under 10 ms for all calls. The delay seems to be happening from when the command is complete inside the SQL engine until ExecuteNonQuery returns.

I know that there are alternative ways to do this. I'm not asking about better ways to do this. I'm asking about the source of the slowdown.

UPDATE: As a test, I added spaces to the end of the query. As soon as I got above 400ish characters total, it slowed down. Interestingly, at 414 characters, the first 99 inserts are fast. At 415 characters, the first 9 inserts are fast. Since I am changing some of the strings based on the iteration number, this kinda makes sense. e.g. The 10th insert is a little longer than the 9th and the 100th insert is a little longer than the 99th.

While I kinda understand that longer inserts should take longer, I can't understand the clear division between fast and slow and the sheer magnitude of the difference. I also don't understand why the time taken increases.

UPDATE 2: (Additional information in response to Peter Oehlert's answer): The entire database is clean. There are no other tables and the test table is dropped and recreated for each run. There are no indexes, triggers or foreign keys. There is an 'id' column that is a primary key.

This is code pulled from a console app written specifically to troubleshoot this problem. It only includes the necessary code to repro this behavior.

(Additional profiler information): When running the SQL profiler, there is a column called TextData that shows what the command and data are. An example is this:

exec sp_executesql N'INSERT INTO Test ([name] ,[email] ,[country] ,[comment] ,[date] ,[key_v0] ,[key_v1] ,[expires_v1] ,[expires_v2] ) VALUES (@name ,@email ,@country ,'' '' ,@date ,@key_v0 ,@key_v1 ,@expires_v1 ,@expires_v2 )                                                                                                                                                                                                                          ',N'@name nvarchar(4),@country nvarchar(2),@email nvarchar(3),@date datetime,@key_v0 nvarchar(4000),@key_v1 nvarchar(4000),@expires_v1 datetime,@expires_v2 datetime',@name=N'9Bob',@country=N'us',@email=N'Jim',@date='2009-08-28 15:35:36.5770000',@key_v0=N'',@key_v1=N'',@expires_v1='2009-08-28 15:35:36.5770000',@expires_v2='2009-08-28 15:35:36.5770000'

That line is 796 characters long and runs fast. Changing the name from '9Bob' to '10Bob' results in a slow insert. Neither 796 nor 797 seems like a significant number. Removing the exec sp_executesql portion gives lengths of 777 and 778. They don't seem significant either.

I'm stumped.

Update: Posted trace here: http://www.jere.us/WierdInserts.trc

+2  A: 

I think part of the performance hit is in the clearing and adding of parameters (along with inefficient string manipulation). What happens if you change up the structure a little bit like this?

SqlCommand cmd = new SqlCommand();
cmd.Connection = conn;
cmd.CommandText = sql;
cmd.Parameters.Add(new SqlParameter("@name", ""));
cmd.Parameters.Add(new SqlParameter("@country", ""));
// etc..


for (int i=0; i<1000; ++i)
{
    // etc...
    cmd.Parameters["@name"].Value =  i.ToString() + "Bob" + i.ToString();
    cmd.Parameters["@country"].Value =  i.ToString() + "Uganda" + i.ToString();
    DateTime cmdStart = DateTime.Now;
    cmd.ExecuteNonQuery();
    DateTime cmdEnd = DateTime.Now;
    TimeSpan len = cmdEnd - cmdStart;
}

Added

I was also reading the code wrong, and I just now realized that this is CommandType.Text, isn't it?

Is setting up a true stored procedure on the server, and then calling it by specifying CommandType.StoredProcedure and passing the stored procedure name instead of a SQL statement, an option for you? I realize I'm not answering your base question, but I really don't think the length of the CommandText matters to SQL Server, so I'm looking at other possible performance barriers.

This is a shot in the dark, and I'm hoping someone with a working knowledge of the internals of how the command object parses text SQL statements with parameters can verify this, but I'm guessing that the Command object has to parse the CommandText when it calls ExecuteNonQuery() (or ExecuteScalar(), etc.).

String manipulation is expensive, and if you're forcing the command object to re-parse the parameters every time, this could be adding some time as well.

Add to this the fact that true stored procedures generally perform better due to compiled execution plans, and you may see some improvements.
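For illustration, a rough sketch of what the stored-procedure variant might look like (the procedure name InsertTestRow and its parameter list are assumptions, not from your original code):

// Hypothetical sketch: call a stored procedure instead of inline SQL.
// "InsertTestRow" is an assumed name; add whichever parameters the proc defines.
SqlCommand spCmd = new SqlCommand("InsertTestRow", conn);
spCmd.CommandType = CommandType.StoredProcedure;
spCmd.Parameters.Add(new SqlParameter("@name", i.ToString() + "Bob" + i.ToString()));
spCmd.Parameters.Add(new SqlParameter("@country", "us"));
// etc...
spCmd.ExecuteNonQuery();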

David Stratton
I did originally try that, but the only thing that changed the performance from "5 records and fail" to "completely done in less than a second" was removing the whitespace. Also, I'm only measuring the time around the ExecuteNonQuery call.
Jere.Jones
The original idea was to use this code only once to move some data from a MySql db to an MSSql db. Hence the lack of a stored proc. Now I'm just trying to find out what the heck is going on.
Jere.Jones
No, the command object **never** parses the commandtext. It lets SQL Server do it. That's kind of the point of parameterized queries: this doesn't have to happen.
Joel Coehoorn
A: 

SQL Server does parse your query before it executes it; text that is simpler to parse, with less unwanted content, means faster parsing. It has to strip every extra white space before parsing, and every CPU cycle counts when you do many operations.

You can verify this by doubling the number of spaces; you should see a bigger slowdown, since parsing is character-by-character processing and is directly proportional to the number of characters. And parsing is the first step: even if it's a parameterized query, SQL Server still needs to parse and understand what the request is.

Akash Kava
True, it will take longer. But we're still talking milliseconds. I don't think query parsing is the problem here.
Joel Coehoorn
That's the only point I can think of; the rest of your query is quite simple. You can try increasing the number of spaces, and if there is no change then we can think of something else.
Akash Kava
Queries will not time out because of parsing, that's for sure
Remus Rusanu
I agree that I don't think it's the parser. SQL Server can parse out unwanted text very, very fast. It's true that it's less optimal, but that's not going to be responsible for the O(N^2) slowdown described (.5s, 1.1s, 3.3s).
Peter Oehlert
The question was asked as how extra white space can make a difference, and no other extra code is specified, so we can't shoot in the dark. Let's say I am wrong; fine, then what's plan B? I get -1, but look at everyone talking about JIT optimization etc. Look at the code: CommandText is even outside the for loop, so it's a one-time operation, and the profiler shows the time is spent inside SQL Server. Let there be more evidence rather than shooting in the dark!!
Akash Kava
@Akash - If there is more evidence I can provide, I'm all for it. I'm not sure what else I can give.
Jere.Jones
I read your update, and yes, strings are the cause; string operations take longer. However, if your SQL machine is big enough and dedicated in a hosting environment, this should not make a difference; otherwise it's also possible that SQL Server is heavily loaded.
Akash Kava
+2  A: 

Strings in .Net are immutable. This has many benefits, but one drawback is that to concatenate two of them together you have to allocate a buffer for a whole new third string rather than just extending the buffer of the first string. The code you showed there has 21 separate concatenation operations, which has the potential to be slow. Normally I would expect the jit-optimizer to take care of this problem for you, but it's possible it's somehow missing this. You might try declaring the variable as static readonly and see if that helps any.
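For example, a minimal sketch of that suggestion (the field name is illustrative, and the statement is compressed onto fewer lines here):

// Hypothetical: declare the query text once so any concatenation cost is paid a single time.
static readonly string InsertSql =
    "INSERT INTO Test ([name],[email],[country],[comment],[date],[key_v0],[key_v1],[expires_v1],[expires_v2]) " +
    "VALUES (@name,@email,@country,' ',@date,@key_v0,@key_v1,@expires_v1,@expires_v2)";

// ... later: cmd.CommandText = InsertSql;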

Even so, I would only expect this to make at most a few milliseconds difference. That's hardly going to make or break your query. The best suggestion I can give you is to take both versions of your query and paste them into different management studio windows, along with DECLARE and SET statements for each of your parameters, and compare the execution plans.

Finally, David Stratton's advice about re-using the same parameters is sound. There's no point in clearing and re-creating the same parameters every round when you can simply update values and re-run the same query.

Joel Coehoorn
+2  A: 

Well, I don't have a direct answer but here is how I would approach the problem.

  1. Identify who is causing the problem. The first step would be to run SQL Profiler and see whether the database is the issue or it's something in the code.

SQL

If it's the database, then I would look at a few things: everyone is harping on string concatenation issues, which, while true, will likely make up less than 5 ms of your timing. I also would discount the spaces as being the source of the issue. Again, it will make a small difference but will not account for the degradation that you describe. You're looking for something that will produce this progression (0.5, 1.1, 3.3).

I would look in particular at what indexes you've defined on this table, what constraints/triggers are on this table, and how many foreign key relationships are present. Additionally, I would pull out the queries that are executing slowly and run them in query manager (SQL Enterprise Manager).

The final thing I might investigate is whether there is a bad cached plan introducing issues with some data-dependent feature. This would only be valid if you had some interesting triggers that use a piece of your data or certain types of index updates. You can look at this by calling DBCC FREEPROCCACHE in between calls to your insert statement and seeing if it makes a difference. This command clears the plan cache, forcing SQL Server to regenerate a new execution plan for your query.
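As a rough sketch (assuming the same conn and cmd objects from the question), clearing the plan cache between inserts could look like this:

// Sketch: clear the plan cache before each insert to rule out a bad cached plan.
// Note: DBCC FREEPROCCACHE requires elevated permissions and is for troubleshooting only.
SqlCommand clearCmd = new SqlCommand("DBCC FREEPROCCACHE", conn);
clearCmd.ExecuteNonQuery();
cmd.ExecuteNonQuery();   // then run the INSERT as before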

Client

If it is the client, then you need to determine what in your code is causing the issue. If you have a performance tracing tool (like the Visual Studio performance analyzer) that will instrument your code, I would use that, as it will VERY quickly tell you precisely what is taking up so much time.

If you don't have that option, then start by pulling your code out into a new console app with the smallest possible set of dependencies and see if you can repro the behavior. You're looking for whatever might be the cause of the progression you're seeing.

Peter Oehlert
Good answer. (+1) I sometimes just look for the answer in code rather than investigating. It saves time when the answer is obvious, but sometimes wastes time looking in the wrong area. I like your approach of systematically investigating where the problem is. Kudos!
David Stratton
+2  A: 

If 10Bob is much slower than both 9Bob and 99Bob, that would potentially point to an INDEX on Name where either the FILLFACTOR is set too high, or SQL Server is having to reindex the page when it hits the "1" in "10Bob".

So, that explains Bob, except that you say there is no index and whitespace also makes a difference...

768 bytes is an important boundary in MySQL for deciding whether a BLOB is stored inline or in a separate table. Perhaps the SQL Server query optimizer has a similar boundary?

That might explain a tiny performance difference, but doesn't explain the order of magnitude.

SQL Server uses 8k pages by default, so one might expect a tiny performance hit when the first INSERT comes along that requires a new page, but again, nothing to do with whitespace and doesn't explain the amount of delay here.

richardtallent
99Bob and 10Bob have the same performance characteristics.
Jere.Jones
+1  A: 

Your trace has all inserts at a duration of 0-3 ms. There are, however, big gaps between the executions: an insert ends at 12:53:10 and the next starts at 12:53:13, so there is a 3-second delay in the client between the two INSERTs. Technically the delay can be anywhere between the client and the server, but from the symptoms you describe I would rule out a random flaky router between the client and the server (the behavior would be more random).

Some places I would look at:

  • Database growth events/log growth events. This is very common with tests, because the test setup deploys a brand new test db and then the tests hit the growth event at about the same moment (say the 10th insert). This can be easily verified with performance counters and profiler events: Data File Auto Grow Event Class. The solution is to pre-grow the test db (actually you should always pre-grow both the mdf and ldf for perf testing). This would not account for a geometric rate of time increase, though.
  • Garbage Collector in client. Again, can be tracked in performance counters.
  • Connection pool exhaustion (i.e. the test leaks connections, and the pool has to open new ones; if it is set to keep a minimum count, it opens them in batches). sys.dm_exec_connections would grow. There is also a perfmon counter for user sessions (both on the server and in the ADO.Net counters); a quick client-side check is sketched after this list.
  • Code defect. Some wait/processing in the client code, perhaps list processing. This is the most likely cause; it is the only one that explains the squared rate of delay increase (N×N for a list of length N, where N increases with each test run; perhaps a sort, a test-result save, or something similar).
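A minimal sketch of the connection-count check mentioned above (this assumes the login used by the test has VIEW SERVER STATE permission; otherwise only its own connections are visible):

// Sketch: count server-side connections between iterations to spot a leaking pool.
SqlCommand countCmd = new SqlCommand("SELECT COUNT(*) FROM sys.dm_exec_connections", conn);
int connectionCount = Convert.ToInt32(countCmd.ExecuteScalar());
Console.WriteLine("Open connections: " + connectionCount);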
Remus Rusanu
I love your answer. You see exactly what I see. I would like it to be a code defect, but you see all of the code above. The loop is just as described. No sorting. No saving. Just insert-loop-insert-etc.
Jere.Jones