views:

1526

answers:

7

Single-threaded version description:

  1. Program gathers a list of questions.
  2. For each question, get model answers, and run each one through a scoring module.


  • Scoring module makes a number of (read-only) database queries.
  • Serial processing, single database connection.

I decided to multi-thread the above described program by splitting the question list into chunks and creating a thread for each one.

Each thread opens it's own database connection and works on it's own list of questions (about 95 questions on each of 6 threads). The application waits for all threads to finish, then aggregates the results for display.

To my surprise, the multi-threaded version ran in approximately the same time, taking about 16 seconds instead of 17.

Questions:

Why am I not seeing the kind of gain in performance I would expect from executing queries concurrently on separate threads with separate connections? Machine has 8 processors.

Will SQL Server process queries concurrently when they are coming from a single application, or might it (or .net itself) be serializing them?

Might there be something misconfigured, that would make it go faster, or might I just be pushing SQL Server to its computational limits?

Current configuration:

Microsoft SQL Server Developer Edition 9.0.1406 RTM
OS: Windows Server 2003 Standard
Processors: 8
RAM: 4GB

A: 

How large is your database? How fast are your HDDs / Raid / Other storage

Perhaps your DB is I/O bound?

Tom Leys
The DB is a few GB. There are 5 drives, (four 73GB and one 300GB), Serial Attached SCSI, 3, 15K, 3.5, SG2, 15K5. The raid is a mystery honestly (haven't restarted server to check), and I wasn't here when it was set up. They show up as a single 135GB drive in windows. Is there a way to determine if the DB is I/O bound?
Triynko
A: 

My first inclination is that you're trying to solve an IO problem with threads, which almost never works. IO is IO, and more threads doesn't increase the pipe. You'd be better off downloading all questions and their answers in one batch and processing the batch locally with multiple threads.

Having said that, you're probably experiencing some db locking that is causing slowness. Since you're talking about read-only queries, try using the with (nolock) hint on your queries to see if that helps.

Regarding SQL server processing, it is my understanding that SQL Server will try to process as many connections concurrently as possible (one statement at a time per connection), up to the max connections allowed by configuration. The kind if issue you're seeing is almost never a thread issue and almost always a locking or IO problem.

Chris
The threads were introduced specifically to allow processing while database transactions are going on and to attempt to ensure queries are running in parallel to take advantage of the multiple processors, because without multi-threading, each question would block waiting for database activity. Threads definitely help in that area, but the IO is indeed a separate factor among others.
Triynko
While using nolock may boost overall query performance (for both serial and parallel runtime), I'm not interested in that; I'm interested in why the parallel version isn't running a lot faster. The queries involve accessing 'tree' structures, so I can't risk reading uncommitted rows with nolock, since other utilities may modify the scoring trees at any time.
Triynko
The point of nolock is to avoid locking operations that may be delaying your queries. If query1 is accessing 15 records, it may lock several more than that. If query2 attempts to access any of the nearby locked records, it won't be able to run in parallel - it has to wait for the lock to free up.
Chris
Read-only queries shouldn't interfere with each other by default; a read-lock shouldn't prevent another query from reading the same rows, only writing to them. Avoiding the read-lock altogether with 'nolock' does reduce the overall running time by a couple seconds, but that's still not the kind of performance gain I'm expecting for parallelizing things on an 8 processor machine. Perhaps this is just an I/O problem.
Triynko
A: 

is it possible that the the threads share a connection? did you verify that multiple SPIDs are created when this runs (sp_who)?

KM
Each thread calls a method which constructs a new SqlConnection object and opens it; so unless the framework is doing something goofy with connection pooling (maybe it is?), then the threads do not share a connection.
Triynko
@Triynko, unless you check, you can't be sure what is happening. run sp_who, then run your program, and while it is running run sp_who again, you can then see if you have 4 processes running in the database.
KM
@KM, I ran sp_who before/during execution, and new spids equal to the number of running threads appear. Each spid has 9 entries, with ecids of 0 through 8. They all are all 'select' commands. So it appears that there are indeed multiple simultaneous connections being made as expected.
Triynko
@Triynko, so you can now concentrate on the database, you have 4 "users" hitting the same tables at the same time. you can use traditional DB techniques to make sure they are not slowing down each other.
KM
+2  A: 

This is just a shot in the dark, bu I bet you are not seeing the performance gain because they serialize themselves int he database due to locking of shared resources (records). Now for the small print.

I assume your C# code is actually correct and you actually do start separate threads and issue each query in parallel .No offense, but I've seen many making that claim and the code being actually serial in the client, for various reasons. You should validate this by monitoring the server (via Profiler, or use the sys.dm_exec_requests and sys.dm_exec_sessions).

Also I assume that your queries are of similar weight. Ie. you do not have one thread that lasts 15 seconds and 5 that 100 ms.

The symptoms you describe, in lack of more details, would point that you have a write operation at the begging of each thread that takes an X lock on some resource. First thread starts and locks the resource, other 5 wait. 1st thread is done, releases the resource then the next one grabs it, other 4 wait. So last thread has to wait for the execution of all other 5. This would be extremely easy to troubleshoot by looking at sys.dm_exec_requests and monitor what blocks the requests.

BTW you should consider using Asynchronous Processing=true and rely on the async methods like BeginExecuteReader to launch your commands in execution in parallel w/o the overhead of client side threads.

Remus Rusanu
I'm sure the code is correct. Each of the threads is sent an initialization object that contains a unique connection object along with a unique list of items to process. The loop that starts the threads completes in 5.4875 ms. There are no write operations -- only select queries. The queries are of similar weights, as demonstrated by the similar running times of each thread (16.639s, 16.481s, 16.544, 16.712). Total running time of multithreaded application is approx 16s, compared to 17s for the single-threaded version.
Triynko
So, really, I've already thought about all that. What I don't understand is why SQL Server is taking the same amount of time to process hundreds of queries, regardless of whether they're sent in parallel or not. Indeed, some sort of serialization seems to be taking place within SQL Server, but why? With read-only queries, do I seriously have to specify 'nolock', as some have suggested? SQL Server should be able to handle read-only (select) queries concurrently without the 'nolock' syntax.
Triynko
With the 'nolock' option specified for all query strings (3 total), the running times for each thread drops by about 2 seconds (14.398, 14.213, 14.212, 14.475). The single-threaded running time drops as well though, still just a second behind the multi-threaded version. I ran a trace in profiler, and there are just under 5000 queries execute, all of which have a zero for the 'Writes' column.
Triynko
You're not going to discover the answer by guessing. Use the sys.dm_exec_requests view to see what are your queries doing.
Remus Rusanu
I already ran dm_exec_requests, and I'm not guessing. The trace shows the start and end times of all queries, and a graph on all 5000 values shows that they are executing concurrently (i.e. start/end segments clearly overlap for different spids).
Triynko
It's like, SQL Server indicates that it's executing requests concurrently, but the timing doesn't back it up. It leads me to believe that it's not actually using multiple processors for concurrent requests...
Triynko
...of course, that doesn't fit either, because task manager shows over 90% CPU usage, and all 8 graphs for all 8 processors max out for the 14 seconds that it runs. Am I just pushing the server to it's computational limit or what?
Triynko
dm_exec_requests by the way shows simultaneous requests. I run it over and over, and they parallel requests come and go, but I usually catch all four of them in a suspended state with a wait type of CXPACKET. Does that give any ideas?
Triynko
I thought about using the BeginExecuteReader methods, but the scoring algorithm requires synchronous completion to continue, and uses such API calls as 'List<Response> responses = Query.Responses.Load( conn, userid, questionid );' Interestingly, I read an article in Game Developer recently which suggests sticking to using a synchronous API, which under the hood uses fibers and asynchronous calls. It lets you write your code cleanly in a single-threaded fashion, with the benefits of concurrency, but it requires careful attention to making sure fiber switches occur frequently enough.
Triynko
the point of the dm_exec_requests was to check the state of the request: blocked or runnable. If they are blocked, what are the blocked on? What is the 'last wait resource' and the 'last wait resource type'.
Remus Rusanu
And also who is the blocking spid and what is it doing.
Remus Rusanu
CXPACKET means your query is split into parallel sub-queries and the driving thread is waiting for the sub-queries to produce data. A parallel plan makes sense only if the queries have lots of data to scan, so this would point toward some poorly designed queries and indexes. This could also possibly explain the timing, as your queries are by no means CPU bound but IO bound. Executing them in parallel results in the same time, because it takes just as much to fetch the needed pages in the buffer pool.
Remus Rusanu
The queries are optimal, as are the indexes. For example, the 2nd of the three queries is a simple select statement with a single integer field in the where clause, which has a clustered index on it. The first query runs up-front on a table with less than a hundred rows, before any of the threads/timers start, so it's irrelevant. The only other query runs on a table with half a million rows to get the lone model response for a question; it has 2 fields in the where clause, which have a single multi-column clustered index on those fields, specified in the same order.
Triynko
The requests were 'suspended', but not blocked or runnable. There is no blocking spid, and any field relating to blocking is zero. The wait type (current and last), as I said was CXPACKET. There is no 'last_wait_resource' field, but the 'wait_resource' field is always blank. The wait times are between 0 and 125, but usually less than 50.
Triynko
you need to look at the threads executing the parallel plan if they are blocked or suspended and why. Check sys.dm_os_workers ad sys.dm_os_tasks
Remus Rusanu
I don't see any column in those views labeled "explaination why status is suspended", lol, but I just posted a response with the results. I also took a look at the query plan predictions in management studio, and it indicates that both of the main queries are simple clustered index seeks.
Triynko
A: 

I ran a join query across sys.dm_os_workers, sys.dm_os_tasks, and sys.dm_exec_requests on task_address, and here are the results (some uninteresting/zero-valued fields excluded, others prefixed with ex or os to resolve ambiguities):

-COL_NAME-  -Thread_1- -Thread_2- -Thread_3- -Thread_4-

task_state  SUSPENDED SUSPENDED SUSPENDED SUSPENDED
context_switches_count  2 2 2 2
worker_address  0x3F87A0E8 0x5993E0E8 0x496C00E8 0x366FA0E8
is_in_polling_io_completion_routine 0 0 0 0
pending_io_count    0 0 0 0
pending_io_byte_count   0 0 0 0
pending_io_byte_average 0 0 0 0
wait_started_ms_ticks   1926478171 1926478187 1926478171 1926478187
wait_resumed_ms_ticks   1926478171 1926478187 1926478171 1926478187
task_bound_ms_ticks 1926478171 1926478171 1926478156 1926478171
worker_created_ms_ticks 1926137937 1923739218 1921736640 1926137890
locale  1033 1033 1033 1033
affinity    1 4 8 32
state   SUSPENDED SUSPENDED SUSPENDED SUSPENDED
start_quantum   3074730327955210 3074730349757920 3074730321989030 3074730355017750
end_quantum 3074730334339210 3074730356141920 3074730328373030 3074730361401750
quantum_used    6725 11177 11336 6284
max_quantum 4 15 5 20
boost_count 999 999 999 999
tasks_processed_count   765 1939 1424 314
os.task_address 0x006E8A78 0x00AF12E8 0x00B84C58 0x00D2CB68
memory_object_address   0x3F87A040 0x5993E040 0x496C0040 0x366FA040
thread_address  0x7FF08E38 0x7FF8CE38 0x7FF0FE38 0x7FF92E38
signal_worker_address   0x4D7DC0E8 0x571360E8 0x2F8560E8 0x4A9B40E8
scheduler_address   0x006EC040 0x00AF4040 0x00B88040 0x00E40040
os.request_id   0 0 0 0
start_time  2009-05-26 19:39 39:43.2 39:43.2 39:43.2
ex.status   suspended suspended suspended suspended
command SELECT SELECT SELECT SELECT
sql_handle  0x020000009355F1004BDC90A51664F9174D245A966E276C61 0x020000009355F1004D8095D234D39F77117E1BBBF8108B26 0x020000009355F100FC902C84A97133874FBE4CA6614C80E5 0x020000009355F100FC902C84A97133874FBE4CA6614C80E5
statement_start_offset  94 94 94 94
statement_end_offset    -1 -1 -1 -1
plan_handle 0x060007009355F100B821C414000000000000000000000000 0x060007009355F100B8811331000000000000000000000000 0x060007009355F100B801B259000000000000000000000000 0x060007009355F100B801B259000000000000000000000000
database_id 7 7 7 7
user_id 1 1 1 1
connection_id   BABF5455-409B-4F4C-9BA5-B53B35B11062 A2BBCACF-D227-466A-AB08-6EBB56F34FF2 D330EDFE-D49B-4148-B7C5-8D26FE276D30 649F0EC5-CB97-4B37-8D4E-85761847B403
blocking_session_id 0 0 0 0
wait_type   CXPACKET CXPACKET CXPACKET CXPACKET
wait_time   46 31 46 31
ex.last_wait_type   CXPACKET CXPACKET CXPACKET CXPACKET
wait_resource      
open_transaction_count  0 0 0 0
open_resultset_count    1 1 1 1
transaction_id  3052202 3052211 3052196 3052216
context_info    0x 0x 0x 0x
percent_complete    0 0 0 0
estimated_completion_time   0 0 0 0
cpu_time    0 0 0 0
total_elapsed_time  54 41 65 39
reads   0 0 0 0
writes  0 0 0 0
logical_reads   78745 123090 78672 111966
text_size   2147483647 2147483647 2147483647 2147483647
arithabort  0 0 0 0
transaction_isolation_level 2 2 2 2
lock_timeout    -1 -1 -1 -1
deadlock_priority   0 0 0 0
row_count   6 0 1 1
prev_error  0 0 0 0
nest_level  2 2 2 2
granted_query_memory    512 512 512 512

The query plan predictor for all queries shows a couple nodes, 0% for select, and 100% for a clustered index seek.

Edit: The fields and values I left out where (same for all 4 threads, except for context_switch_count): exec_context_id(0), host_address(0x00000000), status(0), is_preemptive(0), is_fiber(0), is_sick(0), is_in_cc_exception(0), is_fatal_exception(0), is_inside_catch(0), context_switch_count(3-89078), exception_num(0), exception_Severity(0), exception_address(0x00000000), return_code(0), fiber_address(NULL), language(us_english), date_format(mdy), date_first(7), quoted_identifier(1), ansi_defaults(0), ansi_warnings(1), ansi_padding(1), ansi_nulls(1), concat_null_yields_null(1), executing_managed_code(0)

Triynko
You did not include the threads started by the parallel plan
Remus Rusanu
The other threads where all there before my four threads appeared, and they all have a status of 'background'. So there were 10 threads before my application started, and 14 threads while it was running. The original 10 threads had the following commands: RESOURCE MONITOR, LAZY WRITER, LOCK MONITOR, SIGNAL HANDLER, TRACE, QUEUE TASK, BRKR TASK, TASK MANAGER, BRKR EVENT HNDLR, BRKR TASK. My 4 threads all had the command: SELECT.
Triynko
A: 

You can simply check the task manager when the process is running. If it's showing 100% CPU usage then its CPU bound. Otherwise its IO Bound.

For hyperthreading 50% CPU usage is roughly equal to 100% usage!

Wow I didn't realize how old the thread was. I guess its always good to leave the response for others looking.

WebDev01
A: 

This is an interesting thread. Did you ever manage to resolve this, Triynko? I love how everyone thinks it is a problem with you :)

electrichead
Not really. It's kind of a mystery. After rereading this post (it's been a year, haha), I think the logical conclusion is that I was just maxing out the machine in terms of work-volume. I was probably maxing out the processors and/or the disk I/O, but I'm not sure which one. In any case, multi-threading could assist a processor bottleneck, but that didn't seem to help... so it's probably not an avoidable bottleneck. The I/O problem could be helped by shrinking the size of database tables, but it's already minimal (trimmed varchar fields, etc.).
Triynko