Speaking broadly (i.e., not specific to .NET or any other platform), services are programmed in one of two ways, or a combination of both.
thread per connection: as you note, this dedicates one scheduling entity (a kernel thread or process, or possibly a lighter thread or coroutine implemented by the platform in user space) to each connection, so that each entity need only concern itself with handling a single request at a time, in order.
example pseudocode:
function run(connection)
    while(connection is open)
        frobnicate(connection)

function main
    listener = new Listener(port)
    threads = new Collection<Thread>()
    while(running)
        connection = listener.accept()
        threads.add(new Thread(run, connection))
    join(threads)
    exit()
Important features of the above sample:
- This follows the thread-per-connection model, exactly as you put it. Once a thread
  has finished working on its connection, it dies.
- If this program ran in the real world, it would likely be fine until it had to
  handle more than a few hundred threads. Most platforms start to churn at about
  that level. A few are specially designed to handle many more threads, into the
  tens of thousands of simultaneous connections.
- Although the connections are handled in a multithreaded manner, accepting incoming
  connections happens only in the main thread. This could possibly be a point of DoS,
  if a malicious host tries to connect but deliberately fails, slowly.
- When the program detects that it is time to stop running, the main thread joins the
  other threads so that they do not die when it exits. Again, that means the program
  won't exit until all the old threads are finished handling their data, which might
  just be never. The alternative would be to skip the join, in which case the other
  threads would be terminated unceremoniously.
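If it helps to see the model in a concrete language, here is a minimal Python sketch of the same structure. The echo logic stands in for the fictional frobnicate, and the port number is arbitrary; a real service would do real per-request work here.

```python
import socket
import threading

def frobnicate(conn):
    """Stand-in for real per-request work: echo one chunk back.
    Returns False once the peer has closed the connection."""
    data = conn.recv(1024)      # blocks until data arrives
    if not data:
        return False
    conn.sendall(data)
    return True

def run(conn):
    # One thread owns one connection, exactly as in the pseudocode.
    with conn:
        while frobnicate(conn):
            pass

def serve(port, stop_event):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", port))
    listener.listen()
    threads = []
    while not stop_event.is_set():
        conn, _addr = listener.accept()    # only the main thread accepts
        t = threading.Thread(target=run, args=(conn,))
        t.start()
        threads.append(t)
    for t in threads:                      # the join from the pseudocode
        t.join()
```

Note that, as in the pseudocode, a slow client keeps an entire thread hostage for the lifetime of its connection.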
non-blocking: a single thread of execution manages several connections. When any given connection is stalled, waiting for data to move across the network or to be read in from disk, the thread skips over it and works on another connection.
example pseudocode:
function main
    connections = new Collection<Connection>()
    listener = new Listener(port)
    connections.add(listener)
    while(true)
        foreach connection in connections:
            if connection.ready():
                if connection is listener:
                    connections.add(connection.accept())
                else:
                    if connection is open:
                        nb_frobnicate(connection)
                    else:
                        connections.remove(connection)
        yield()
        if(not running)
            exit()
Features of this snippet:
- In the multi-threaded version, each thread handled only a single connection; if
  that connection wasn't ready to be used, the thread blocked and another thread
  could do work. If frobnicate blocked in this implementation, it would be a
  disaster: other connections might be able to do work, even if one connection
  happens not to be ready. To address this, an alternative function is used, which
  takes care to perform only non-blocking operations on the connection. These reads
  and writes return immediately, with values that tell the caller how much work they
  were able to do. If that happens to be no work at all, the program will just have
  to try again as soon as it knows that the connection is ready for more. I've
  changed the name to nb_frobnicate to indicate that this function is careful to be
  non-blocking.
- Most platforms supply a function that abstracts out the loop over the connections
  and the check for whether each one has data. It will be called either select() or
  poll() (both may be available). Even so, I've chosen to show it this way, because
  you might want to wait on more than just network connections. As far as I know,
  there is no way to wait on every possible blocking operation generically. If you
  also need to wait on disk IO (Windows only), timers, or hardware interrupts (like
  screen refresh), you'll have to manually poll the different event sources,
  somewhat like this.
- At the far end of the loop is a yield() function, which sleeps until an IO
  interrupt happens. This is necessary in case no IO is ready to be handled. If we
  didn't do this, the program would busy-wait, checking and re-checking each
  connection, finding no data ready, and needlessly wasting CPU. If the program only
  needs to wait on network IO (or also disk, on POSIX systems), then select/poll
  could be invoked in blocking mode, and it would sleep until a network connection
  was ready. If you had to wait on other kinds of events too, you'd have to use a
  non-blocking version of those functions, poll the other event sources according to
  their own idiom, and then yield once everything would block.
- There's no equivalent here to the thread join from the previous example; this is
  instead analogous to the non-joining behavior. You could probably get join-like
  behavior by removing the listener from the polling collection, and then only
  calling exit once the collection is empty.
- Importantly, the entire program is composed of simple loops and function calls;
  there is no point that incurs the cost of a context switch. The downside is that
  the program is strictly sequential and cannot take advantage of multiple cores, if
  they are present. In practice this is seldom an issue, because most services are
  IO bound, limited by the rate at which connections become ready rather than by the
  cost of the fictional frobnicate() function.
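For comparison, here is roughly the same loop in runnable Python, with select() playing the role of both the ready() check and yield(): it sleeps until some socket has work, instead of busy-waiting. The echo logic again stands in for nb_frobnicate, and the short timeout exists only so the loop can notice a shutdown request.

```python
import select
import socket

def nb_frobnicate(conn):
    """Non-blocking stand-in for the request handler: echo whatever is
    available right now. Returns False once the peer has closed."""
    data = conn.recv(1024)     # won't block: select() said it's ready
    if not data:
        return False
    conn.sendall(data)
    return True

def serve(listener, running):
    listener.setblocking(False)
    connections = [listener]
    while running():
        # select() sleeps until at least one socket is ready, so there is
        # no busy-waiting; the 0.1s timeout lets us re-check running().
        ready, _, _ = select.select(connections, [], [], 0.1)
        for sock in ready:
            if sock is listener:
                conn, _addr = sock.accept()
                connections.append(conn)
            elif not nb_frobnicate(sock):
                connections.remove(sock)
                sock.close()
```

A single thread here serves any number of clients, interleaving work as their sockets become ready.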
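As for manually polling several kinds of event sources, a sketch of what that might look like in Python: select() invoked with a zero timeout so it never blocks, other sources polled by their own idiom, and a short sleep standing in for yield() when nothing at all was ready. The other_sources interface (objects with a poll() method) is invented here purely for illustration.

```python
import select
import time

def poll_once(sockets, other_sources):
    """Poll the sockets without blocking (timeout 0), then poll the other
    event sources by their own idiom; sleep briefly only if nothing
    anywhere had work, so the loop doesn't busy-wait."""
    ready, _, _ = select.select(sockets, [], [], 0)  # non-blocking mode
    did_work = bool(ready)
    for source in other_sources:
        did_work |= source.poll()   # hypothetical per-source poll()
    if not did_work:
        time.sleep(0.01)            # the yield(): give the CPU a rest
    return ready
```

The caller would run poll_once in a loop, handling whatever sockets it returns each pass.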
As I mentioned, the two can be used in combination, giving many of the advantages of multiple scheduling entities, such as better CPU utilization on multi-core systems, along with the advantages of non-blocking IO, i.e., lower load due to fewer context switches.
example of this:
function worker(queue)
    connections = new Collection<Connection>()
    while(running)
        while(queue not empty)
            connections.add(queue.pop())
        foreach connection in select(connections)
            if connection is open:
                nb_frobnicate(connection)
            else:
                connections.remove(connection)

function main
    pool = new Collection<Thread, Queue<Connection>>()
    for index in num_cpus:
        queue = new Queue<Connection>()
        pool[index] = new Thread(worker, queue), queue
    listener = new Listener(port)
    while(running)
        connection = listener.accept()
        selected_thread = 0
        for index in pool.length:
            if pool[index].load() < pool[selected_thread].load()
                selected_thread = index
        pool[selected_thread].push(connection)
        pool[selected_thread].wake()
notes about this program:
- This creates a bunch of instances of the single-threaded version, but since they
  are now multiple threads, they have to communicate with each other somehow. That
  is done using the Queue<Connection> belonging to each thread.
- Also notice that it uses the nb_frobnicate handler, for the same reason as the
  single-threaded program: each thread handles more than one connection.
- The pool of worker threads is limited to the number of processors we have, since
  there would be little benefit in having more threads waiting than could possibly
  run at once. In practice the optimum number of threads varies from application to
  application, but the number of CPUs is often a good enough guess.
- As before, the main thread accepts connections and passes them to worker threads.
  If a worker is blocked waiting on its existing connections, it would just sit
  there, even though the new connection is ready now. To alleviate this, the main
  thread notifies the worker by waking it, causing any blocking operation to return.
  If that had been a normal read, it would probably return an error code, but since
  we're using select(), it just returns an empty list of ready connections, and the
  thread can empty its queue once more.
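To make the hybrid concrete, here is a runnable Python sketch of the same design. A socketpair stands in for the wake() primitive: the main thread writes a byte to one end, which makes the worker's blocked select() return immediately. The names (Worker, nb_frobnicate) are my own, and the load metric is deliberately crude.

```python
import queue
import select
import socket
import threading

def nb_frobnicate(conn):
    """Stand-in handler: echo back whatever is ready right now."""
    data = conn.recv(1024)
    if not data:
        return False
    conn.sendall(data)
    return True

class Worker(threading.Thread):
    """One select() loop per thread. New connections arrive on a queue,
    and a socketpair plays the role of wake()."""
    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self._wake_r, self._wake_w = socket.socketpair()
        self.active = 0

    def load(self):
        # Crude: connections owned plus connections still queued.
        return self.active + self.inbox.qsize()

    def push(self, conn):
        self.inbox.put(conn)

    def wake(self):
        self._wake_w.send(b"x")    # makes select() return immediately

    def run(self):
        connections = [self._wake_r]
        while True:
            while not self.inbox.empty():
                connections.append(self.inbox.get())
                self.active += 1
            ready, _, _ = select.select(connections, [], [], 0.1)
            for sock in ready:
                if sock is self._wake_r:
                    sock.recv(64)          # drain the wake-up bytes
                elif not nb_frobnicate(sock):
                    connections.remove(sock)
                    sock.close()
                    self.active -= 1

def serve(listener, workers, running):
    # The main thread only accepts, handing each connection to the
    # least-loaded worker, as in the pseudocode.
    while running():
        conn, _addr = listener.accept()
        target = min(workers, key=lambda w: w.load())
        target.push(conn)
        target.wake()
```

In production you would reach for a library that packages this pattern rather than hand-rolling it, but the moving parts are the same: one acceptor, N event loops, and a wake-up channel between them.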
Edit: added code samples.