



I am in the design phase of writing a new Windows Service application that accepts TCP/IP connections for long running connections (i.e. this is not like HTTP where there are many short connections, but rather a client connects and stays connected for hours or days or even weeks).

I'm looking for ideas for the best way to design the network architecture. I'm going to need to start at least one thread for the service. I am considering using the async API (BeginReceive, etc.) since I don't know how many clients I will have connected at any given time (possibly hundreds). I definitely do not want to start a thread for each connection.

Data will primarily flow out to the clients from my server, but there will be some commands sent from the clients on occasion. This is primarily a monitoring application in which my server sends status data periodically to the clients.

Any suggestions on the best way to make this as scalable as possible? Basic workflow? Thanks.

EDIT: To be clear, I'm looking for .NET-based solutions (C# if possible, but any .NET language will work).

BOUNTY NOTE: To be awarded the bounty, I expect more than a simple answer. I would need a working example of a solution, either as a pointer to something I could download or a short example in-line. And it must be .net and Windows based (any .net language is acceptable)

EDIT: I want to thank everyone that gave good answers. Unfortunately, I could only accept one, and I chose to accept the more well known Begin/End method. Esac's solution may well be better, but it's still new enough that I don't know for sure how it will work out.

I have upvoted all the answers I thought were good, I wish I could do more for you guys. Thanks again.


You could try using a framework called ACE (Adaptive Communications Environment) which is a generic C++ framework for network servers. It's a very solid, mature product and is designed to support high-reliability, high-volume applications up to telco-grade.

The framework deals with quite a wide range of concurrency models and probably has one suitable for your application out of the box. This should make the system easier to debug as most of the nasty concurrency issues have already been sorted out. The trade-off here is that the framework is written in C++ and is not the most warm and fluffy of code bases. On the other hand, you get tested, industrial grade network infrastructure and a highly scalable architecture out of the box.

That is a good suggestion, but from the tags of the question I believe the OP will be using C#
I noticed that; the suggestion was that this is available for C++ and I'm not aware of anything equivalent for C#. Debugging this sort of system isn't easy at the best of times and you may get a return from going to this framework even though it means switching to C++.
Yes, this is C#. I'm looking for good .net based solutions. I should have been more clear, but I assumed people would read the tags
Mystere Man
+1  A: 

I would use SEDA or a lightweight threading library (Erlang, or on newer Linux see NPTL scalability on the server side). Async coding is very cumbersome if your communication isn't :)

+1  A: 

Well, .NET sockets seem to provide select() - that's best for handling input. For output I'd have a pool of socket-writer threads listening on a work queue, accepting socket descriptor/object as part of the work item, so you don't need a thread per socket.
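A rough sketch of the writer-pool half of this idea, under some assumptions that are not in the answer: it uses BlockingCollection (so .NET 4 or later), it writes to a Stream (for example a NetworkStream) rather than a raw socket descriptor, and the names WriterPool and Enqueue are made up for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Hypothetical sketch: a fixed pool of writer threads draining one work queue,
// so outgoing data never needs a thread per connection.
public sealed class WriterPool : IDisposable
{
    private sealed class WorkItem
    {
        public Stream Target;
        public byte[] Payload;
    }

    private readonly BlockingCollection<WorkItem> _queue = new BlockingCollection<WorkItem>();
    private readonly Thread[] _workers;

    public WriterPool(int workerCount)
    {
        _workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = new Thread(Drain) { IsBackground = true };
            _workers[i].Start();
        }
    }

    // Called by producers; only queues the work, it does not touch the network.
    public void Enqueue(Stream target, byte[] payload)
    {
        _queue.Add(new WorkItem { Target = target, Payload = payload });
    }

    private void Drain()
    {
        // Blocks while the queue is empty; ends once CompleteAdding has been
        // called and the queue has been drained.
        foreach (WorkItem item in _queue.GetConsumingEnumerable())
        {
            try
            {
                // Serialize writes per stream so two workers never interleave bytes.
                lock (item.Target)
                {
                    item.Target.Write(item.Payload, 0, item.Payload.Length);
                }
            }
            catch (IOException) { /* the client went away; drop the item */ }
            catch (ObjectDisposedException) { /* stream already closed */ }
        }
    }

    // Stops accepting work, lets the workers finish the queue, then joins them.
    public void Dispose()
    {
        _queue.CompleteAdding();
        foreach (Thread t in _workers) t.Join();
        _queue.Dispose();
    }
}
```

With more than one worker, two items queued for the same stream can be written in either order, so a real server would probably need per-connection ordering on top of this.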

Nikolai N Fetissov

To be clear, I'm looking for .NET-based solutions (C# if possible, but any .NET language will work)

You are not going to get the highest level of scalability if you go purely with .NET. GC pauses can hamper the latency.

I'm going to need to start at least one thread for the service. I am considering using the async API (BeginReceive, etc.) since I don't know how many clients I will have connected at any given time (possibly hundreds). I definitely do not want to start a thread for each connection.

Overlapped IO is generally considered to be Windows' fastest API for network communication. I don't know if this is the same as your async API. Do not use select, as each call needs to check every socket that is open instead of having callbacks on active sockets.

I do not understand your GC pause comment.. I have never seen a system with scalability problems that was directly related to GC.
@markt then you haven't been making very large systems.
@markt for example see
It is far more likely that you build an app that can't scale because of poor architecture than because GC exists. Huge scalable+ performant systems have been built with both .NET and Java. In both of the links that you gave, the cause was not directly garbage collection.. but related to heap swapping. I would suspect that it is really a problem with architecture that could have been avoided.. If you can show me a language that it is not possible to build a system that cannot scale, I will gladly use it ;)
I disagree with this comment. Unknown, the questions you reference are Java, and they are specifically dealing with larger memory allocations and trying to manually force gc. I'm not really going to be having huge amounts of memory allocation going on here. This is just not an issue. But thanks. Yes, the Asynchronous Programming Model is typically implemented on top of Overlapped IO.
Mystere Man
@Mystere Man, well go ahead and disagree, but the fact is that if you aren't running the GC's manual collect, it will come back and bite you. Most of the time it won't be lagging because .NET will try not to collect until it hits an internal heap threshold. Then when you do eventually need to clean up, you will have one big mess.
@Mystere Man, in both the examples, I don't see any of them trying to force the GC. Also in one of them, they are using it as a JSON-RPC server, a similar endeavor as you I suspect.
Actually, best practice is not to be constantly manually forcing the GC to collect. This could very well make your app perform worse. The .NET GC is a generational GC that will tune to your app's usage. If you really think that you need to be manually calling GC.Collect, I would say that your code most likely needs to be written another way..
@markt, that is a comment for people who don't really know anything about garbage collection. If you have idle time, there is nothing wrong with doing a manual collection. It's not going to make your application worse when it finishes. Academic papers show that generational GCs work because the generations are an approximation of the lifetime of your objects. Obviously this isn't a perfect representation. In fact, there is a paradox where the "oldest" generation often has the highest ratio of garbage because it is never garbage collected.
Anyway, there is no reason to continue hijacking this question for a GC discussion. We each have our opinions and experience..
+4  A: 

Using .NET's integrated Async IO (BeginRead, etc) is a good idea if you can get all the details right. When you properly set up your socket/file handles it will use the OS's underlying IOCP implementation, allowing your operations to complete without using any threads (or, in the worst case, using a thread that I believe comes from the kernel's IO thread pool instead of .NET's thread pool, which helps alleviate threadpool congestion.)

The main gotcha is to make sure that you open your sockets/files in non-blocking mode. Most of the default convenience functions (like File.OpenRead) don't do this, so you'll need to write your own.
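For files, one way to do this is the FileStream constructor overload that takes FileOptions.Asynchronous, which asks the OS for an overlapped handle. The sketch below assumes that is what "write your own" amounts to here; the helper name OpenForAsyncRead and the 4 KB buffer size are made up for illustration.

```csharp
using System;
using System.IO;

public static class AsyncFile
{
    // Hypothetical helper: like File.OpenRead, but requests an asynchronous
    // (overlapped) handle so that BeginRead does not simply run the read
    // synchronously on another thread.
    public static FileStream OpenForAsyncRead(string path)
    {
        return new FileStream(
            path,
            FileMode.Open,
            FileAccess.Read,
            FileShare.Read,
            4096,                       // buffer size; an arbitrary choice
            FileOptions.Asynchronous);  // the flag File.OpenRead does not set
    }
}
```

The FileStream.IsAsync property can be checked afterwards to confirm the stream was opened in asynchronous mode.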

One of the other main concerns is error handling - properly handling errors when writing asynchronous I/O code is much, much harder than doing it in synchronous code. It's also very easy to end up with race conditions and deadlocks even though you may not be using threads directly, so you need to be aware of this.

If possible, you should try and use a convenience library to ease the process of doing scalable asynchronous IO.

Microsoft's Concurrency Coordination Runtime is one example of a .NET library designed to ease the difficulty of doing this kind of programming. It looks great, but as I haven't used it, I can't comment on how well it would scale.

For my personal projects that need to do asynchronous network or disk I/O, I use a set of .NET concurrency/IO tools that I've built over the past year, called Squared.Task. It's inspired by libraries like imvu.task and twisted, and I've included some working examples in the repository that do network I/O. I also have used it in a few applications I've written - the largest publicly released one being NDexer (which uses it for threadless disk I/O). The library was written based on my experience with imvu.task and has a set of fairly comprehensive unit tests, so I strongly encourage you to try it out. If you have any issues with it, I'd be glad to offer you some assistance.

In my opinion, based on my experience, using asynchronous/threadless IO instead of threads is a worthwhile endeavor on the .NET platform, as long as you're ready to deal with the learning curve. It allows you to avoid the scalability hassles imposed by the cost of Thread objects, and in many cases, you can completely avoid the use of locks and mutexes by making careful use of concurrency primitives like Futures/Promises.

Kevin Gadd
Great info, I'll check out your references and see what makes sense.
Mystere Man
+8  A: 

Have you considered just using a WCF net TCP binding and a publish/subscribe pattern? WCF would allow you to focus [mostly] on your domain instead of plumbing.

There are lots of WCF samples & even a publish/subscribe framework available on IDesign's download section which may be useful :

+2  A: 

You can find a nice overview of techniques at the C10k problem page.

+9  A: 

I've got such a server running in some of my solutions. Here is a very detailed explanation of the different ways to do it in .NET: Get Closer to the Wire with High-Performance Sockets in .NET

Lately I've been looking for ways to improve our code and will be looking into this: "Socket Performance Enhancements in Version 3.5" that was included specifically "for use by applications that use asynchronous network I/O to achieve the highest performance".

"The main feature of these enhancements is the avoidance of the repeated allocation and synchronization of objects during high-volume asynchronous socket I/O. The Begin/End design pattern currently implemented by the Socket class for asynchronous socket I/O requires a System.IAsyncResult object be allocated for each asynchronous socket operation."

You can keep reading if you follow the link. I personally will be testing their sample code tomorrow to benchmark it against what i've got.

Edit: Here you can find working code for both client and server using the new 3.5 SocketAsyncEventArgs, so you can test it within a couple of minutes and go through the code. It is a simple approach but is the basis for starting a much larger implementation. Also, this article from almost two years ago in MSDN Magazine was an interesting read.

+20  A: 

I've written something similar to this in the past. My research years ago showed that writing your own socket implementation, using the asynchronous sockets, was the best bet. This meant that clients not really doing anything required relatively few resources. Anything that does occur is handled by the .NET thread pool.

I wrote it as a class that manages all connections for the servers.

I simply used a list to hold all the client connections, but if you need faster lookups for larger lists, you can write it however you want.

private List<xConnection> _sockets;

You also need the socket that actually listens for incoming connections.

private System.Net.Sockets.Socket _serverSocket;

The Start method actually starts the server socket and begins listening for any incoming connections.

public bool Start() {
  System.Net.IPHostEntry localhost = System.Net.Dns.GetHostEntry(System.Net.Dns.GetHostName());
  System.Net.IPEndPoint serverEndPoint;
  try {
    serverEndPoint = new System.Net.IPEndPoint(localhost.AddressList[0], _port);
  } catch (System.ArgumentOutOfRangeException e) {
    throw new ArgumentOutOfRangeException("Port number entered would seem to be invalid, should be between 1024 and 65000", e);
  }
  try {
    _serverSocket = new System.Net.Sockets.Socket(serverEndPoint.Address.AddressFamily, SocketType.Stream, ProtocolType.Tcp);
  } catch (System.Net.Sockets.SocketException e) {
    throw new ApplicationException("Could not create socket, check to make sure not duplicating port", e);
  }
  try {
    // Bind and listen before accepting (both calls are implied by the error message below;
    // _backlog is an assumed int field holding the pending-connection queue length)
    _serverSocket.Bind(serverEndPoint);
    _serverSocket.Listen(_backlog);
  } catch (Exception e) {
    throw new ApplicationException("Error occurred while binding socket, check inner exception", e);
  }
  try {
    //warning, only call this once, this is a bug in .net 2.0 that breaks if
    // you're running multiple asynch accepts, this bug may be fixed, but
    // it was a major pain in the ass previously, so make sure there is only one
    //BeginAccept running
    _serverSocket.BeginAccept(new AsyncCallback(acceptCallback), _serverSocket);
  } catch (Exception e) {
    throw new ApplicationException("Error occurred starting listeners, check inner exception", e);
  }
  return true;
}

I'd just like to note that the exception handling code looks bad. The reason is that I originally had exception suppression code in there, so that any exceptions would be suppressed and the method would return false if a config option was set, but I removed it for brevity's sake.

The _serverSocket.BeginAccept(new AsyncCallback(acceptCallback), _serverSocket) call above essentially sets our server socket to call the acceptCallback method whenever a user connects. This method runs on the .NET thread pool, which automatically handles creating additional worker threads if you have many blocking operations. This should optimally handle any load on the server.

private void acceptCallback(IAsyncResult result) {
  xConnection conn = new xConnection();
  try {
    //Finish accepting the connection
    System.Net.Sockets.Socket s = (System.Net.Sockets.Socket)result.AsyncState;
    conn.socket = s.EndAccept(result);
    conn.buffer = new byte[_bufferSize];
    lock (_sockets) {
      _sockets.Add(conn);
    }
    //Queue receiving of data from the connection
    conn.socket.BeginReceive(conn.buffer, 0, conn.buffer.Length, SocketFlags.None, new AsyncCallback(ReceiveCallback), conn);
    //Queue the accept of the next incoming connection
    _serverSocket.BeginAccept(new AsyncCallback(acceptCallback), _serverSocket);
  } catch (SocketException e) {
    if (conn.socket != null) {
      conn.socket.Close();
      lock (_sockets) {
        _sockets.Remove(conn);
      }
    }
    //Queue the next accept, think this should be here, stop attacks based on killing the waiting listeners
    _serverSocket.BeginAccept(new AsyncCallback(acceptCallback), _serverSocket);
  } catch (Exception e) {
    if (conn.socket != null) {
      conn.socket.Close();
      lock (_sockets) {
        _sockets.Remove(conn);
      }
    }
    //Queue the next accept, think this should be here, stop attacks based on killing the waiting listeners
    _serverSocket.BeginAccept(new AsyncCallback(acceptCallback), _serverSocket);
  }
}

The above code essentially just finishes accepting the connection that comes in, queues BeginReceive, whose callback will run when the client sends data, and then queues the next BeginAccept, whose callback (acceptCallback) will accept the next client connection that comes in.

The BeginReceive method call is what tells the socket what to do when it receives data from the client. For BeginReceive, you need to give it a byte array, which is where it will copy the data when the client sends data. The ReceiveCallback method will get called, which is how we handle receiving data.

private void ReceiveCallback(IAsyncResult result) {
  //get our connection from the callback
  xConnection conn = (xConnection)result.AsyncState;
  //catch any errors, we'd better not have any
  try {
    //Grab our buffer and count the number of bytes received
    int bytesRead = conn.socket.EndReceive(result);
    //make sure we've read something, if we haven't it supposedly means that the client disconnected
    if (bytesRead > 0) {
      //put whatever you want to do when you receive data here

      //Queue the next receive
      conn.socket.BeginReceive(conn.buffer, 0, conn.buffer.Length, SocketFlags.None, new AsyncCallback(ReceiveCallback), conn);
    } else {
      //Callback run but no data, close the connection
      //supposedly means a disconnect
      //and we still have to close the socket, even though we throw the event later
      conn.socket.Close();
      lock (_sockets) {
        _sockets.Remove(conn);
      }
    }
  } catch (SocketException e) {
    //Something went terribly wrong
    //which shouldn't have happened
    if (conn.socket != null) {
      conn.socket.Close();
      lock (_sockets) {
        _sockets.Remove(conn);
      }
    }
  }
}

EDIT: In this pattern I forgot to mention that in this area of code:

//put whatever you want to do when you receive data here

//Queue the next receive
conn.socket.BeginReceive(conn.buffer, 0, conn.buffer.Length, SocketFlags.None, new AsyncCallback(ReceiveCallback), conn);

What I would generally do in the "whatever you want" code is reassemble packets into messages, and then queue them as jobs on the thread pool. This way the BeginReceive of the next block from the client isn't delayed while whatever message processing code is running.

The receive callback finishes reading from the data socket by calling EndReceive. By then the buffer provided in the BeginReceive call has been filled. Once you do whatever you want where I left the comment, we call the next BeginReceive, which will run the callback again if the client sends any more data. Now here's the really tricky part: when the client sends data, your receive callback might only be called with part of the message. Reassembly can become very, very complicated. I used my own method and created a sort of proprietary protocol to do this. I left it out, but if you request it, I can add it in. This handler was actually the most complicated piece of code I had ever written.
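To illustrate the reassembly problem, here is a minimal sketch of one common approach, length-prefixed framing. It is not the proprietary protocol mentioned above (which was left out); the class name, the 4-byte big-endian prefix, and the 1 MB size cap are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical framing scheme: each message is a 4-byte big-endian length
// followed by that many payload bytes. Keep one framer per connection.
public sealed class LengthPrefixedFramer
{
    private const int MaxMessageSize = 1024 * 1024; // arbitrary safety cap
    private readonly List<byte> _pending = new List<byte>();

    // Feed the bytes delivered by one EndReceive call. Returns every message
    // that is now complete; leftover bytes wait for the next call.
    public List<byte[]> Append(byte[] buffer, int count)
    {
        for (int i = 0; i < count; i++) _pending.Add(buffer[i]);

        var messages = new List<byte[]>();
        while (_pending.Count >= 4)
        {
            int length = (_pending[0] << 24) | (_pending[1] << 16)
                       | (_pending[2] << 8) | _pending[3];
            if (length < 0 || length > MaxMessageSize)
                throw new InvalidOperationException("Corrupt or oversized length prefix");
            if (_pending.Count - 4 < length) break; // body has not fully arrived yet

            messages.Add(_pending.GetRange(4, length).ToArray());
            _pending.RemoveRange(0, 4 + length);
        }
        return messages;
    }
}
```

In the receive callback this would be called as something like framer.Append(conn.buffer, bytesRead) before queuing the next BeginReceive, with each returned message handed to the thread pool. A List<byte> is simple but copies a lot; a real server would likely use a ring buffer or similar.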

public bool Send(byte[] message, xConnection conn) {
  if (conn != null && conn.socket.Connected) {
    lock (conn.socket) {
      //we use a blocking mode send, no async on the outgoing
      //since this is primarily a multithreaded application, shouldn't cause problems to send in blocking mode
      conn.socket.Send(message, message.Length, SocketFlags.None);
    }
  } else {
    return false;
  }
  return true;
}
The above send method actually uses a synchronous Send call, for me that was fine due to the message sizes and the multithreaded nature of my application. If you want to send to every client, you simply need to loop through the _sockets List.
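A sketch of that broadcast loop, assuming a plain list of Socket objects instead of the xConnection wrapper so the example stays self-contained; the Broadcaster class and its method names are made up. It copies the list under the lock and sends outside it, so a slow send does not block the accept and receive callbacks that also lock the list.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Sockets;

// Hypothetical helper that sends one payload to every connected client.
public sealed class Broadcaster
{
    private readonly List<Socket> _sockets = new List<Socket>();

    public void Add(Socket s)
    {
        lock (_sockets) _sockets.Add(s);
    }

    // Returns how many sends succeeded; clients whose send fails are dropped.
    public int SendToAll(byte[] message)
    {
        Socket[] snapshot;
        lock (_sockets) snapshot = _sockets.ToArray(); // do not hold the list lock while sending

        int sent = 0;
        foreach (Socket s in snapshot)
        {
            try
            {
                // Blocking send, as in the Send method above; the per-socket
                // lock keeps concurrent senders from interleaving bytes.
                lock (s) s.Send(message, message.Length, SocketFlags.None);
                sent++;
            }
            catch (SocketException) { Drop(s); }
            catch (ObjectDisposedException) { Drop(s); }
        }
        return sent;
    }

    private void Drop(Socket s)
    {
        lock (_sockets) _sockets.Remove(s);
        s.Close();
    }
}
```

Because the sends are blocking, one stalled client delays everyone after it in the loop; setting Socket.SendTimeout on each connection is one way to bound that delay.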

The xConnection class you see referenced above is basically a simple wrapper for a socket to include the byte buffer, and in my implementation some extras.

public class xConnection : xBase {
  public byte[] buffer;
  public System.Net.Sockets.Socket socket;
}

Also for reference here are the usings I include since I always get annoyed when they aren't included.

using System.Net.Sockets;

I hope that's helpful. It may not be the cleanest code, but it works. There are also some nuances to the code which you should be wary about changing. For one, only have a single BeginAccept outstanding at any one time. There used to be a very annoying .NET bug around this, which was years ago, so I don't recall the details.

Also, in the ReceiveCallback code, we process anything received from the socket before we queue the next receive. This means that for a single socket, we're only actually ever in ReceiveCallback once at any point in time, and we don't need to use thread synchronization. However, if you reorder this to call the next receive immediately after pulling the data, which might be a little faster, you will need to make sure you properly synchronize the threads.

Also, I hacked out a lot of my code, but left the essence of what's happening in place. This should be a good start for your design. Leave a comment if you have any more questions around this.

Kevin Nisbet
This is a good answer Kevin.. looks like you're on track to get the bounty. :)
Mystere Man
I don't know why this is the highest voted answer. Begin* End* is not the fastest way of doing networking in C#, nor the most highly scalable. It IS faster than synchronous, but there are a lot of operations that go on under the hood in Windows that really slow down this network path.
Keep in mind what esac wrote in the previous comment. The begin-end pattern will probably work for you up to a point, heck my code is currently using begin-end, but there are improvements to its limitations in .net 3.5. I don't care about the bounty but would recommend you do read the link in my answer even if you implement this approach. "Socket Performance Enhancements in Version 3.5"
I agree with jvanderh. Obviously I'd love the bounty, but even if I am not chosen, do not think that Begin*/End* is the best solution for critical applications. There are a lot of complications in the Begin/End pattern.
I think the issues identified are more performance issues than scalability ones. They seem to be targeted more at servers that move a lot of data, and in particular answer a lot of connection requests simultaneously. That's not really an issue here, as the amount of data I'm sending is relatively small. I just have a lot of constantly connected clients. But I'll look into this method.
Mystere Man
I just wanted to throw in here, since I may not have been clear enough, that this is .NET 2.0-era code, where I believe this was a very viable pattern. However, esac's answer does look to be somewhat more modern if targeting .NET 3.5. The only nitpick I have is the throwing of events :) but that can easily be changed. Also, I did throughput testing with this code and on a dual-core Opteron 2GHz was able to max out 100Mbps Ethernet, and that was with an encryption layer added on top of this code.
Kevin Nisbet
@Mystere Man, the issues are not only to do with performance. My apps move a lot of small (avg 5K) transactions and Begin/End has been working fine for most customers, but those that start going above 300 concurrent connections are the ones having problems. So for me the issue is more scalability. Sure, I might have other bugs in my code, but we have not been able to find them. That's why I'm currently rewriting with the new approach, but I don't yet have solid data to back Microsoft's claims of improvements, though it looks promising.

I would recommend reading these books on ACE

to get ideas about patterns allowing you to create an efficient server.

Although ACE is implemented in C++ the books cover a lot of useful patterns that can be used in any programming language.

+7  A: 

I am wondering about one thing:

I definitely do not want to start a thread for each connection.

Why is that? Windows could handle hundreds of threads in an application since at least Windows 2000. I've done it, it's really easy to work with if the threads don't need to be synchronized. Especially given that you're doing a lot of I/O (so you're not CPU-bound, and a lot of threads would be blocked on either disk or network communication), I don't understand this restriction.

Have you tested the multi-threaded way and found it lacking in something? Do you intend to also have a database connection for each thread? (That would kill the database server, so it's a bad idea, but it's easily solved with a 3-tier design.) Are you worried that you'll have thousands of clients instead of hundreds, and then you'll really have problems? (Though I'd try a thousand threads or even ten thousand if I had 32+ GB of RAM - again, given that you're not CPU bound, thread switch time should be absolutely irrelevant.)

Here is the code - to see how this looks running, go to and click on the picture.

Server class:

using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

  public class Server {
    private static readonly TcpListener listener = new TcpListener(IPAddress.Any, 9999);

    public Server() {
      // the listener must be started before AcceptTcpClient can be called
      listener.Start();

      while (true) {
        Console.WriteLine("Waiting for connection...");

        var client = listener.AcceptTcpClient();

        // each connection has its own thread
        new Thread(ServeData).Start(client);
      }
    }

    private static void ServeData(object clientSocket) {
      Console.WriteLine("Started thread " + Thread.CurrentThread.ManagedThreadId);

      var rnd = new Random();
      try {
        var client = (TcpClient) clientSocket;
        var stream = client.GetStream();
        while (true) {
          if (rnd.NextDouble() < 0.1) {
            var msg = Encoding.ASCII.GetBytes("Status update from thread " + Thread.CurrentThread.ManagedThreadId);
            stream.Write(msg, 0, msg.Length);

            Console.WriteLine("Status update from thread " + Thread.CurrentThread.ManagedThreadId);
          }

          // wait until the next update - I made the wait time so small 'cause I was bored :)
          Thread.Sleep(new TimeSpan(0, 0, rnd.Next(1, 5)));
        }
      } catch (IOException e) {
        // NetworkStream.Write reports socket failures (e.g. a disconnected client) as IOException
        Console.WriteLine("I/O exception in thread {0}: {1}", Thread.CurrentThread.ManagedThreadId, e);
      } catch (SocketException e) {
        Console.WriteLine("Socket exception in thread {0}: {1}", Thread.CurrentThread.ManagedThreadId, e);
      }
    }
  }

Server main program:

namespace ManyThreadsServer {
  internal class Program {
    private static void Main(string[] args) {
      new Server();
    }
  }
}

Client class:

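using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

  public class Client {
    public Client() {
      var client = new TcpClient();
      client.Connect(IPAddress.Loopback, 9999);
      var msg = new byte[1024];
      var stream = client.GetStream();
      try {
        int i;
        // Read returns 0 once the server closes the connection, which ends the loop
        while ((i = stream.Read(msg, 0, msg.Length)) != 0) {
          var data = Encoding.ASCII.GetString(msg, 0, i);
          Console.WriteLine("Received: {0}", data);
        }
      } catch (IOException e) {
        // NetworkStream.Read reports socket failures as IOException
        Console.WriteLine("I/O exception in thread {0}: {1}", Thread.CurrentThread.ManagedThreadId, e);
      } catch (SocketException e) {
        Console.WriteLine("Socket exception in thread {0}: {1}", Thread.CurrentThread.ManagedThreadId, e);
      }
    }
  }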

Client main program:

using System;
using System.Threading;

namespace ManyThreadsClient {
  internal class Program {
    private static void Main(string[] args) {
      // first argument is the number of threads
      for (var i = 0; i < Int32.Parse(args[0]); i++) {
        new Thread(RunClient).Start();
      }
    }

    private static void RunClient() {
      new Client();
    }
  }
}
Marcel Popescu
Windows can handle lots of threads, but .NET isn't really designed to handle them. Each .NET appdomain has a thread pool, and you don't want to exhaust that thread pool. I'm not sure if you start a Thread manually if it comes from the threadpool or not though. Still, hundreds of threads doing nothing for most of the time is a huge resource waste.
Mystere Man
I believe you have an incorrect view of threads. Threads only come from the thread pool if you actually want that - regular Threads do not. Hundreds of threads doing nothing waste exactly nothing :) (Well, a bit of memory, but memory is so cheap it's not really an issue anymore.) I am going to write a couple of sample apps for this, and I will post a URL to them once I'm done. In the meantime, I recommend that you go over what I wrote above again and try to answer my questions.
Marcel Popescu
While I agree with Marcel's comment about the view of threads, in that created threads don't come from the thread pool, the rest of the statement is not correct. Memory is not about how much is installed in a machine: all applications on Windows run in a virtual address space, and on a 32-bit system that gives you 2GB of data for your app (it doesn't matter how much RAM is installed on the box). The threads still must be managed by the runtime. Doing async IO doesn't use a thread to wait (it uses IOCP, which allows overlapped IO), so it is a better solution and will scale MUCH better.
Brian ONeil
Each thread takes more than "a bit" of memory. Each requires a stack, its own thread local storage, plus overhead. I'm not sure what that value is, but it's not insignificant. Yes, this app could expand to thousands of clients. Yes, I need to access the database in each thread, which means I'd have to have a thread for shared database access and then deal with locking and such. In reality though, I'm sending the same data to all clients, so there is very little reason for each client to have its own thread.
Mystere Man
I agree with Brian - and others - that there are other solutions that scale much better; however, they were not in the initial request, which was a couple of hundred threads. The solution I proposed solves *that* problem :) As for memory - my view of a server is at least quad-core, with at least 32GB of RAM - as I mentioned before. 2GB of RAM was a server a few years ago - it's not a server now. Anyway, if you found a solution you can implement more easily than this, please let us know :) (The above took me about an hour.)
Marcel Popescu
When running lots of threads it's not memory that is the problem but CPU. The context switch between threads is a relatively expensive operation and the more active threads you have the more context switches that are going to occur. A few years ago I ran a test on my PC with a C# console app and with approx. 500 threads my CPU was 100%, the threads were not doing anything significant. For network comms it is better to keep the number of threads down.
I am amazed and appalled by the amount of superstition in our profession. I was trying to solve a specific problem, where one of the important points was that the process was I/O bound, not CPU bound. If the threads are doing network communication, like in this case, thread switches are absolutely insignificant in terms of CPU utilization. They will spend most of their time blocked on I/O. You have the code here - run it and see for yourself.
Marcel Popescu
+25  A: 

There are many ways of doing network operations in C#. All of them use different mechanisms under the hood, and thus suffer major performance issues under high concurrency. The Begin* operations are among those that many people mistake for being the fastest way of doing networking.

To solve these issues, Microsoft introduced the *Async set of methods. From MSDN:

The SocketAsyncEventArgs class is part of a set of enhancements to the System.Net.Sockets.Socket class that provide an alternative asynchronous pattern that can be used by specialized high-performance socket applications. This class was specifically designed for network server applications that require high performance. An application can use the enhanced asynchronous pattern exclusively or only in targeted hot areas (for example, when receiving large amounts of data).

The main feature of these enhancements is the avoidance of the repeated allocation and synchronization of objects during high-volume asynchronous socket I/O. The Begin/End design pattern currently implemented by the System.Net.Sockets.Socket class requires a System.IAsyncResult object be allocated for each asynchronous socket operation.

Under the covers, the *Async API uses IO completion ports which is the fastest way of performing networking operations, see

And just to help you out, I am including the source code for a telnet server I wrote using the *Async API. I am only including the relevant portions. Also note that instead of processing the data inline, I opt to push it onto a lock-free (wait-free) queue that is processed on a separate thread. I am not including the corresponding Pool class, which is just a simple pool that will create a new object if it is empty, or the Buffer class, which is just a self-expanding buffer that is not really needed unless you are receiving an indeterminate amount of data. If you would like any more information, feel free to send me a PM.
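Since the Pool class is not shown, here is a hypothetical stand-in that matches the description above (hand out a pooled object, or create a new one when the pool is empty). Only the Pop method and the parameterless constructor appear in the answer's code; the Push method, the new() constraint, and the locking are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the omitted Pool class.
public sealed class Pool<T> where T : new()
{
    private readonly Stack<T> _items = new Stack<T>();
    private readonly object _sync = new object();

    // Hands out a pooled object, or creates a new one if the pool is empty.
    public T Pop()
    {
        lock (_sync)
        {
            if (_items.Count > 0) return _items.Pop();
        }
        return new T();
    }

    // Returns an object to the pool for later reuse (assumed; not shown in the answer).
    public void Push(T item)
    {
        if (item == null) throw new ArgumentNullException("item");
        lock (_sync) { _items.Push(item); }
    }
}
```

With SocketAsyncEventArgs as T, the point of pooling is to avoid allocating a new event-args object for every accept, receive, or send.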

// Statements marked [reconstructed] are missing from the listing as posted and are
// inferred from the surrounding comments; treat them as assumptions.
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

public class Telnet
{
    private readonly Pool<SocketAsyncEventArgs> m_EventArgsPool;
    private Socket m_ListenSocket;

    /// <summary>
    /// This event fires when a connection has been established.
    /// </summary>
    public event EventHandler<SocketAsyncEventArgs> Connected;

    /// <summary>
    /// This event fires when a connection has been shutdown.
    /// </summary>
    public event EventHandler<SocketAsyncEventArgs> Disconnected;

    /// <summary>
    /// This event fires when data is received on the socket.
    /// </summary>
    public event EventHandler<SocketAsyncEventArgs> DataReceived;

    /// <summary>
    /// This event fires when data is finished sending on the socket.
    /// </summary>
    public event EventHandler<SocketAsyncEventArgs> DataSent;

    /// <summary>
    /// This event fires when a line has been received.
    /// </summary>
    public event EventHandler<LineReceivedEventArgs> LineReceived;

    /// <summary>
    /// Specifies the port to listen on.
    /// </summary>
    public int ListenPort { get; set; }

    /// <summary>
    /// Constructor for Telnet class.
    /// </summary>
    public Telnet()
    {
        m_EventArgsPool = new Pool<SocketAsyncEventArgs>();
        ListenPort = 23;
    }

    /// <summary>
    /// Starts the telnet server listening and accepting data.
    /// </summary>
    public void Start()
    {
        IPEndPoint endpoint = new IPEndPoint(0, ListenPort);
        m_ListenSocket = new Socket(endpoint.AddressFamily, SocketType.Stream, ProtocolType.Tcp);

        m_ListenSocket.Bind(endpoint);   // [reconstructed]
        m_ListenSocket.Listen(100);      // [reconstructed] the backlog value is a guess

        // Post Accept
        StartAccept(null);               // [reconstructed]
    }

    /// <summary>
    /// Not Yet Implemented. Should shutdown all connections gracefully.
    /// </summary>
    public void Stop()
    {
        //throw (new NotImplementedException());
    }

    // ACCEPT

    /// <summary>
    /// Posts a request for Accepting a connection. If it is being called from the completion of
    /// an AcceptAsync call, then the AcceptSocket is cleared since it will create a new one for
    /// the new user.
    /// </summary>
    /// <param name="e">null if posted from startup, otherwise a <b>SocketAsyncEventArgs</b> for reuse.</param>
    private void StartAccept(SocketAsyncEventArgs e)
    {
        if (e == null)
        {
            e = m_EventArgsPool.Pop();
            e.Completed += Accept_Completed;
        }
        else
        {
            e.AcceptSocket = null;
        }

        if (m_ListenSocket.AcceptAsync(e) == false)
        {
            Accept_Completed(this, e);
        }
    }

    /// <summary>
    /// Completion callback routine for the AcceptAsync post. This will verify that the Accept occurred
    /// and then setup a Receive chain to begin receiving data.
    /// </summary>
    /// <param name="sender">object which posted the AcceptAsync</param>
    /// <param name="e">Information about the Accept call.</param>
    private void Accept_Completed(object sender, SocketAsyncEventArgs e)
    {
        // Socket Options
        e.AcceptSocket.NoDelay = true;

        // Create and setup a new connection object for this user
        Connection connection = new Connection(this, e.AcceptSocket);

        // Tell the client that we will be echoing data sent
        DisableEcho(connection);         // [reconstructed]

        // Post the first receive
        SocketAsyncEventArgs args = m_EventArgsPool.Pop();
        args.UserToken = connection;

        // Connect Event
        if (Connected != null)
        {
            Connected(this, args);
        }

        args.Completed += Receive_Completed;
        PostReceive(args);               // [reconstructed]

        // Post another accept
        StartAccept(e);                  // [reconstructed]
    }

    // RECEIVE

    /// <summary>
    /// Post an asynchronous receive on the socket.
    /// </summary>
    /// <param name="e">Used to store information about the Receive call.</param>
    private void PostReceive(SocketAsyncEventArgs e)
    {
        Connection connection = e.UserToken as Connection;

        if (connection != null)
        {
            e.SetBuffer(connection.ReceiveBuffer.DataBuffer, connection.ReceiveBuffer.Count, connection.ReceiveBuffer.Remaining);

            if (connection.Socket.ReceiveAsync(e) == false)
            {
                Receive_Completed(this, e);
            }
        }
    }

    /// <summary>
    /// Receive completion callback. Should verify the connection, and then notify any event listeners
    /// that data has been received. For now it is always expected that the data will be handled by the
    /// listeners and thus the buffer is cleared after every call.
    /// </summary>
    /// <param name="sender">object which posted the ReceiveAsync</param>
    /// <param name="e">Information about the Receive call.</param>
    private void Receive_Completed(object sender, SocketAsyncEventArgs e)
    {
        Connection connection = e.UserToken as Connection;

        if (e.BytesTransferred == 0 || e.SocketError != SocketError.Success || connection == null)
        {
            Disconnect(e);               // [reconstructed]
            return;
        }

        // [reconstructed] The rest of this method is missing from the listing. Presumably the
        // receive buffer's count is advanced by e.BytesTransferred here (the Buffer class is not
        // shown, so that call is omitted), then the handlers below run and the next receive is posted.
        OnDataReceived(e);
        HandleCommand(e);
        Echo(e);
        OnLineReceived(connection);
        PostReceive(e);
    }

    /// <summary>
    /// Handles Event of Data being Received.
    /// </summary>
    /// <param name="e">Information about the received data.</param>
    protected void OnDataReceived(SocketAsyncEventArgs e)
    {
        if (DataReceived != null)
        {
            DataReceived(this, e);
        }
    }

    /// <summary>
    /// Handles Event of a Line being Received.
    /// </summary>
    /// <param name="connection">User connection.</param>
    protected void OnLineReceived(Connection connection)
    {
        if (LineReceived != null)
        {
            int index = 0;
            int start = 0;

            while ((index = connection.ReceiveBuffer.IndexOf('\n', index)) != -1)
            {
                string s = connection.ReceiveBuffer.GetString(start, index - start - 1);
                s = s.Backspace();

                LineReceivedEventArgs args = new LineReceivedEventArgs(connection, s);
                Delegate[] delegates = LineReceived.GetInvocationList();

                // Offer the line to each subscriber in turn until one of them handles it.
                foreach (Delegate d in delegates)
                {
                    d.DynamicInvoke(new object[] { this, args });

                    if (args.Handled == true)
                    {
                        break;
                    }
                }

                if (args.Handled == false)
                {
                    // [reconstructed] The body is missing from the listing; the unhandled line
                    // was presumably queued on the connection for later processing.
                }

                start = index;
                index++;                 // [reconstructed] without this the loop never advances
            }

            if (start > 0)
            {
                connection.ReceiveBuffer.Reset(0, start + 1);
            }
        }
    }

    // SEND

    /// <summary>
    /// Overloaded. Sends a string over the telnet socket.
    /// </summary>
    /// <param name="connection">Connection to send data on.</param>
    /// <param name="s">Data to send.</param>
    /// <returns>true if the data was sent successfully.</returns>
    public bool Send(Connection connection, string s)
    {
        if (String.IsNullOrEmpty(s) == false)
        {
            return Send(connection, Encoding.Default.GetBytes(s));
        }

        return false;
    }

    /// <summary>
    /// Overloaded. Sends an array of data to the client.
    /// </summary>
    /// <param name="connection">Connection to send data on.</param>
    /// <param name="data">Data to send.</param>
    /// <returns>true if the data was sent successfully.</returns>
    public bool Send(Connection connection, byte[] data)
    {
        return Send(connection, data, 0, data.Length);
    }

    /// <summary>
    /// Overloaded. Sends a single character to the client.
    /// </summary>
    public bool Send(Connection connection, char c)
    {
        return Send(connection, new byte[] { (byte)c }, 0, 1);
    }

    /// <summary>
    /// Sends an array of data to the client.
    /// </summary>
    /// <param name="connection">Connection to send data on.</param>
    /// <param name="data">Data to send.</param>
    /// <param name="offset">Starting offset of data in the buffer.</param>
    /// <param name="length">Amount of data in bytes to send.</param>
    /// <returns>true if the send was posted successfully.</returns>
    public bool Send(Connection connection, byte[] data, int offset, int length)
    {
        bool status = true;

        if (connection.Socket == null || connection.Socket.Connected == false)
        {
            return false;
        }

        SocketAsyncEventArgs args = m_EventArgsPool.Pop();
        args.UserToken = connection;
        args.Completed += Send_Completed;
        args.SetBuffer(data, offset, length);

        try
        {
            if (connection.Socket.SendAsync(args) == false)
            {
                Send_Completed(this, args);
            }
        }
        catch (ObjectDisposedException)
        {
            // return the SocketAsyncEventArgs back to the pool and return as the
            // socket has been shutdown and disposed of
            args.Completed -= Send_Completed;
            m_EventArgsPool.Push(args);  // [reconstructed] Push is assumed to be the pool's counterpart to Pop
            status = false;
        }

        return status;
    }

    /// <summary>
    /// Sends a command telling the client that the server WILL echo data.
    /// </summary>
    /// <param name="connection">Connection to disable echo on.</param>
    public void DisableEcho(Connection connection)
    {
        // IAC WILL ECHO
        byte[] b = new byte[] { 255, 251, 1 };
        Send(connection, b);
    }

    /// <summary>
    /// Completion callback for SendAsync.
    /// </summary>
    /// <param name="sender">object which initiated the SendAsync</param>
    /// <param name="e">Information about the SendAsync call.</param>
    private void Send_Completed(object sender, SocketAsyncEventArgs e)
    {
        e.Completed -= Send_Completed;
        m_EventArgsPool.Push(e);         // [reconstructed]
    }

    /// <summary>
    /// Handles a Telnet command.
    /// </summary>
    /// <param name="e">Information about the data received.</param>
    private void HandleCommand(SocketAsyncEventArgs e)
    {
        Connection c = e.UserToken as Connection;

        if (c == null || e.BytesTransferred < 3)
        {
            return;
        }

        // Telnet negotiation commands are three bytes long: IAC, command, option.
        for (int i = 0; i < e.BytesTransferred; i += 3)
        {
            if (e.BytesTransferred - i < 3)
            {
                break;
            }

            if (e.Buffer[i] == (int)TelnetCommand.IAC)
            {
                TelnetCommand command = (TelnetCommand)e.Buffer[i + 1];
                TelnetOption option = (TelnetOption)e.Buffer[i + 2];

                switch (command)
                {
                    case TelnetCommand.DO:
                        if (option == TelnetOption.Echo)
                        {
                            // ECHO
                        }
                        break;
                    case TelnetCommand.WILL:
                        if (option == TelnetOption.Echo)
                        {
                            // ECHO
                        }
                        break;
                }

                c.ReceiveBuffer.Remove(i, 3);
            }
        }
    }

    /// <summary>
    /// Echoes data back to the client.
    /// </summary>
    /// <param name="e">Information about the received data to be echoed.</param>
    private void Echo(SocketAsyncEventArgs e)
    {
        Connection connection = e.UserToken as Connection;

        if (connection == null)
        {
            return;
        }

        // backspacing would cause the cursor to proceed beyond the beginning of the input line
        // so prevent this
        string bs = connection.ReceiveBuffer.ToString();

        if (bs.CountAfterBackspace() < 0)
        {
            return;
        }

        // find the starting offset (first non-backspace character)
        int i = 0;

        for (i = 0; i < connection.ReceiveBuffer.Count; i++)
        {
            if (connection.ReceiveBuffer[i] != '\b')
            {
                break;
            }
        }

        string s = Encoding.Default.GetString(e.Buffer, Math.Max(e.Offset, i), e.BytesTransferred);

        // mask everything except line endings and backspaces (e.g. while a password is typed)
        if (connection.Secure)
        {
            s = s.ReplaceNot("\r\n\b".ToCharArray(), '*');
        }

        s = s.Replace("\b", "\b \b");

        Send(connection, s);
    }

    /// <summary>
    /// Disconnects a socket.
    /// </summary>
    /// <remarks>
    /// It is expected that this disconnect is always posted by a failed receive call. Calling the public
    /// version of this method will cause the next posted receive to fail and this will cleanup properly.
    /// It is not advised to call this method directly.
    /// </remarks>
    /// <param name="e">Information about the socket to be disconnected.</param>
    private void Disconnect(SocketAsyncEventArgs e)
    {
        Connection connection = e.UserToken as Connection;

        if (connection == null)
        {
            throw (new ArgumentNullException("e.UserToken"));
        }

        // [reconstructed] The statements that shut down and close the socket are missing
        // from the listing; the following is the usual sequence.
        try
        {
            connection.Socket.Shutdown(SocketShutdown.Both);
        }
        catch (SocketException)
        {
            // the peer has already gone away
        }

        connection.Socket.Close();

        if (Disconnected != null)
        {
            Disconnected(this, e);
        }

        e.Completed -= Receive_Completed;
        m_EventArgsPool.Push(e);         // [reconstructed]
    }

    /// <summary>
    /// Marks a specific connection for graceful shutdown. The next receive or send to be posted
    /// will fail and close the connection.
    /// </summary>
    /// <param name="connection">Connection to shut down.</param>
    public void Disconnect(Connection connection)
    {
        try
        {
            connection.Socket.Shutdown(SocketShutdown.Both);   // [reconstructed]
        }
        catch (Exception)
        {
            // the socket is already closed or disposed; nothing left to do
        }
    }

    /// <summary>
    /// Telnet command codes.
    /// </summary>
    internal enum TelnetCommand
    {
        SE = 240,
        NOP = 241,
        DM = 242,
        BRK = 243,
        IP = 244,
        AO = 245,
        AYT = 246,
        EC = 247,
        EL = 248,
        GA = 249,
        SB = 250,
        WILL = 251,
        WONT = 252,
        DO = 253,
        DONT = 254,
        IAC = 255
    }

    /// <summary>
    /// Telnet command options.
    /// </summary>
    internal enum TelnetOption
    {
        Echo = 1,
        SuppressGoAhead = 3,
        Status = 5,
        TimingMark = 6,
        TerminalType = 24,
        WindowSize = 31,
        TerminalSpeed = 32,
        RemoteFlowControl = 33,
        LineMode = 34,
        EnvironmentVariables = 36
    }
}
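The listing above depends on a Pool class that the answer describes only as "a simple pool which will create a new object if it is empty". A minimal sketch of what such a class might look like follows. Only Pop appears in the original code; the Push and Count members and the use of ConcurrentBag are assumptions made for illustration, not the author's implementation.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for the Pool class that the answer does not include.
public sealed class Pool<T> where T : new()
{
    // ConcurrentBag is thread-safe, so completion callbacks running on
    // different threads can return objects without extra locking.
    private readonly ConcurrentBag<T> m_Items = new ConcurrentBag<T>();

    // Hands out a pooled instance, or creates a new one when the pool is empty.
    public T Pop()
    {
        T item;
        return m_Items.TryTake(out item) ? item : new T();
    }

    // Returns an instance to the pool so that a later Pop can reuse it.
    public void Push(T item)
    {
        if (item == null)
        {
            throw new ArgumentNullException("item");
        }

        m_Items.Add(item);
    }

    // Number of idle instances currently held by the pool.
    public int Count
    {
        get { return m_Items.Count; }
    }
}
```

With SocketAsyncEventArgs as the type argument, callers should remove their Completed handler before pushing an instance back, as the Send_Completed callback in the listing does, so that a reused object does not fire stale handlers.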
great complete answer here!
This is pretty straightforward, and a simple example. Thanks. I'm going to have to evaluate the pros and cons of each method.
Mystere Man
I haven't had a chance to test it, but I'm getting the vague feeling of a race condition in here for some reason. First, if you get lots of messages, I don't know that the events will be processed in order (this may not be important for the user's app, but it should be noted), or I could be wrong and the events will process in order. Second, I may have missed it, but isn't there a risk of the buffer being overwritten or cleared while DataReceived is still running if it takes a long time? If these possibly unjustified concerns are addressed, I think this is a very good modern solution.
Kevin Nisbet
In my case, for my telnet server, 100%, YES they are in order. The key is setting the proper callback method before calling AcceptAsync, ReceiveAsync, etc. In my case I do the SendAsync on a separate thread, so if this is modified to do an Accept/Send/Receive/Send/Receive/Disconnect pattern, then it will need to be modified.
Point #2 is also something you will need to take into consideration. I am storing my 'Connection' object in the SocketAsyncEventArgs context. What this means is that I only have one receive buffer per connection. I am not posting another receive with this SocketAsyncEventArgs until DataReceived is complete, so no further data can be read on this until it is complete. I DO ADVISE that no long operations be done on this data. I actually move the whole buffer of all data received onto a lockfree queue and then process it on a separate thread. This ensures low latency on the network portion.
On a side note, I wrote unit tests and load tests for this code, and as I increased the user load from 1 user to 250 users (on a single dual core system, 4GB of RAM), the response time for 100 bytes (1 packet) and 10000 bytes (3 packets) stayed the same throughout the whole user load curve.
Yeah, if you're using a lock-free queue attached to your Received event, I agree this code should perform very well, and it looks good. Since this is a telnet test, you may not have noticed (since it's a text protocol), but have you ever used this model for binary messages? I found the biggest challenge in developing my code was reassembly, since the callback will be called with only pieces of the messages sent by the client.
Kevin Nisbet
I've used it for telnet and HTTP, but yes, HTTP is still a text-based protocol. Typically you have some kind of indication as to the size of the data to be received. If not, you can still concatenate the buffers on a separate thread and then parse them as necessary.
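To illustrate the reassembly problem raised in the comments above, here is a minimal sketch of one common approach. It assumes a made-up wire format in which every message is preceded by a 4-byte big-endian length; the FrameAssembler name, the Append method and the size limit are all hypothetical and not part of the telnet server above. Each chunk handed to a receive callback is appended, and only complete messages are returned.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: reassembles length-prefixed messages from arbitrary chunks.
public sealed class FrameAssembler
{
    // Illustrative upper bound so that a corrupt prefix cannot demand unbounded memory.
    private const int MaxMessageLength = 1024 * 1024;

    private readonly List<byte> m_Pending = new List<byte>();

    // Appends one received chunk and returns every message that is now complete.
    public List<byte[]> Append(byte[] data, int offset, int count)
    {
        for (int i = 0; i < count; i++)
        {
            m_Pending.Add(data[offset + i]);
        }

        List<byte[]> messages = new List<byte[]>();

        while (m_Pending.Count >= 4)
        {
            // Decode the 4-byte big-endian length prefix.
            int length = (m_Pending[0] << 24) | (m_Pending[1] << 16)
                       | (m_Pending[2] << 8) | m_Pending[3];

            if (length < 0 || length > MaxMessageLength)
            {
                throw new InvalidOperationException("Invalid length prefix.");
            }

            if (m_Pending.Count - 4 < length)
            {
                break; // the body has not fully arrived yet
            }

            messages.Add(m_Pending.GetRange(4, length).ToArray());
            m_Pending.RemoveRange(0, 4 + length);
        }

        return messages;
    }
}
```

One assembler per connection would be fed with (e.Buffer, e.Offset, e.BytesTransferred) from the receive callback. A List<byte> keeps the sketch short; a production version would more likely use a ring buffer to avoid the per-byte copies.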
+1  A: 

I would use the AcceptAsync/ConnectAsync/ReceiveAsync/SendAsync methods that were added in .NET 3.5. I have done a benchmark and they are approximately 35% faster (response time and bitrate) with 100 users constantly sending and receiving data.

Jonnie Drew
+7  A: 

You already have most of the answer in the code samples above. Asynchronous IO is absolutely the way to go here; it is how Win32 is designed internally to scale. The best possible performance comes from completion ports: bind your sockets to a completion port and have a thread pool waiting on it for completions. The common wisdom is to have 2-4 threads per CPU (core) waiting for completions. I highly recommend going over these three articles by Rick Vicik from the Windows Performance team:

  1. Designing Applications for Performance - Part 1
  2. Designing Applications for Performance - Part 2
  3. Designing Applications for Performance - Part 3

These articles cover mostly the native Windows API, but they are a must-read for anyone trying to get a grasp of scalability and performance. They do have some briefs on the managed side of things too.

The second thing you'll need to do is go over the Improving .NET Application Performance and Scalability book, which is available online. You will find pertinent and valid advice on the use of threads, asynchronous calls and locks in Chapter 5. But the real gems are in Chapter 17, where you'll find such goodies as practical guidance on tuning your thread pool. My apps had some serious problems until I adjusted maxIoThreads/maxWorkerThreads as per the recommendations in this chapter.

You say that you want to do a pure TCP server, so my next point is spurious. However, if you find yourself cornered and end up using the WebRequest class and its derivatives, be warned that there is a dragon guarding that door: the ServicePointManager. This is a configuration class that has one purpose in life: to ruin your performance. Make sure you free your server from the artificially imposed ServicePoint.ConnectionLimit or your application will never scale (I'll let you discover for yourself what the default value is...). You may also want to reconsider the default policy of sending an Expect: 100-continue header in HTTP requests.
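As a rough illustration of the two settings mentioned above, they could be applied once at service start-up along these lines. The OutboundHttpSettings class and the limit passed to it are made up for this sketch; ServicePointManager.DefaultConnectionLimit and ServicePointManager.Expect100Continue are the real static properties. This targets the WebRequest stack of the .NET Framework; on modern .NET, where these members are flagged as obsolete, it should be treated as an assumption that they still have any effect on your HTTP client.

```csharp
using System.Net;

// Hypothetical start-up helper for a service that makes outbound WebRequest calls.
public static class OutboundHttpSettings
{
#pragma warning disable SYSLIB0014 // ServicePointManager is flagged obsolete on newer .NET versions
    public static void Apply(int connectionLimit)
    {
        // Raise the per-endpoint cap on concurrent outbound connections.
        ServicePointManager.DefaultConnectionLimit = connectionLimit;

        // Skip the "Expect: 100-continue" handshake, which costs a round trip per request body.
        ServicePointManager.Expect100Continue = false;
    }
#pragma warning restore SYSLIB0014
}
```

The right limit depends on how many concurrent requests the remote endpoint tolerates, so the value is best read from configuration rather than hard-coded.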

Now, about the core managed socket API: things are fairly easy on the Send side, but significantly more complex on the Receive side. To achieve high throughput and scale you must ensure that the socket is never flow controlled because no buffer is posted for receive. Ideally, for high performance you should post 3-4 buffers ahead and post a new buffer as soon as you get one back (before you process the one you got back), so that the socket always has somewhere to deposit the data coming from the network. You'll see shortly why you probably won't be able to achieve this.

After you're done playing with the BeginRead/BeginWrite API and start the serious work, you'll realize that you need security on your traffic, i.e. NTLM/Kerberos authentication and traffic encryption, or at least protection against traffic tampering. The way to do this is to use the built-in System.Net.Security.NegotiateStream (or SslStream if you need to cross disparate domains). This means that instead of relying on straight socket asynchronous operations you will rely on the AuthenticatedStream asynchronous operations. As soon as you obtain a socket (either from connect on the client or from accept on the server) you create a stream on the socket and submit it for authentication by calling either BeginAuthenticateAsClient or BeginAuthenticateAsServer. After the authentication completes (at least you're safe from the native InitializeSecurityContext/AcceptSecurityContext madness...) you do your authorization by checking the RemoteIdentity property of your authenticated stream and doing whatever ACL verification your product must support.

After that you send messages using BeginWrite and receive them with BeginRead. This is the problem I mentioned earlier: you won't be able to post multiple receive buffers, because the AuthenticatedStream classes don't support it. The BeginRead operation internally manages all the IO until you have received an entire frame; otherwise it could not handle the message authentication (decrypt the frame and validate its signature). In my experience, though, the job done by the AuthenticatedStream classes is fairly good and you shouldn't have any problem with it, i.e. you should be able to saturate a gigabit network with only 4-5% CPU. The AuthenticatedStream classes will also impose on you the protocol-specific frame size limitations (16k for SSL, 12k for Kerberos).

This should get you started on the right track. I'm not going to post code here; there is a perfectly good example on MSDN. I've done many projects like this and I was able to scale to about 1000 connected users without problems. Above that you'll need to modify registry keys to allow the kernel more socket handles. Also make sure you deploy on a server OS, that is W2K3, not XP or Vista (i.e. a client OS); it makes a big difference.

BTW, if you have database operations or file IO on the server, make sure you also use the async flavor for them, or you'll drain the thread pool in no time. For SQL Server connections make sure you add 'Asynchronous Processing=true' to the connection string.

Remus Rusanu
There is some great information here. I wish i could award multiple people the bounty. However, I have upvoted you. Good stuff here, thanks.
Mystere Man
+4  A: 

There used to be a really good discussion of scalable TCP/IP using .NET written by Chris Mullins of Coversant. Unfortunately it appears his blog has disappeared from its prior location, so I will try to piece together his advice from memory (some useful comments of his appear in this thread: C++ vs. C#: Developing a highly scalable IOCP server).

First and foremost, note that both using Begin/End and the Async methods on the Socket class make use of IO Completion Ports (IOCP) to provide scalability. This makes a much bigger difference (when used correctly; see below) to scalability than which of the two methods you actually pick to implement your solution.

Chris Mullins' posts were based on using Begin/End, which is the one I personally have experience with. Note that Chris put together a solution based on this that scaled up to 10,000s of concurrent client connections on a 32-bit machine with 2GB of memory, and well into 100,000s on a 64-bit platform with sufficient memory. From my own experience with this technique (although nowhere near this kind of load) I have no reason to doubt these indicative figures.

IOCP versus thread-per-connection or 'select' primitives

The reason you want to use a mechanism that uses IOCP under the hood is that it uses a very low-level Windows thread pool that does not wake up any threads until there is actual data on the IO channel that you are trying to read from (note that IOCP can be used for file IO as well). The benefit of this is that Windows does not have to switch to a thread only to find that there is no data yet anyway, so this reduces the number of context switches your server will have to make to the bare minimum required.

Context switches are what will definitely kill the 'thread-per-connection' mechanism, although it is a viable solution if you are only dealing with a few dozen connections. That mechanism is, however, by no stretch of the imagination 'scalable'.

Important considerations when using IOCP


First and foremost it is critical to understand that IOCP can easily result in memory issues under .NET if your implementation is too naive. Every IOCP BeginReceive call will result in "pinning" of the buffer you are reading into. For a good explanation of why this is a problem, see: Yun Jin's Weblog: OutOfMemoryException and Pinning.

Luckily this problem can be avoided, but it requires a bit of a trade-off. The suggested solution is to allocate a big byte[] buffer at application start-up (or close to it), of at least 90KB or so (as of .NET 2; the required size may be larger in later versions). The reason to do this is that large memory allocations automatically end up in a non-compacting memory segment (the Large Object Heap) that is effectively automatically pinned. By allocating one large buffer at start-up you make sure that this block of unmovable memory is at a relatively 'low address' where it will not get in the way and cause fragmentation.

You then can use offsets to segment this one big buffer into separate areas for each connection that needs to read some data. This is where a trade-off comes into play; since this buffer needs to be pre-allocated, you will have to decide how much buffer space you need per connection, and what upper limit you want to set on the number of connections you want to scale to (or, you can implement an abstraction that can allocate additional pinned buffers once you need them).

The simplest solution would be to assign every connection a single byte at a unique offset within this buffer. Then you can make a BeginReceive call for a single byte to be read, and perform the rest of the reading as a result of the callback you get.
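The buffer segmentation described above can be sketched as follows. The ReceiveBufferManager class, its Rent/Return members and the sizes in the usage note are hypothetical and chosen for illustration; the only point is that one large array is allocated up front and each connection is handed an ArraySegment<byte> at its own offset.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: one big pre-allocated block carved into fixed-size
// per-connection segments.
public sealed class ReceiveBufferManager
{
    private readonly byte[] m_Block;
    private readonly int m_SegmentSize;
    private readonly Stack<int> m_FreeOffsets = new Stack<int>();
    private readonly object m_Lock = new object();

    public ReceiveBufferManager(int segmentSize, int maxConnections)
    {
        m_SegmentSize = segmentSize;

        // A single large allocation made once at start-up, so that it is placed
        // on the Large Object Heap rather than among short-lived objects.
        m_Block = new byte[checked(segmentSize * maxConnections)];

        // Push the offsets in reverse so that the lowest offset is handed out first.
        for (int i = maxConnections - 1; i >= 0; i--)
        {
            m_FreeOffsets.Push(i * segmentSize);
        }
    }

    // Hands out the segment a connection should pass to its receive calls.
    public ArraySegment<byte> Rent()
    {
        lock (m_Lock)
        {
            if (m_FreeOffsets.Count == 0)
            {
                throw new InvalidOperationException("Connection limit reached.");
            }

            return new ArraySegment<byte>(m_Block, m_FreeOffsets.Pop(), m_SegmentSize);
        }
    }

    // Gives a segment back when its connection closes.
    public void Return(ArraySegment<byte> segment)
    {
        if (!ReferenceEquals(segment.Array, m_Block))
        {
            throw new ArgumentException("Segment does not belong to this manager.", "segment");
        }

        lock (m_Lock)
        {
            m_FreeOffsets.Push(segment.Offset);
        }
    }
}
```

For example, new ReceiveBufferManager(256, 400) allocates a single 102,400-byte block, and a rented segment would be used as socket.BeginReceive(segment.Array, segment.Offset, segment.Count, ...). The fixed upper limit on connections is the trade-off the answer mentions; a manager that allocates additional large blocks on demand would lift it.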


When you get the callback from the Begin call you made, it is very important to realise that the code in the callback will execute on the low-level IOCP thread. It is absolutely essential that you avoid lengthy operations in this callback. Using these threads for complex processing will kill your scalability just as effectively as using 'thread-per-connection'.

The suggested solution is to use the callback only to queue up a work item to process the incoming data, which will be executed on some other thread. Avoid any potentially blocking operations inside the callback so that the IOCP thread can return to its pool as quickly as possible. In .NET 4.0 I'd suggest the easiest solution is to spawn a Task, giving it a reference to the client socket and a copy of the first byte that was already read by the BeginReceive call. This task is then responsible for reading all data from the socket that represents the request you are processing, executing it, and then making a new BeginReceive call to queue the socket for IOCP once more. Pre .NET 4.0, you can use the ThreadPool, or create your own threaded work-queue implementation.
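A minimal sketch of that hand-off, under a few assumptions: the ReceiveLoop class and its process delegate are hypothetical, Task.Run is used (it needs .NET 4.5; Task.Factory.StartNew is the .NET 4.0 equivalent), and for brevity the buffer is a private array rather than a segment of a pre-allocated block. The callback does nothing but end the receive and queue a task; the task processes the data and only then posts the next BeginReceive.

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical per-connection receive loop built on Begin/End.
public sealed class ReceiveLoop
{
    private readonly Socket m_Socket;
    private readonly byte[] m_Buffer;
    private readonly Action<byte[], int> m_Process;

    public ReceiveLoop(Socket socket, int bufferSize, Action<byte[], int> process)
    {
        m_Socket = socket;
        m_Buffer = new byte[bufferSize];
        m_Process = process;
    }

    public void Start()
    {
        PostReceive();
    }

    private void PostReceive()
    {
        try
        {
            m_Socket.BeginReceive(m_Buffer, 0, m_Buffer.Length, SocketFlags.None, OnReceive, null);
        }
        catch (SocketException) { }          // connection failed; stop the loop
        catch (ObjectDisposedException) { }  // socket was closed; stop the loop
    }

    // Runs on an IO completion thread, so it must return as quickly as possible.
    private void OnReceive(IAsyncResult ar)
    {
        int read;
        try
        {
            read = m_Socket.EndReceive(ar);
        }
        catch (SocketException) { return; }
        catch (ObjectDisposedException) { return; }

        if (read == 0)
        {
            return; // the remote side closed the connection
        }

        // Hand the work to the regular thread pool. The next receive is posted only
        // after processing, so the buffer is never overwritten while it is in use.
        Task.Run(() =>
        {
            m_Process(m_Buffer, read);
            PostReceive();
        });
    }
}
```

Because only one receive is outstanding per connection, data for a given client is processed in arrival order; the price is that the socket has no buffer posted while the task is running.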


Basically, I'd suggest using Kevin's sample code for this solution, with the following added warnings:

  • Make sure the buffer you pass to BeginReceive is already 'pinned'
  • Make sure the callback you pass to BeginReceive does nothing more than queue up a task to handle the actual processing of the incoming data

When you do that, I have no doubt you could replicate Chris' results in scaling up to potentially hundreds of thousands of simultaneous clients (given the right hardware and an efficient implementation of your own processing code, of course ;)

Thanks Jerry, good advice. I'll keep it in mind.
Mystere Man

You can use Push Framework, an open source framework for high-performance server development. It is built on IOCP and is suitable for push scenarios and message broadcast.

This post was tagged C# and .net. Why did you suggest a C++ framework?
Mystere Man