I will be creating x threads in my server app, where x is the number of cores on the machine, and each thread will be bound to its own (non-hyperthreaded) core. Naturally, with this scheme I would like to distribute incoming connections across the threads, with the aim of ensuring that once a connection is assigned to a thread, it is only ever served from that thread. How is this achieved in boost::asio?

I am thinking: a single socket bound to an address, shared by multiple io_services, where each thread gets its own io_service. Is this line of reasoning correct?

edit: looks like I am going to have to answer this question myself.

+2  A: 

If your server app is supposed to run on a Windows machine, then you should consider using I/O completion ports.

An I/O completion port can limit the number of active threads to the number of cores, and it distributes the I/O events from a practically unbounded number of sockets across those threads. The scheduling is done by the OS. Here is a good example of how to do it.
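To make the mechanism concrete, here is a minimal sketch against the raw Win32 API (not the linked example's code); the handle_io callback is a hypothetical placeholder, and shutdown handling (e.g. via PostQueuedCompletionStatus) is omitted:

    #include <windows.h>
    #include <thread>
    #include <vector>

    int main() {
        SYSTEM_INFO si;
        GetSystemInfo(&si);
        const DWORD cores = si.dwNumberOfProcessors;

        // The last argument caps how many threads the OS lets run
        // completion handlers concurrently; the OS does the scheduling.
        HANDLE iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, cores);

        std::vector<std::thread> workers;
        for (DWORD i = 0; i < cores; ++i) {
            workers.emplace_back([iocp] {
                DWORD bytes = 0;
                ULONG_PTR key = 0;
                LPOVERLAPPED ov = NULL;
                // Each worker blocks here; the kernel wakes at most
                // `cores` of them at a time to handle completions.
                while (GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE)) {
                    // handle_io(key, ov, bytes);  // application-specific
                }
            });
        }
        // Sockets are associated with the port as they are opened, via
        // CreateIoCompletionPort((HANDLE)socket, iocp, key, 0).
        for (auto& t : workers) t.join();
    }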

Olliwaa
(+1) IO completion ports kick ass -- However, my current implementation uses IO completion ports and named pipes. I am rewriting the thing from scratch and portability is important.
Hassan Syed
+1, here's a link to the latest version of that source code example that you link to: http://www.lenholgate.com/archives/000637.html
Len Holgate
A: 

You can use a single io_service shared by multiple threads, with a per-connection strand to ensure that a connection's handlers are never executed concurrently (note that a strand serialises the handlers, but does not pin them to one particular thread). Take a look at the HTTP server 3 example.
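A minimal sketch of the strand pattern, loosely following the shape of that example under the io_service-era Boost.Asio API; the connection class and buffer size are illustrative, not the example's actual code:

    #include <boost/array.hpp>
    #include <boost/asio.hpp>
    #include <boost/bind.hpp>
    #include <boost/enable_shared_from_this.hpp>

    class connection : public boost::enable_shared_from_this<connection> {
    public:
        explicit connection(boost::asio::io_service& io)
            : strand_(io), socket_(io) {}

        boost::asio::ip::tcp::socket& socket() { return socket_; }

        void start() {
            // Wrapping every handler in the per-connection strand means the
            // handlers never run concurrently, no matter how many threads
            // are calling io_service::run().
            socket_.async_read_some(
                boost::asio::buffer(buffer_),
                strand_.wrap(boost::bind(&connection::handle_read,
                                         shared_from_this(),
                                         boost::asio::placeholders::error,
                                         boost::asio::placeholders::bytes_transferred)));
        }

    private:
        void handle_read(const boost::system::error_code& ec, std::size_t /*n*/) {
            if (!ec) start();  // real code would process buffer_ first
        }

        boost::asio::io_service::strand strand_;
        boost::asio::ip::tcp::socket socket_;
        boost::array<char, 4096> buffer_;
    };

An acceptor would create each connection in a boost::shared_ptr and call start() from its accept handler.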

Tom
My question is a little more specific/advanced than the examples presented.
Hassan Syed
+3  A: 

Yes, your reasoning is basically correct. You would create a thread per core, an io_service instance per thread, and call io_service.run() in each thread.
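A minimal sketch of that scheme, assuming the io_service-era API; the accept loop and the actual core pinning (which is platform-specific) are omitted:

    #include <boost/asio.hpp>
    #include <boost/bind.hpp>
    #include <boost/shared_ptr.hpp>
    #include <boost/thread.hpp>
    #include <vector>

    // A free function avoids the ambiguity of binding the overloaded
    // io_service::run member directly.
    void run_service(boost::asio::io_service* io) { io->run(); }

    int main() {
        const std::size_t n = boost::thread::hardware_concurrency();

        std::vector<boost::shared_ptr<boost::asio::io_service> > services;
        std::vector<boost::shared_ptr<boost::asio::io_service::work> > work;
        for (std::size_t i = 0; i < n; ++i) {
            boost::shared_ptr<boost::asio::io_service> io(new boost::asio::io_service);
            services.push_back(io);
            // work keeps run() from returning while a service is idle
            work.push_back(boost::shared_ptr<boost::asio::io_service::work>(
                new boost::asio::io_service::work(*io)));
        }

        boost::thread_group threads;
        for (std::size_t i = 0; i < n; ++i)
            threads.create_thread(boost::bind(run_service, services[i].get()));

        // An acceptor on services[0] would create each new socket on the
        // next io_service round-robin, so all of a connection's I/O stays
        // on that one thread.
        threads.join_all();
    }

This round-robin dispatch across per-thread io_services is essentially what the HTTP server 2 example's io_service_pool does.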

However, the question is whether you'd really do it that way. These are the problems I see:

  • You can end up with some cores very busy and others idle, depending on how the work is balanced across your connections. Micro-optimising for cache hits on a core might mean that you lose the ability to have an idle core do work when the "optimal" core is not ready.

  • At socket speeds (i.e. slow), how much of a win will you get from CPU cache hits? If one connection requires enough CPU to keep a core busy and you only have as many connections as cores, then great. Otherwise, the inability to move work around to deal with variance in workload might destroy any win you get from cache hits. And if you are doing lots of different work in each thread, the cache isn't going to be that hot anyway.

  • If you're mostly just doing I/O, the cache win might not be that big regardless. It depends on your actual workload.

My recommendation would be to have one io_service instance and call io_service::run() in one thread per core. If you get inadequate performance, or you have classes of connections where there is a lot of CPU per connection and cache wins are available, move those to dedicated io_service instances.
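A minimal sketch of that recommendation, again assuming the io_service-era API (real code would add acceptors and handlers to io before joining):

    #include <boost/asio.hpp>
    #include <boost/bind.hpp>
    #include <boost/thread.hpp>

    void run_service(boost::asio::io_service* io) { io->run(); }

    int main() {
        boost::asio::io_service io;
        boost::asio::io_service::work work(io);  // keep run() alive while idle

        // One run() thread per core; asio hands ready handlers to whichever
        // thread is free, so a busy connection never starves the others.
        boost::thread_group pool;
        const unsigned n = boost::thread::hardware_concurrency();
        for (unsigned i = 0; i < n; ++i)
            pool.create_thread(boost::bind(run_service, &io));

        pool.join_all();
    }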

This is a case where you should profile to see how much the cache misses are costing you, and where.

janm
Great answer, and great reasoning. I will probably stick to your suggestion till I see a problem -- it's good to know that it's possible should I choose to micro-optimize.
Hassan Syed