views: 1810
answers: 4
I've developed a mini HTTP server in C++ using boost::asio, and now I'm load testing it with multiple clients, but I've been unable to get close to saturating the CPU. I'm running on a 4-CPU box and getting about 50% usage on one CPU, 20% on another, and the remaining two are idle (according to htop).

Details:

  • The server fires up one thread per core (see the sketch after this list)
  • Requests are received, parsed, processed, and responses are written out
  • The requests are for data, which is read out of memory (read-only for this test)
  • I'm 'loading' the server from two machines, each running a Java application with 25 threads sending requests
  • I'm seeing about 230 requests/sec throughput (this is application requests, which are composed of many HTTP requests)
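
For reference, the usual way to run "one thread per core" with boost::asio is a single io_service serviced by N threads all calling run(). Here is a minimal sketch of that pattern, assuming one shared io_service (acceptor setup elided; the OP's actual server may of course differ):

```
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>

int main()
{
    boost::asio::io_service io_service;

    // Keep run() from returning while there is no pending work yet.
    boost::asio::io_service::work work(io_service);

    // ... set up the acceptor and start the first async_accept here ...

    // One thread per core, all servicing the same io_service.
    boost::thread_group threads;
    for (unsigned i = 0; i != boost::thread::hardware_concurrency(); ++i)
        threads.create_thread(
            boost::bind(&boost::asio::io_service::run, &io_service));
    threads.join_all();
}
```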

So, what should I look at to improve this result? Given the CPU is mostly idle, I'd like to leverage that additional capacity to get a higher throughput, say 800 requests/sec or whatever.

Ideas I've had:

  • The requests are very small and often fulfilled in a few ms; I could modify the client to compose and send bigger requests (perhaps using batching)
  • I could modify the HTTP server to use the Select design pattern; is this appropriate here?
  • I could do some profiling to try to understand what the bottleneck(s) are
+15  A: 

boost::asio is not as thread-friendly as you would hope - there is a big lock around the epoll code in boost/asio/detail/epoll_reactor.hpp, which means that only one thread at a time can call into the kernel's epoll syscall. And for very small requests this makes all the difference (meaning you will only see roughly single-threaded performance).

Note that this is a limitation of how boost::asio uses the Linux kernel facilities, not necessarily the Linux kernel itself. The epoll syscall does support multiple threads when using edge-triggered events, but getting it right (without excessive locking) can be quite tricky.
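
For the curious, the multi-threaded edge-triggered approach looks roughly like the sketch below: several threads block in epoll_wait on the same epoll descriptor, and EPOLLONESHOT ensures each readiness event is delivered to exactly one thread, which re-arms the socket when it is done. This is an illustrative sketch of the general technique, not code from nginetd:

```
#include <sys/epoll.h>

int epfd;  // shared epoll descriptor, created once with epoll_create

void worker_loop()
{
    for (;;) {
        epoll_event ev;
        // Many threads can block here on the same epoll fd.
        int n = epoll_wait(epfd, &ev, 1, -1);
        if (n <= 0)
            continue;

        int fd = ev.data.fd;
        // ... read until EAGAIN (mandatory for edge-triggered mode) ...

        // EPOLLONESHOT disabled the fd after delivering the event;
        // re-arm it so the next event can be claimed by one thread.
        epoll_event rearm = {};
        rearm.events = EPOLLIN | EPOLLET | EPOLLONESHOT;
        rearm.data.fd = fd;
        epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &rearm);
    }
}
```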

BTW, I have been doing some work in this area (combining a fully-multithreaded edge-triggered epoll event loop with user-scheduled threads/fibers) and made some code available under the nginetd project.

cmeerw
Thanks for the info, cmeerw, that's interesting stuff.
Alex Black
(+1) cmeerw, I have an unanswered post relating to the performance of boost::asio in general on Windows and Linux. If you have read large sections of asio, please come and answer my post :P
Hassan Syed
I was really worried about this global lock. It is not as big an issue as it would seem; the bottleneck can only occur in high-throughput scenarios. However, when asio is running in epoll mode (Linux), it preemptively tries to write or read when the `async_*` call is issued. In a high-input scenario the socket will usually be ready for reading, letting `async_read` skip epoll entirely. You can't ask for better network performance than that.
caspin
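
What caspin describes is, in effect, a speculative non-blocking read: try the syscall first and only fall back to the reactor if the socket is not ready. An illustrative sketch of the idea (paraphrased, not asio's actual source; `complete_read` and `register_with_epoll` are hypothetical stand-ins for asio's internals):

```
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <cerrno>
#include <cstddef>

void complete_read(std::size_t n);                  // hypothetical: runs the user's handler
void register_with_epoll(int fd, unsigned events);  // hypothetical: arms the reactor

void start_async_read(int fd, char* buf, std::size_t len)
{
    // Speculative attempt: the socket is non-blocking, so this returns
    // immediately whether or not data is available.
    ssize_t n = ::recv(fd, buf, len, 0);
    if (n >= 0) {
        // Data was already waiting: complete at once, with no trip
        // through epoll_wait (or the lock around it).
        complete_read(static_cast<std::size_t>(n));
    } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
        // Not ready: fall back to the epoll reactor.
        register_with_epoll(fd, EPOLLIN);
    }
}
```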
I don't think that's the case. Yes, it looks like the epoll reactor holds a scoped lock for the entire duration of the run() function, but it's temporarily released (`lock.unlock();`) before calling into epoll_wait and locked again after epoll_wait returns (`lock.lock();`). Not sure why it's done this way instead of two scoped locks, though.
Alex B
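
To illustrate the structure described above, the reactor's run loop is shaped roughly like this (a paraphrased sketch of what Alex B describes, not the actual asio source):

```
#include <sys/epoll.h>
#include <mutex>

std::mutex reactor_mutex;  // stands in for the reactor's internal lock
int epoll_fd;              // the reactor's epoll descriptor

void reactor_run()
{
    std::unique_lock<std::mutex> lock(reactor_mutex);

    // ... queue maintenance happens under the lock ...

    // The lock is dropped only around the syscall itself, so threads
    // are not holding it while blocked inside epoll_wait.
    lock.unlock();
    epoll_event events[128];
    int n = ::epoll_wait(epoll_fd, events, 128, /*timeout=*/-1);
    lock.lock();

    // ... the n ready events are then dispatched under the lock again ...
    (void)n;
}
```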
@Alex Black bump, so that the previous comment reaches the OP. What were your results with this question? Did replacing boost::asio help?
Alex B
@Checkers: Sorry, I didn't go far enough with this to come to any conclusion.
Alex Black
A: 

From your comments on network utilization, you do not seem to have much network traffic.

3 + 2.5 MiB/sec adds up to about 5.5 MiB/sec, which is roughly 46 Mbps, in the 50 Mbps ball-park (compared to your 1 Gbps port).

I'd say you have one of the following two problems:

  1. Insufficient work-load (low request-rate from your clients)
  2. Blocking in the server (something is interfering with response generation)

Looking at cmeerw's notes and your CPU utilization figures (50% + 20% + 0% + 0%), it seems most likely to be a limitation in your server implementation. I second cmeerw's answer (+1).

nik
A: 

230 requests/sec seems very low for such simple async requests. As such, using multiple threads is probably premature optimisation - get it working properly and tuned in a single thread, and see if you still need them. Just getting rid of unneeded locking may get things up to speed.

This article has some detail and discussion on I/O strategies for web server-style performance circa 2003. Anyone got anything more recent?

soru
Keep in mind the 230 requests/sec are 'application requests', each of which is composed of many actual HTTP requests.
Alex Black
There isn't much locking to get rid of (none in my code), but as cmeerw points out, boost::asio does some internal locking. The HTTP server does purely CPU-bound work, so not using the additional cores would be an expensive waste.
Alex Black
If the goal is just to saturate the CPU, do the work in one thread and have the other three calculate pi or something. Having multiple user-level threads won't make it easier or faster for the OS and I/O hardware to read and write network packets. Threads and cores are for computational work; if you aren't doing any, they can't possibly gain you anything, and they risk contention with whatever else the system is doing.
soru
As I said: "the HTTP server does purely CPU-bound work".
Alex Black
Except, demonstrably, it's not. The optimal solution is probably one thread doing I/O and 2 or 3 doing the parsing and so on (see the sketch below). But that's very likely premature optimisation until you can get your I/O properly asynchronously scheduled, so that you either saturate one CPU core or your network.
soru
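
A minimal sketch of the split soru suggests: one thread owns the network I/O, and a small worker pool does the CPU-bound parsing and response generation. `handle_request` and `on_read` here are hypothetical stand-ins, not the OP's code:

```
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>

boost::asio::io_service io_service;  // serviced by the single I/O thread
boost::asio::io_service workers;     // serviced by the worker pool

void handle_request()
{
    // CPU-bound work goes here: parse the request, build the response,
    // then post the resulting write back to io_service.
}

void on_read()
{
    // Runs on the I/O thread when a read completes: hand the CPU-bound
    // part to the worker pool instead of doing it here.
    workers.post(&handle_request);
}

int main()
{
    // Keep both event loops alive while work comes and goes.
    boost::asio::io_service::work io_work(io_service);
    boost::asio::io_service::work worker_work(workers);

    boost::thread_group pool;
    for (int i = 0; i != 3; ++i)
        pool.create_thread(
            boost::bind(&boost::asio::io_service::run, &workers));

    io_service.run();  // network I/O stays on this one thread
    pool.join_all();
}
```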
I see what you're saying. Well, I'll fire up the server with 1 thread as a quick test and see what comes of that.
Alex Black
+1  A: 

As you are using EC2, all bets are off.

Try it using real hardware, and then you might be able to see what's happening. Trying to do performance testing in VMs is basically impossible.

I have not yet worked out what EC2 is useful for; if someone finds out, please let me know.

MarkR
This system is going to be deployed on EC2, so testing its performance on real hardware wouldn't be helpful, I don't think.
Alex Black
Mark's point is valid: for profiling, use a real machine, or at least a more controlled environment. Deploy to EC2 all you like, but understand that you are running in a VM image, which means that your "idle" CPU might just be because some other tenant on the box got all the CPU for a while. And that makes profiling difficult.
janm
(+1) I hate ill-informed downvotes.
Hassan Syed