(Commonly called the C10K problem)

Is there a more contemporary review of solutions to the c10k problem (Last updated: 2 Sept 2006), specifically focused on Linux (epoll, signalfd, eventfd, timerfd..) and libraries like libev or libevent?

Something that discusses all the solved and still unsolved issues on a modern Linux server?

+5  A: 

Coincidentally, just a few days ago, Programming Reddit or maybe Hacker News mentioned this piece:

Thousands of Threads and Blocking IO

In the early days of Java, my C programming friends laughed at me for doing socket IO with blocking threads; at the time, there was no alternative. These days, with plentiful memory and processors it appears to be a viable strategy.

The article is from 2008, so it moves your horizon forward by a couple of years.

Carl Smotricz
I'm sure your hardware vendor is happy.
ninjalj
I'm more concerned with making damjan.mk happy. But please don't misconstrue my comment: this approach runs fine on a run-of-the-mill store-bought PC, which is hard to find these days with less than a dual-core CPU and 2 GB of RAM.
Carl Smotricz
+1  A: 

The libev project has published some benchmarks comparing itself against libevent...

rogerdpack
+3  A: 

The C10K problem generally assumes you're trying to optimize a single server, but as your referenced article points out, "hardware is no longer the bottleneck". Therefore, the first step is to make sure it isn't easier and cheaper to just throw more hardware into the mix.

If we've got a $500 box serving X clients per second, it's a lot more efficient to just buy another $500 box to double our throughput than to let an employee gobble up who knows how many hours and dollars trying to figure out how to squeeze more out of the original box. Of course, that's assuming our app is multi-server friendly, that we know how to load balance, etc, etc...

joe snyder
What if someone wants to write a high-performance library to save your money, and possibly that of thousands of others?
jweyrich
+4  A: 

To answer the OP's question: you could say that today the equivalent document is not about optimizing a single server for load, but about optimizing your entire online service for load. From that perspective, the number of combinations is so large that what you are looking for is not a document but a live website that collects such architectures and frameworks. Such a website exists, and it's called www.highscalability.com.

Side Note 1:

I'd argue against the belief that throwing more hardware at it is a long term solution:

  • Perhaps the cost of a performance engineer looks high compared to the cost of a single server. But what happens when you scale out? Let's say you have 100 servers. A 10 percent improvement in server capacity saves you the cost of 10 servers a month. That's more than what the performance engineer costs you.

  • Even if you have just two machines, you still need to handle performance spikes. The difference between a service that degrades gracefully under load and one that breaks down is that someone spent time optimizing for the load scenario.

Side note 2:

The subject of this post is slightly misleading. The C10K document does not try to solve the problem of 10k clients per second. (The number of clients per second is irrelevant unless you also define a workload along with sustained throughput under bounded latency. I think Dan Kegel was aware of this when he wrote that doc.) Look at it instead as a compendium of approaches to building concurrent servers, and of micro-benchmarks for the same. Perhaps what has changed between then and now is that at one point in time you could assume the service was a website serving static pages. Today the service might be a NoSQL datastore, a cache, a proxy, or one of hundreds of other pieces of network infrastructure software.

tholomew
Couldn't agree more with your arguments. The title was edited according to the comments above - IMO, badly. I liked your reference, but I'm afraid I won't be able to read much of it in time. +1.
jweyrich
I agree with your points, but you need to update your break even point for a performance engineer. A reasonable server (8 core, 7 GB RAM) costs US$0.68 per hour on Amazon's EC2. Cutting 10 servers only saves you $60K per year. That won't get you much of a performance engineer.
Ken Fox
+1  A: 

I'd recommend reading Zed Shaw's poll, epoll, science, and superpoll [1]. It covers why epoll isn't always the answer, why it's sometimes even better to go with poll, and how to get the best of both worlds.

[1] http://sheddingbikes.com/posts/1280829388.html

racetrack
@shadowfax: I can't confirm his results right now, but I believe they're plausible, since each call to poll requires much more data (all the events you're interested in) to be transferred between user space and kernel space. That cost should be compensated when more of those events are active and more new events are occurring (read: higher load). I should say I don't like the "superpoll" approach, as it adds lots of unnecessary syscalls by not being a kernel implementation. Anyway, the article gave me good insights. +1.
jweyrich
@jweyrich: Did you see this? http://sheddingbikes.com/posts/1280882826.html He has provided the full C code, as well as the R environment of his tests; it might help you experiment with this on your own.
racetrack
@shadowfax: yes, thanks. I'll be testing it when I get some free time. More here: http://sheddingbikes.com/posts/1281174543.html
jweyrich
A: 

Have a look at the RamCloud project at Stanford: http://fiz.stanford.edu:8081/display/ramcloud/Home

Their goal is 1,000,000 RPC operations/sec/server. They have numerous benchmarks and commentary on the bottlenecks that are present in a system which would prevent them from reaching their throughput goals.

Noah Watkins
They should change their goal to 1. Or are there just too many people trying to access it? :-(
jweyrich
I don't follow. What are you trying to say? I'm talking about 1M serviced requests, independent of the number of clients.
Noah Watkins
@Noah: I meant the link you posted is inaccessible for me. Is it responding to your requests? I'm getting timeouts here.
jweyrich
Oh, haha... Yeah, it's summer time at Stanford. No sysadmins :P
Noah Watkins