Let's say I have a web application running on S servers with an average of C cores each. My application is processing an average of R requests at any instant. Assuming R is around 10 times larger than S * C, won't benefits from spreading the work of a request across multiple cores be minimal since each core is processing around 10 requests already?
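To make that concrete, here's a quick back-of-the-envelope sketch with made-up values for S, C, and R:

    # Hypothetical numbers, just to make the ratio concrete.
    S = 4                 # servers
    C = 8                 # cores per server
    R = 10 * S * C        # in-flight requests, per the "R is about 10x S*C" assumption

    print(R / (S * C))    # -> 10.0: each core already has ~10 requests to work through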

If I'm correct, why does this guy say concurrency is so important to the future of Python as a language for web development?

I can see many reasons why my argument would be incorrect. Perhaps the application receives a few very-difficult-to-process requests that are outnumbered by available cores. Or perhaps there is a large variation in the difficulty of requests, so it's possible for one core to be unlucky and be given 10 consecutive difficult requests, with the result being that some of them take much longer than is reasonable. Given that the guy who wrote the above essay is so much more experienced than I am, I think there's a significant chance I'm wrong about this, but I'd like to know why.

+4  A: 

Not anytime soon, in my estimation. The lifespan of most single web requests is well under a second. In light of this, it makes little sense to split up the work of a single request across cores; it makes more sense to distribute whole requests across the cores - something web servers are capable of, and most already do.

Spencer Ruport
+5  A: 

In the hypothetical circumstances you design, with about 10 requests "in play" per core, as long as the request-to-core assignment is handled sensibly (probably even the simplest round-robin load balancing will do), it's just fine if each request lives throughout its lifetime on a single core.
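(For illustration, here's a minimal sketch of what I mean by the simplest round-robin assignment; the backend names are invented:)

    import itertools

    # Toy round-robin dispatcher; in real life the load balancer does this for you.
    backends = ["backend-0", "backend-1", "backend-2", "backend-3"]
    rotation = itertools.cycle(backends)

    def assign(request):
        """Hand each incoming request to the next backend in the rotation."""
        return next(rotation), request

    for req in ["GET /a", "GET /b", "GET /c", "GET /d", "GET /e"]:
        print(assign(req))   # cycles backend-0, backend-1, backend-2, backend-3, backend-0, ...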

Point is, that scenario's just ONE possibility -- heavy requests that could really benefit (in terms of lower latency) from marshaling multiple cores per request are surely an alternative possibility. I suspect that on today's web your scenario is more prevalent, but it sure would be nice to handle both kinds, AND "batch-like" background processing ones too.... especially since the number of cores (as opposed to each core's speed) is what's increasing, and what's going to keep increasing, these days.

Far be it from me to argue against Jacob Kaplan-Moss's wisdom, but I'm used to getting pretty good concurrency, at my employer, in nicer and more explicit AND transparent ways than he seems to advocate -- mapreduce for batch-like jobs, distributed-hashing-based sharding for enrolling N backends to shard the work for 1 query, and the like.

Maybe I just don't have enough real-life experience with (say) Erlang, Scala, or Haskell's relatively-new software transactional memory to see how wonderfully they scale to high utilization of tens or hundreds of thousands of cores on low-QPS, high-work-per-Q workloads... but it seems to me that the silver bullet for this scenario (net of the relatively limited subset of cases where you can turn to mapreduce, pregel, sharding, etc.) has not yet been invented in ANY language. With explicit, carefully crafted architecture, Python is surely no worse than Java, C# or C++ at handling such scenarios, in my working experience at least.

Alex Martelli
+1  A: 

Caveat: I've only skimmed the "Concurrency" section, which seems to be what you're referring to. The issue seems to be (and this isn't new, of course):

  • Python threads don't run in parallel due to the GIL.
  • A system with N cores will need as many backends to keep them busy (in practice, you probably want at least 2xN).
  • Systems are moving towards having more cores; typical PCs have four cores, and affordable server systems with 128 or more cores probably aren't far off.
  • Running 256 separate Python processes means no data is shared; the entire application and any loaded data is replicated in each process, leading to massive memory waste.

The last bit is where this logic fails. It's true that if you start 256 Python backends in the naive way, no data is shared - but that's only because of a lack of design forethought: it's the wrong way to start lots of backend processes.

The correct way is to load your entire application (all of the Python modules you depend on, etc.) in a single master process. Then that master process forks off backend processes to handle requests. These become separate processes, but standard copy-on-write memory management means that all fixed data already loaded is shared with the master. All of the code that was loaded in advance by the master is now shared among all of the workers, despite the fact that they're all separate processes.

(Of course, COW means that if you write to it, it makes a new copy of the data--but things like compiled Python bytecode should not be changed after loading.)
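Here's a minimal sketch of that pattern; "myapp" and its serve_forever() are hypothetical stand-ins for a real application, and error handling is omitted:

    import os

    import myapp          # hypothetical application package, loaded once in the master

    NUM_WORKERS = 8       # e.g. roughly one worker per core

    children = []
    for _ in range(NUM_WORKERS):
        pid = os.fork()   # POSIX fork: the child's memory pages are copy-on-write
        if pid == 0:
            # Child: serve requests; all code loaded above is shared with the master.
            myapp.serve_forever()
            os._exit(0)
        children.append(pid)

    for pid in children:  # master just waits for its workers
        os.waitpid(pid, 0)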

I don't know if there are Python-related problems which prevent this, but if so, those are implementation details to be fixed. This approach is far easier to implement than trying to eliminate the GIL, and it also eliminates any chance of traditional locking and threading problems. Those aren't as bad as they are in some languages and other use cases--there's almost no interaction or locking between the threads--but they don't disappear completely, and race conditions in Python are just as much of a pain to track down as they are in any other language.

Glenn Maynard
A: 

In the article, he seems to single out the GIL as what's holding back concurrent processing in Python web applications, which I simply don't understand. As you get larger, eventually you're going to add another server, and, GIL or no GIL, it won't matter - you'll have multiple machines.

If he's talking about being able to squeeze more out of a single computer, then I don't think that's as relevant, especially for large-scale distributed computing - different machines don't share a GIL. And, really, if you're going to have lots of computers in a cluster, it's better to have more mid-range servers instead of a single super server, for a lot of reasons.

If he means it as a way to better support functional and asynchronous approaches, then I somewhat agree, but it seems tangential to his "we need better concurrency" point. Python has that now (which he acknowledges), but, apparently, it's not good enough (all because of the GIL, naturally). To be honest, it seems more like bashing the GIL than a justification of the importance of concurrency in web development.

One important point, with regard to concurrency and web development, is that concurrency is hard. The beauty of something like PHP is that there is no concurrency. You have a process, and you are stuck in that process. It's so simple and easy. You don't have to worry about any sort of concurrency problems - suddenly programming is much easier.

Richard Levasseur
Err, you always want concurrency for webservers by nature--and concurrency is *easy* for FCGI/SCGI-based webservers, since you just start multiple backend processes. His argument seems to be about the memory cost of using processes instead of threads (as I described earlier).
Glenn Maynard
Your argument about mid-range servers is correct now, but his argument--and it's a reasonable one, in my opinion--is that it won't stay correct for long. Quad-core systems turned from high-end server equipment into cheap, standard desktop hardware nearly overnight, and it's reasonable to expect that the trend toward more cores will continue. Thus "mid-range servers" will *themselves* have many more cores, and will need concurrency to match.
Glenn Maynard
+1  A: 

One thing you're omitting is that a web request isn't a single sequential series of instructions that involve only the CPU.

A typical web request handler might need to do some computation with the CPU, then read some config data off the disk, then ask the database server for some records that have to get transferred to it over ethernet, and so on. The CPU usage might be low, but it could still take a nontrivial amount of time due to waiting on all that I/O between each step.

I think that, even with the GIL, Python can run other threads while one thread waits on I/O. (Other processes certainly can.) Still, Python threads aren't like Erlang threads: start enough of them and it'll start to hurt.
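Here's a toy sketch of the overlap I mean, with time.sleep standing in for disk or database waits:

    import threading
    import time

    def handle_request(i):
        # Simulate the I/O-heavy part of a request (disk, database, network).
        # Blocking calls like this release the GIL, so the other threads keep running.
        time.sleep(0.5)
        print("request", i, "done")

    start = time.time()
    threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Roughly 0.5 seconds total instead of 5, because the waits overlap.
    print("elapsed:", round(time.time() - start, 2), "seconds")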

Another issue is memory. C libraries are shared between processes, but (AFAIK) Python libraries aren't. So starting up 10x as many Python processes may reduce I/O waiting, but now you've got 10 copies of each Python module loaded, per core.

I don't know how significant these are, but they do complicate things well beyond "R > 10 * S * C". There's lots still to be done in the Python world to solve them, because these aren't easy problems.

Alec
You're correct--the most common example is probably SQL queries. However, this is mostly significant for I/O-bound loads. The question of scaling to 128-core machines, by nature, is important for CPU-bound loads. (See my answer for one approach to the shared memory problem.)
Glenn Maynard