views:

786

answers:

6

There are many questions related to Stackless Python. But none answering this my question, I think (correct me if wrong - please!). There's some buzz about it all the time so I curious to know. What would I use Stackless for? How is it better than CPython?

Yes it has green threads (stackless) that allow quickly create many lightweight threads as long as no operations are blocking (something like Ruby's threads?). What is this great for? What other features it has I want to use over CPython?

Please share your experience

+2  A: 

EVEOnline is largely programmed in Stackless Python. They have several dev blogs on the use of it. It seems it is very useful for high performance computing.

Thirler
why would it be better for high performance computing?
zaharpopov
+7  A: 

Stackless Python's main benefit is the support for very lightweight coroutines. CPython doesn't support coroutines natively (although I expect someone to post a generator-based hack in the comments) so Stackless is a clear improvement on CPython when you have a problem that benefits from coroutines.

I think the main area where they excel are when you have many concurrent tasks running within your program. Examples might be game entities that run a looping script for their AI, or a web server that is servicing many clients with pages that are slow to create.

You still have many of the typical problems with concurrency correctness however regarding shared data, but the deterministic task switching makes it easier to write safe code since you know exactly where control will be transferred and therefore know the exact points at which the shared state must be up to date.

Kylotan
in what sense "concurrent"? after all these are only green threads, not really providing any concurrency, just the illusion of it
zaharpopov
oh, and also, coroutine with generators is not a hack - this is why yield receives an argument in Python and David Beazley has great article about it
zaharpopov
The tasks are concurrent in that several are still in progress at any one time. The fact that execution is only happening in one of them is irrelevant. The same applies to real threads and processes much of the time too due to waiting on I/O. And saying that coroutines with generators is not a hack is like saying coroutines in C++ with setjmp/longjmp is not a hack. :P The system may have been designed to allow it but it's a long way off being an elegant solution.
Kylotan
they're really green threads? CPython's threads are not green...
Matt Joiner
He's talking about Stackless's coroutines.
Kylotan
+5  A: 

It allows you to work with massive amounts of concurrency. Nobody sane would create 100.000 system threads, but you can do this using stackless.

This article tests doing just that, creating 100.000 tasklets in both Python and Google Go (a new programming language): http://dalkescientific.com/writings/diary/archive/2009/11/15/100000_tasklets.html

Surprisingly, even if Google Go is compiled to native code, and they tout their co-routines implementation, Python still wins.

Stackless would be good for implementing a map/reduce algorithm, where you can have a very large number of reducers depending on your input data.

Adal
@Adal: but isn't map reduce good for parallelization, while Stackless runs all tasklets on single CPU?
zaharpopov
You are right, but if speed is not critical or the dataset not huge, you could implement map/reduce very easily in Stackless
Adal
@Adal, but map/reduce is for speed why else do I need it??????????
zaharpopov
+3  A: 

Thirler already mentioned that stackless was used in Eve Online. Keep in mind, that:

(..) stackless adds a further twist to this by allowing tasks to be separated into smaller tasks, Tasklets, which can then be split off the main program to execute on their own. This can be used for fire-and-forget tasks, like sending off an email, or dispatching an event, or for IO operations, e.g. sending and receiving network packets. One tasklet waits for a packet from the network while others continue running the game loop.

It is in some ways like threads, but is non-preemptive and explicitly scheduled, so there are fewer issues with synchronization. Also, switching between tasklets is much faster than thread switching, and you can have a huge number of active tasklets whereas the number of threads is severely limited by the computer hardware.

(got this citation from here)

At PyCon 2009 there was given a very intersting talk, describing why and how stackless is used at CCP Games.

Also, there is a very good introductory material, which describes why stackless is a good solution for Your applications. (it may be somewhat old, but I think that it is worth reading).

zeroDivisible
+3  A: 

The basic usefulness for green threads, the way I see it, is to implement a system in which you have a large amount of objects that do high latency operations. A concrete example would be communicating with other machines:

def Run():
    # Do stuff
    request_information() # This call might block
    # Proceed doing more stuff

Threads let you write the above code naturally, but if the number of objects is large enough, threads just cannot perform adequately. But you can use green threads even for in really large amounts. The request_information() above could switch out to some scheduler where other work is waiting and return later. You get all the benefits of being able to call "blocking" functions as if they return immediately without using threads.

This is obviously very useful for any kind of distributed computing if you want to write code in a straightforward way.

It is also interesting for multiple cores to mitigate waiting for locks:

def Run():
    # Do some calculations
    green_lock(the_foo)
    # Do some more calculations

The green_lock function would basically attempt to acquire the lock and just switch out to a main scheduler if it fails due to other cores using the object.

Again, green threads are being used to mitigate blocking, allowing code to be written naturally and still perform well.

I don't think green threads can mitigate blocking. Rather, green threads NEED non-blocking code in order to be effective.In your example, if request_informaton is really blocking like you asserted, then it'd just block the entire Python interpretor and the green threads wouldn't be able to achieve any concurrency.
Continuation
You are just plain wrong.
Actually you're plain wrong.Green threads are userland threads. You call a blocking call in a green thread and the entire interpretor will get blocked.http://en.wikipedia.org/wiki/Green_threads"a green thread may block all other threads if performing a blocking I/O operation."That's why you need to use non-blocking IO when using green thread. Green thread doesn't magically turn a blocking call into a non-blocking call
Continuation
+1  A: 

While I've not used Stackless itself, I have used Greenlet for implementing highly-concurrent network applications. Some of the use cases Linden Lab has put it towards are: high-performance smart proxies, a fast system for distributing commands over huge numbers of machines, and an application that does a ton of database writes and reads (at a ratio of about 1:2, which is very write-heavy, so it's spending most of its time waiting for the database to return), and a web-crawler-type-thing for internal web data. Basically any app that's expecting to have to do a lot of network I/O will benefit from being able to create a bajillion lightweight threads. 10,000 connected clients doesn't seem like a huge deal to me.

Stackless or Greenlet aren't really a complete solution, though. They are very low-level and you're going to have to do a lot of monkeywork to build an application with them that uses them to their fullest. I know this because I maintain a library that provides a networking and scheduling layer on top of Greenlet, specifically because writing apps is so much easier with it. There are a bunch of these now; I maintain Eventlet, but also there is Concurrence, Chiral, and probably a few more that I don't know about.

If the sort of app you want to write sounds like what I wrote about, consider one of these libraries. The choice of Stackless vs Greenlet is somewhat less important than deciding what library best suits the needs of what you want to do.

rdw