We are planning to write a highly concurrent application in one of the very-high-level programming languages.

1) Do Python, Ruby, or Haskell support true multithreading?

2) If a program contains threads, will a Virtual Machine automatically assign work to multiple cores (or to physical CPUs if there is more than 1 CPU on the mainboard)?

True multithreading = multiple independent threads of execution utilize the resources provided by multiple cores (not by only 1 core).

False multithreading = threads emulate multithreaded environments without relying on any native OS capabilities.

+6  A: 

The current version of Ruby, 1.9 (YARV, the C-based implementation), has native threads but suffers from the GIL. As far as I know, Python also has the GIL problem.

However, both Jython and JRuby (mature Java implementations of Python and Ruby, respectively) provide native multithreading, with no green threads and no GIL.

Don't know about Haskell.

khelll
Are Jython and JRuby very different from their origins?
psihodelia
What does GIL mean? Google suggests "Gas Insulated Lines" which is not helpful....
tobsen
http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock explains what the Global interpreter lock (GIL) is.
tobsen
@psihodelia No, they don't differ; they are just written in Java instead of C. If you have C extensions, you won't be able to use them inside Jython and JRuby, but both Jython and JRuby let you use Java jars inside your code, which is a great alternative and an advantage, as far as I can tell.
khelll
@tobsen, GIL stands for Global Interpreter Lock.
khelll
+2  A: 

Haskell is thread-capable; in addition, you get a pure functional language with no side effects.

Svetlozar Angelov
Does it support true multithreading?
psihodelia
So what? Being pure makes threading easy because you know functions can run in other threads and won't modify the environment -- Haskell also has monads, which are the impure sections of the code. More to the point, this doesn't answer the OP's question at all. Please remove this answer.
Evan Carroll
+1  A: 

For real concurrency, you probably want to try Erlang.

Daniel Roseman
It seems that Erlang also doesn't support true multithreading. From Wikipedia: Processes are the primary means to structure an Erlang application. Erlang processes are neither operating system processes nor operating system threads, but lightweight processes somewhat similar to Java's original “green threads”.
psihodelia
What does that Wikipedia quote have to do with true multithreading?
Jörg W Mittag
Don Stewart
+12  A: 

The GHC compiler will run your program on multiple OS threads (and thus multiple cores) if you compile with the -threaded option and then pass +RTS -N<x> -RTS at runtime, where <x> = the number of OS threads you want.

Ganesh Sittampalam
A: 

Haskell is suitable for anything. Python has the processing module (part of the standard library as multiprocessing since Python 2.6), which, I think (not sure), helps to avoid GIL problems, so it is suitable for anything too.

But in my opinion, the best thing you can do for big and huge projects is to pick the highest-level language possible with a static type system. Today these languages are OCaml, Haskell, Erlang.

If you want to develop something small, Python is good. But when things get bigger, all of Python's benefits are eaten up by myriads of tests.

I haven't used Ruby. I still think Ruby is a toy language. (Or at least there's no reason to learn Ruby when you already know Python; better to read the SICP book.)
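A minimal sketch of what "avoiding GIL problems" with processes can look like, assuming Python 3.8+ and only the standard library. The function names `count_primes` and `count_primes_in_processes` are made up for illustration; the idea is simply that each worker process is a separate interpreter with its own GIL, so CPU-bound work can be spread over several cores.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def count_primes_in_processes(limits, workers=4):
    """Run count_primes once per limit, each call in a worker process.

    Every worker is its own interpreter with its own GIL, so the OS is
    free to schedule the workers on different cores.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # The guard matters: on platforms that spawn workers, this module is
    # re-imported in each child process.
    print(count_primes_in_processes([20_000, 30_000, 40_000, 50_000]))
```

The price is that the processes do not share memory: arguments and results are pickled and copied between them, so this pays off mainly when each task does a lot of computation on little data.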

Vasiliy Stavenko
Erlang is dynamically typed.
Jörg W Mittag
Mm... my bad, that was a bad sentence. I should have mentioned Erlang in a separate sentence.
Vasiliy Stavenko
+1  A: 

I second the choice of Erlang. Erlang supports distributed, highly concurrent programming out of the box. It does not matter whether you call it "multi-threading" or "multi-processing". Two important elements to consider are the level of concurrency and the fact that Erlang processes do not share state.

No shared state among processes is a good thing.

Richie
+22  A: 

1) Do Python, Ruby, or Haskell support true multithreading?

This has nothing to do with the language. It is a question of the hardware (if the machine only has 1 CPU, it is simply physically impossible to execute two instructions at the same time), the Operating System (again, if the OS doesn't support true multithreading, there is nothing you can do) and the language implementation / execution engine.

Unless the language specification explicitly forbids or enforces true multithreading, this has absolutely nothing whatsoever to do with the language.

All the languages that you mention, plus all the languages that have been mentioned in the answers so far, have multiple implementations, some of which support true multithreading, some don't, and some are built on top of other execution engines which might or might not support true multithreading.

Take Ruby, for example. Here are just some of its implementations and their threading models:

  • MRI: green threads, no true multithreading
  • YARV: OS threads, no true multithreading
  • Rubinius: OS threads, true multithreading
  • MacRuby: OS threads, true multithreading
  • JRuby, XRuby: JVM threads, depends on the JVM (if the JVM supports true multithreading, then JRuby/XRuby does, too, if the JVM doesn't, then there's nothing they can do about it)
  • IronRuby, Ruby.NET: just like JRuby, XRuby, but on the CLI instead of on the JVM

See also my answer to another similar question about Ruby. (Note that that answer is more than a year old, and some of it is no longer accurate. Rubinius, for example, uses truly concurrent native threads now, instead of truly concurrent green threads. Also, since then, several new Ruby implementations have emerged, such as BlueRuby, tinyrb, Ruby Go Lightly, Red Sun and SmallRuby.)

Similar for Python:

  • CPython: native threads, no true multithreading
  • PyPy: native threads, depends on the execution engine (PyPy can run natively, or on top of a JVM, or on top of a CLI, or on top of another Python execution engine. Whenever the underlying platform supports true multithreading, PyPy does, too.)
  • Unladen Swallow: native threads, currently no true multithreading, but fix is planned
  • Jython: JVM threads, see JRuby
  • IronPython: CLI threads, see IronRuby
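To make the "native threads, no true multithreading" entry for CPython concrete, here is a small sketch, assuming Python 3.8+ (for `threading.get_native_id`) and a standard GIL build of CPython. The helper name `collect_native_thread_ids` is hypothetical. It shows that every `threading.Thread` is backed by a distinct OS thread; the GIL only restricts how many of them execute Python bytecode at any given moment.

```python
import threading


def collect_native_thread_ids(num_threads=4):
    """Start num_threads threads and return the OS-level id of each one."""
    ids = []
    ids_lock = threading.Lock()
    # The barrier keeps every thread alive until all have reported, so the
    # OS cannot recycle the id of a thread that has already finished.
    all_reported = threading.Barrier(num_threads)

    def worker():
        native_id = threading.get_native_id()  # id assigned by the OS kernel
        with ids_lock:
            ids.append(native_id)
        all_reported.wait()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


if __name__ == "__main__":
    ids = collect_native_thread_ids()
    print(len(set(ids)), "distinct OS threads")
```

The threads are real as far as the operating system is concerned, which is why blocking I/O in one thread does not stall the others; it is CPU-bound Python code that fails to scale across cores.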

For Haskell, at least the Glorious Glasgow Haskell Compiler supports true multithreading with native threads. I don't know about UHC, LHC, JHC, YHC, HUGS or all the others.

For Erlang, both BEAM and HiPE support true multithreading with green threads.

2) If a program contains threads, will a Virtual Machine automatically assign work to multiple cores (or to physical CPUs if there is more than 1 CPU on the mainboard)?

Again: this depends on the Virtual Machine, the Operating System and the hardware. Also, some of the implementations mentioned above don't even have Virtual Machines.
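As a rough illustration of the operating system's role, a program normally cannot place its threads on cores itself; it can only ask how many cores it is allowed to use and leave the placement to the OS scheduler. A minimal sketch in Python, with a hypothetical helper name `usable_core_count`:

```python
import os


def usable_core_count():
    """Return how many logical CPUs this process may be scheduled on."""
    if hasattr(os, "sched_getaffinity"):
        # Only available on some Unix platforms (e.g. Linux); respects CPU
        # affinity masks set by tools such as taskset or by containers.
        return len(os.sched_getaffinity(0))
    # Portable fallback: logical CPUs in the machine. os.cpu_count() may
    # return None when the count cannot be determined.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("logical CPUs available to this process:", usable_core_count())
```

Whether the threads or processes you start actually end up on different cores is then decided by the scheduler, not by the language.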

Jörg W Mittag
CPython and YARV don't have true multithreading, even when run on a multicore processor and even though they use native threads, because they have a global interpreter lock. The lock prevents a thread from seeing the execution environment (e.g. the set of available class definitions and method calls) in an inconsistent state when some other thread tries to change it.
Ken Bloom
YARV calls it the Giant VM Lock (GVL), but in principle you're right. At least on YARV, there are plans to remove the GIL, though. If you look in `thread.c` you can see a description of the three different threading models YARV went through or is going to go through: green threads -> native threads with GIL -> truly parallel native threads with fine-grained locking. However, there's no code yet. In CPython, it's similar: Unladen Swallow plans to remove the GIL, and, if it's not too intrusive, to contribute this removal back to CPython.
Jörg W Mittag
+4  A: 

The Haskell implementation, GHC, supports multiple mechanisms for parallel execution on shared-memory multicore machines. These mechanisms are described in "Runtime Support for Multicore Haskell".

Concretely, the Haskell runtime divides work among N OS threads, distributed over the available compute cores. These N OS threads in turn run M lightweight Haskell threads (sometimes millions of them). In turn, each Haskell thread can take work from a spark queue (there may be billions of sparks).

The runtime schedules work to be executed on separate cores, migrates work, and load balances. The garbage collector is also a parallel one, using each core to collect part of the heap.

Unlike Python or Ruby, there's no global interpreter lock, so for that and other reasons GHC is particularly good on multicore in comparison; see, e.g., Haskell v Python on the multicore shootout.

Don Stewart