I’ve begun studying Erlang and find the BEAM runtime environment fascinating. It’s commonly stated that in Erlang, processes belong to the language rather than the OS (meaning the runtime, meaning BEAM in this case). These are the lightweight, “green processes” that Erlang is getting famous for. It’s further stated (on page 5 of this paper) that BEAM uses one (1) OS thread per CPU core for scheduling and another OS thread for I/O. So I wonder: which thread supplies the CPU cycles needed to actually execute the Erlang code?

Further, if I’m running on a dual core machine I would expect -- based on what I’ve read so far -- to see three (3) threads running under the BEAM process: two schedulers (one for each core) and one i/o thread. But I see 10. Sometimes 11. Sometimes it starts at 13 and, like high-quality amplifiers, goes to 11.

I’m confused. Any insight will be appreciated.

+3  A: 

Erlang processes are not 'green' in the way that threads are green in Java. Erlang processes are structures that do not share memory, and they are maintained by the Erlang VM.

It may sound strange, but this paper could be 'old' (even though it is from 2007). It all changed around the R13 release, when we got brand-new handling of run queues (with dynamic balancing and other goodies). Here is a presentation by Ulf Wiger about it: http://ulf.wiger.net/weblog/2009/01/23/erlang-programming-for-multicore/

To sum up, processes are completely transparent, and you may adjust the number of run queues and schedulers, but how they are realized as OS threads is not something you control directly. I do not want to speculate about why there are 11 threads.

EDIT: I was a bit wrong about the OS side:

"+S Schedulers:SchedulerOnline

Sets the amount of scheduler threads to create and scheduler threads to set online when SMP support has been enabled. Valid range for both values are 1-1024. If the Erlang runtime system is able to determine the amount of logical processors configured and logical processors available, Schedulers will default to logical processors configured, and SchedulersOnline will default to logical processors available; otherwise, the default values will be 1. Schedulers may be omitted if :SchedulerOnline is not and vice versa. The amount of schedulers online can be changed at run time via erlang:system_flag(schedulers_online, SchedulersOnline).

This flag will be ignored if the emulator doesn't have SMP support enabled (see the -smp flag).

"

from here: http://www.erlang.org/doc/man/erl.html
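
To see what the quoted +S documentation means on a particular machine, the values can be read from inside the VM. This is only a minimal sketch: the module name `sched_info` and its two functions are made up for illustration, while the `erlang:system_info/1` and `erlang:system_flag/2` calls are the ones the documentation refers to. Note that `logical_processors` may come back as `unknown` if the runtime cannot detect it.

```erlang
-module(sched_info).
-export([report/0, set_online/1]).

%% Hypothetical helper: collects the scheduler-related settings of the
%% running VM into a property list.
report() ->
    [{smp_support, erlang:system_info(smp_support)},
     %% May be the atom 'unknown' if the runtime cannot detect it.
     {logical_processors, erlang:system_info(logical_processors)},
     %% Number of scheduler threads created (the first part of +S).
     {schedulers, erlang:system_info(schedulers)},
     %% Number of schedulers currently online (the second part of +S).
     {schedulers_online, erlang:system_info(schedulers_online)}].

%% Hypothetical helper: changes the number of schedulers online at run
%% time and returns {OldValue, NewValue}. N must not exceed the number
%% of schedulers the VM was started with.
set_online(N) when is_integer(N), N >= 1 ->
    Old = erlang:system_flag(schedulers_online, N),
    {Old, erlang:system_info(schedulers_online)}.
```

For example, after starting the shell with `erl +S 2:2`, `sched_info:report()` should list two schedulers, and `sched_info:set_online(1)` takes one of them offline without restarting the VM.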

EDIT2: There is an interesting discussion on the erlang-questions mailing list on the pros and cons of many VMs vs. many schedulers. Unfortunately it is also from 2008 and may no longer be valid given the huge improvements in newer OTP releases. http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:38165:200809:nbihpkepgjcfnffkoobf

@user425720 - Thanks for the response and those links. I think the bit about the number of schedulers being tied (by default anyway) to the number of cores is correct. But like you, I'm still unable to speculate about why there are 11 threads in my example of a dual-core machine. Still fascinating...
Alan
@Alan go ahead and ask the question on the erlang-questions mailing list. It is read constantly by the OTP team and core VM devs!
@user425720 - Excellent idea. I didn't know about that list. This is the first time I've asked a question that stumped the Stack Overflow users. I'll report back with the results from the erlang-questions list.
Alan
+3  A: 

Following @user425720's advice, I asked my question on the erlang-questions LISTSERV. It's also available as a Google Group. Kresten Krab Thorup of Trifork answered me almost at once. My thanks go out to Kresten. Here is his answer. (Parentheticals and emphasis are mine.)

Here is, AFAIK, the basic scenario:

Erlang code will be run in as many "green threads" as there are processes; the process limit is controlled by the +P (command line) flag.

The green threads are mapped onto S threads, where S is the number of cores/CPUs. The fact that these threads are also called schedulers can seem somewhat confusing, but from the VM's point of view they are. From the developer's point of view, they are the threads that run your Erlang code. The number S can be controlled with the +S option on the erl command line.
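
One way to watch this mapping is to spawn many processes and have each one report which scheduler thread it happens to be running on. This is just a sketch: the module `sched_map` and its `run/1` function are invented names, and the only VM call it relies on is `erlang:system_info(scheduler_id)`.

```erlang
-module(sched_map).
-export([run/1]).

%% Hypothetical helper: spawns N Erlang processes, each of which sends
%% back the id of the scheduler thread it is executing on. Returns the
%% sorted list of distinct scheduler ids that were observed.
run(N) when is_integer(N), N >= 1 ->
    Parent = self(),
    Pids = [spawn(fun() ->
                      Parent ! {self(), erlang:system_info(scheduler_id)}
                  end)
            || _ <- lists:seq(1, N)],
    %% Selective receive: collect exactly one reply per spawned pid.
    Ids = [receive {Pid, Id} -> Id end || Pid <- Pids],
    lists:usort(Ids).
```

However many processes are spawned, the ids returned never go above `erlang:system_info(schedulers)`. Which ids actually show up depends on load and on how the VM balances its run queues, so a short run may well report only one or two of them.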

In addition to this, there are a number of so-called "Async Threads". That's a thread pool which is used by I/O processes, called linked-in drivers, to react to select/poll etc. The number of async threads is dynamic, but limited by the +A flag.

So, the 11 threads you see on a dual-core may be 2 schedulers, and 9 async threads. For instance.

Read more about the flags here.
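
The numbers behind the +P, +S and +A flags mentioned above can also be read from a running VM, which helps when trying to account for the threads the OS reports. Again a rough sketch: `vm_threads` and `summary/0` are made-up names, and the result should not be expected to add up exactly to the OS thread count, since the VM may start auxiliary threads that are not covered here.

```erlang
-module(vm_threads).
-export([summary/0]).

%% Hypothetical helper: gathers the values controlled by the +P, +S and
%% +A flags for the running VM into a property list.
summary() ->
    [%% Erlang processes currently alive, and their upper bound (+P).
     {process_count, erlang:system_info(process_count)},
     {process_limit, erlang:system_info(process_limit)},
     %% Scheduler threads that run the Erlang code (+S).
     {schedulers, erlang:system_info(schedulers)},
     %% Size of the async thread pool (+A).
     {async_threads, erlang:system_info(thread_pool_size)}].
```

Starting the shell with, say, `erl +S 2 +A 9` and calling `vm_threads:summary()` should show those two settings reflected in `schedulers` and `async_threads`.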

Alan