Hi all,

In a test I'm building here, my goal is to create a parser. So I've built a proof of concept that reads all messages from a file, and after loading all of them into memory I spawn one process to parse each message. Up to that point everything is fine, and I've got some nice results. But I could see that the Erlang VM is not using all my processor power (I have a quad core); in fact it is using about 25% of my processor during the test. I've made a counter-test in C++ that uses four threads, and obviously it uses 100%, thus producing a better result (I've respected the same queue model Erlang uses).

So I'm wondering what could be "slowing" my Erlang test? I know it's not a serialization matter, as I'm spawning one process per message. One thing I've thought of is that maybe my messages are too small (about 10k each), so creating that many processes does not help performance.

Some facts about the test:

106k messages
On Erlang (25% processor power used) - 204 msecs
On my C++ test (100% processor power used) - 80 msecs

Yes, the difference isn't that great, but if there is more power available, certainly there is more room for improvement, right?

Ah, I've done some profiling and wasn't able to find another way to optimize, since there are few function calls and most of them are string-to-object conversions.

Update:

Woooow! Following Hassan Syed's idea, I've managed to achieve 35 msecs against 80 from C++! This is awesome!

+1  A: 

If you have one source file and you spawn one process per "expression", you really do not understand when to parallelise. It costs FAR more to spawn a process for each expression than to have one process handle the entire file. A suitable strategy would be to have one process per file rather than one process per expression.

Another alternative strategy would be to split the file into two, three or x chunks, and process those chunks. This of course assumes the source isn't linearly dependent, and each chunk's processing time needs to exceed the time to create and spawn a process (usually by far, because time wasted in process X is time taken away from the rest of the machine).
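A minimal sketch of the chunking idea, assuming the messages are already in memory as a list of strings. The module name chunk_parse, the function parse_all/2 and the stand-in parser parse_message/1 are hypothetical names for illustration, not code from the question:

```erlang
-module(chunk_parse).
-export([parse_all/2]).

%% parse_all(Messages, ChunkSize) -> parsed results, in input order.
%% Spawns one process per chunk instead of one process per message.
parse_all(Messages, ChunkSize) when is_integer(ChunkSize), ChunkSize > 0 ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() ->
                    Parent ! {Ref, [parse_message(M) || M <- Chunk]}
                end),
                Ref
            end || Chunk <- chunk(Messages, ChunkSize)],
    %% Selective receive on each Ref keeps results in spawn order.
    lists:append([receive {Ref, Parsed} -> Parsed end || Ref <- Refs]).

%% Split a list into sublists of at most N elements.
chunk([], _N) -> [];
chunk(List, N) ->
    {Head, Tail} = take(N, List, []),
    [Head | chunk(Tail, N)].

take(0, Rest, Acc) -> {lists:reverse(Acc), Rest};
take(_, [], Acc) -> {lists:reverse(Acc), []};
take(K, [H | T], Acc) -> take(K - 1, T, [H | Acc]).

%% Hypothetical stand-in for the real parser: splits a message on commas.
parse_message(Msg) ->
    string:tokens(Msg, ",").
```

The chunk size is the tuning knob: larger chunks amortize the spawn cost over more parsing work, while too few chunks leave schedulers idle. A call such as chunk_parse:parse_all(Messages, 1000) would be one starting point to measure from.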

-- Discussion C++ vs Erlang and your findings --

Erlang has a user-space kernel that emulates many of the OS kernel's primitives, especially the scheduler and blocking primitives. This means there is some overhead compared with the same strategy in a raw procedural language such as C++. You must tune your task partitioning to the properties of each part of the implementation space (CPU, memory, OS, programming language).

Hassan Syed
@Hassan Yes, I understood what you've said. I still think that spawning as many processes as needed is the way to go. What I can do is some parsing in batches like you said, so the cost of process creation/destruction can be amortized over the time it takes to parse. I'll test later today and will update the results. And about the C++ difference, yes, I know that too. I just made a counter-test to have a lower bound on what speed I should expect; I'm not trying to achieve the same, since the power of Erlang is not in parsing itself.
scooterman
+3  A: 

It seems your Erlang VM is using only one core.

Try starting it like this:

erl -smp enable +S 4

The -smp enable flag tells Erlang to start the runtime system with SMP support enabled. With +S 4 you start 4 Erlang schedulers (1 for each core).

You can see if you have SMP enabled when you start the shell:

Erlang R13B01 (erts-5.7.2) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]

Eshell V5.7.2  (abort with ^G)
1> 

[smp:2:2] tells you it is running with SMP enabled, with 2 schedulers and 2 schedulers online.
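The same information can also be queried from inside a running shell. A small sketch using erlang:system_info/1; the values returned depend on the machine and on how the emulator was built and started:

```erlang
%% Run these in the Erlang shell (erl).
erlang:system_info(smp_support).        %% true if the emulator has SMP support
erlang:system_info(schedulers).         %% number of scheduler threads
erlang:system_info(schedulers_online).  %% number of schedulers currently online
```

If schedulers_online reports 1 on a quad core, only one core is doing Erlang work, which would match the 25% CPU usage described in the question.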

filippo
(+1) That's probably the problem :P. However, modern BEAMs start up with SMP enabled, no?
Hassan Syed
Yes, the latest versions automatically start with SMP, but the 25% CPU usage seems to indicate only 1 core is used, so scooterman is probably using an older version, or it might be compiled without SMP.
filippo
Hmmm, interesting that on Windows SMP was apparently disabled, and I'm using a not-so-old (R13B1) version. Although I have more processor usage now, the parsing speed hasn't changed at all. :( Thanks nonetheless.
scooterman
+2  A: 

You should bind the schedulers to the CPU cores:

erlang:system_flag(scheduler_bind_type, processor_spread).
Zed