I am currently using Java and have read a lot about Erlang on the net, and I have two big questions:

  1. How much slower (if at all) will Erlang be than plain Java? I'm assuming Java will be faster, going by the shootout benchmarks on the net (Erlang doesn't do that well in them). In other words, how many more CPUs will I need before Erlang outshines single-threaded Java in my situation (described below)?

  2. After reading around Erlang for a while I've come across a number of comments/posts saying that most large Erlang systems contain a good amount of C/C++. Is this for speed reasons (my assumption) or something else? Why is it required?

I have read about number of processors going up and threading models being hard (I agree) but I am looking to find out when the "line" is going to be crossed so that I can move language/paradigm at the right time.

A bit of background/context:
I am working on server-side Java services which are very CPU bound and easily made parallel. Typically, a single incoming update (TCP) triggers a change to multiple (100s of) outputs. The calculations are usually pretty simple (a few loops, just lots of arithmetic) and the inputs come in pretty fast (100/s). Currently we run on 4-CPU machines with multiple services on each (so multi-threading is pretty pointless, and the Java seems to run faster without the sync blocks etc. required to make it multi-threaded). There is now a strong push for speed and we have access to 24-processor machines (per process if required), so I am wondering how best to proceed: massively multi-threaded Java or something easier to code like Erlang.
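To make the workload concrete, here is a minimal Java sketch of the fan-out described above: one update recomputes every output, and each output depends only on the update and its own id, so the work can be split across cores without sync blocks. The class name, the `recompute` formula, and the figure of 300 outputs are all made up for illustration; at 100 updates/s that would be roughly 30,000 small calculations per second.

```java
import java.util.stream.IntStream;

class FanOutSketch {
    // Hypothetical per-output calculation: a short loop of plain arithmetic.
    static double recompute(int outputId, double update) {
        double acc = 0.0;
        for (int i = 1; i <= 100; i++) {
            acc += (update * outputId) / i;
        }
        return acc;
    }

    // Recomputes every output for one incoming update. The outputs are
    // independent of each other, so a parallel stream can spread them over
    // the available cores without any explicit locking.
    static double[] applyUpdate(double update, int outputCount) {
        return IntStream.range(0, outputCount)
                .parallel()
                .mapToDouble(id -> recompute(id, update))
                .toArray();
    }

    public static void main(String[] args) {
        double[] outputs = applyUpdate(1.5, 300); // 300 is an assumed output count
        System.out.println("recomputed " + outputs.length + " outputs");
    }
}
```

Whether this actually beats several single-threaded services on the same box depends on how heavy each calculation is; for very small calculations the scheduling overhead can outweigh the gain.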

+3  A: 

Have you compared the cost of new hardware versus the cost of retraining staff in Erlang and re-architecting your software in a new language?

I wouldn't underestimate the expense of retraining yourself (or others) and the cost of hiring people conversant in Erlang (who are going to be a lot harder to find than Java people). Servers obviously cost in terms of their storage costs / power / maintenance etc., but they're still a lot cheaper than qualified staff. If you can make progress and remain scalable whilst using your current skillsets, I suspect that's the most pragmatic approach.

Brian Agnew
(+1) Firstly, Erlang is a complex piece of software, and using it to its fullest requires much reading. Secondly, the source code can be VERY nasty to read, e.g. when writing drivers or making changes to the IO subsystem.
Hassan Syed
Yes. I don't want the above to be read as a rant against Erlang. I think it looks fascinating. However there's an associated cost.
Brian Agnew
Interestingly, we tried the retraining in-house. We got a team of 4 up to (reasonable?) speed with Erlang within 3 weeks and built a mock trading exchange system which worked well enough to prove the point. I personally think the retraining issue is FUD compared with finding Java people who actually deeply understand multi-threaded programming and its pitfalls (of whom I have met very few).
DaveC
I don't mean it to be FUD. But there *is* an associated cost.
Brian Agnew
I agree with DaveC as well. Once you get rolling with Erlang... well, I believe it is the best thing since sliced bread. Java and C#, even though they have built-in primitives and idioms for concurrent programming, are less suited to it than C or C++ (and you need gurus to pull it off correctly with those languages as well :/ ). Erlang takes all of that crap away with the new SMP beams -- and provided your needs fall within the libraries provided, or you have the developers to contribute what you are missing, you will see the 10x speedup that Armstrong raves about :D
Hassan Syed
I have to disagree that Erlang is complex or that its source is difficult to read. It's no more difficult than it is for a long-time C programmer to look at Java for the first time. After 1 book, a few screencasts and 2-3 weeks I was already starting to contribute patches and changes to open source projects, and I had started doing some pretty complex stuff with socket programming connected to our Asterisk server. The point is that what you call "qualified staff" is really not qualified if they are unable to learn new things. One should not be a "Java programmer" but just a programmer.
Jon Gretar
I should clarify (in case *anyone* is in any doubt) that I'm not making assertions about Erlang's complexity or any other attributes - merely that there's an associated cost re. switching/retooling etc.
Brian Agnew
@Brian Agnew: I was not claiming that you were on a FUD mission of some sort. :) I would however say that if these are huge concerns then you may have a larger problem. It would mean that your company is static and unable to make minor changes to best solve the problems at hand. It should be neither costly nor difficult for a company to ask a few programmers to learn the basics of a different language. If it is a problem then, simply put, you may have inadequate programmers. What auto repair shop would hire a mechanic who can only ever work on the 1985 Dodge Ram?
Jon Gretar
@Jon - yes, but there *is* a cost associated with this (however minor). That's something you have to determine and factor in (regardless of whether it's the cost of an O'Reilly book, a training course, hiring a tutor, or 3 man-weeks of lost productivity).
Brian Agnew
+5  A: 

Since this is an arithmetic-heavy workload and you have already done the job of splitting the code out into separate service processes, you wouldn't gain much from Erlang. Your job seems to fit Java comfortably. Erlang is good at tiny transactions, such as message switching or serving static or simple dynamic web pages. It is not innately good at enterprise number-crunching or database workloads.

However, you could build on external numerical libraries and databases and use Erlang as a message switch :D That's what CouchDB does :P

-- edit --

  1. If you move your arithmetic operations into an Erlang async-IO driver, Erlang will be just as good as the language shoot-out stuff -- but with 24 CPUs perhaps it won't matter that much; the Erlang database is procedural and therefore quite fast -- this can be exploited in your application updating 100 entities on each transaction.

  2. The Erlang runtime system needs to be a mix of C and C++ because (a) the Erlang emulator is written in C/C++ (you have to start somewhere), (b) you have to talk to the kernel to do async file IO and network IO, and (c) certain parts of the system need to be blistering fast, e.g. the backend of the database system (Mnesia).

-- discussion --

With 24 CPUs in a 6 core * 4 CPU topology using a shared memory bus, you have 4 NUMA entities (the CPUs) and one central memory. You need to be wise about the paradigm: the shared-nothing multi-process approach might kill your memory bus.

To get around this you need to create 4 processes with 6 processing threads each and bind each processing thread to the corresponding core in the corresponding CPU. These 6 threads need to do collaborative multi-threading. Erlang and Lua have this innately; Erlang does it in a hard-core way, as it has a full-blown scheduler as part of its runtime which it can use to create as many processes as you want.
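As a rough Java approximation of that layout, the sketch below partitions the outputs across 4 groups of 6 worker threads inside a single JVM. It is only an illustration of the partitioning idea: the standard Java library has no portable way to bind a thread to a core, so the pinning step is not shown and would need OS-level tools or a third-party library. The class name, the `recompute` formula and the `id % GROUPS` partitioning rule are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionedPoolsSketch {
    static final int GROUPS = 4;            // one group per physical CPU
    static final int THREADS_PER_GROUP = 6; // one worker thread per core

    // Hypothetical per-output calculation.
    static double recompute(int outputId, double update) {
        return update * (outputId + 1);
    }

    // Sends each output to the pool that owns its partition and waits for all
    // of them. A real service would keep the pools alive between updates
    // instead of creating them per call as this sketch does.
    static double[] applyUpdate(double update, int outputCount) {
        ExecutorService[] pools = new ExecutorService[GROUPS];
        for (int g = 0; g < GROUPS; g++) {
            pools[g] = Executors.newFixedThreadPool(THREADS_PER_GROUP);
        }
        try {
            double[] results = new double[outputCount];
            List<Future<?>> pending = new ArrayList<>();
            for (int id = 0; id < outputCount; id++) {
                final int outputId = id;
                // Each task writes only its own slot, so no locking is needed.
                pending.add(pools[outputId % GROUPS].submit(() -> {
                    results[outputId] = recompute(outputId, update);
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // also makes the worker's write visible to this thread
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            for (ExecutorService pool : pools) {
                pool.shutdown();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(applyUpdate(2.0, 100).length + " outputs recomputed");
    }
}
```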

Now if you were to partition your tasks across the 4 processes (1 per physical CPU) you would be a happy man; however, you are running 4 JVMs doing (presumably) serious work (yuck, for many reasons). The problem needs to be solved with a better ability to slice and dice the work.

In comes the Erlang OTP system. It was designed for redundant, robust networked systems, but it is now moving towards same-machine NUMA-esque CPUs. It already has a kick-ass SMP emulator, and it will become NUMA-aware soon as well. With this paradigm of programming you have a much better chance of saturating your powerful servers without killing your bus.

Perhaps this discussion has been theoretical; however, when you get an 8x8 or 16x8 topology you will be ready for it as well. So my answer is: when you have more than 2 modern physical CPUs on your mainboard, you should probably consider a better programming paradigm.

As an example of a major product following the discussion here: Microsoft's SQL Server is CPU-Level NUMA-aware in the SQL-OS layer on which the database engine is built.

Hassan Syed
A: 

If you get 100 per second but they take 100s each how can it possibly keep up? Maybe I am misreading that part, but anyway unless it's thousands or millions of requests a second your synchronization code should not be taking long. If it is, you are doing something wrong, possibly locking while you execute the whole job or something.
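A minimal sketch of the difference, assuming a made-up `recompute` formula and class name: the arithmetic runs with no lock held, and the lock is taken only for the brief moment the fresh results are published. Locking around the whole job would instead serialize every update.

```java
import java.util.concurrent.locks.ReentrantLock;

class NarrowLockSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final double[] published;

    NarrowLockSketch(int outputCount) {
        published = new double[outputCount];
    }

    // Hypothetical per-output calculation.
    static double recompute(int outputId, double update) {
        return update * (outputId + 1);
    }

    void onUpdate(double update) {
        // The expensive part runs without holding the lock.
        double[] fresh = new double[published.length];
        for (int id = 0; id < fresh.length; id++) {
            fresh[id] = recompute(id, update);
        }
        // The lock covers only the short copy that publishes the results.
        lock.lock();
        try {
            System.arraycopy(fresh, 0, published, 0, fresh.length);
        } finally {
            lock.unlock();
        }
    }

    // Returns a consistent copy of the most recently published outputs.
    double[] snapshot() {
        lock.lock();
        try {
            return published.clone();
        } finally {
            lock.unlock();
        }
    }
}
```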

For multithreaded code, going to an even higher-level language is probably a mistake. Even if you write the application part in Erlang or whatever, the multithreading should probably stay in Java, or move to C++ if performance really becomes an issue.

Charles Eli Cheese
+1  A: 

The question of speed when it comes to programming languages is as complex as a question can get. Java advocates can point to a lot of areas and claim to be fastest, and they would be 100% correct. Ruby/Python advocates point to a different set of parameters and claim to be faster, and they would also be correct. Erlang advocates then point to concurrency and claim to be fastest when dealing with hundreds or thousands of concurrent connections or calculations, and they would not be wrong either.

Looking at the basic description of the project in question it seems to me that Erlang would be a perfect fit for your needs. Not knowing the details I would say that this would actually be a pretty darn simple Erlang program and could be done in a very short time indeed.

Jon Gretar