I am working on a project using Hadoop, which natively supports Java and provides streaming support for Python. Is there a significant performance impact to choosing one over the other? I am early enough in the process that I can go either way if there is a significant performance difference.
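For context on what "streaming support for Python" involves, here is a minimal word-count sketch, assuming the usual Hadoop Streaming contract: the mapper reads lines on stdin and writes tab-separated key/value lines to stdout, and Hadoop sorts mapper output by key before the reducer reads it. The script name `wordcount.py` and its `map`/`reduce` argument are hypothetical choices for this example, not part of Hadoop.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would on stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum counts per word; assumes input is already sorted by key (Hadoop does this between stages)."""
    # Split each record into [key, value] on the first tab.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical CLI: "python wordcount.py map" or "python wordcount.py reduce"
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

Locally you can mimic the pipeline with `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`; under Hadoop Streaming the two commands would be passed via the `-mapper` and `-reducer` options. Every record crosses a pipe as text, which is part of the overhead a native Java job avoids.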

A:

Java is less dynamic than Python, and more effort has been put into its VM, making it a faster language. Python (specifically CPython, the standard interpreter) is also held back by its Global Interpreter Lock, which means threads of a single process cannot execute Python code on different cores at the same time.
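A small sketch of the GIL effect, assuming a standard CPython build with the GIL (the experimental free-threaded builds available since 3.13 behave differently). The same pure-Python CPU-bound loop is run on a thread pool and on a process pool; the function names and workload sizes are illustrative only.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_down(n):
    """Pure-Python CPU-bound loop; it holds the GIL for its whole run."""
    while n > 0:
        n -= 1
    return n


def timed_run(executor_cls, jobs, n):
    """Run `jobs` copies of count_down(n) on the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=jobs) as pool:
        results = list(pool.map(count_down, [n] * jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # Threads share one GIL, so this is roughly as slow as running the jobs serially.
    # Processes each get their own interpreter and GIL, so they can use separate cores.
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        _, secs = timed_run(cls, jobs=4, n=5_000_000)
        print(f"{cls.__name__}: {secs:.2f}s")
```

On a multi-core machine the process pool would typically finish noticeably sooner; the `__main__` guard is needed because process pools may re-import the module in each worker.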

Whether this makes any significant difference depends on what you intend to do. I suspect both languages will work for you.

David Crawshaw
Python does, however, have very nice multiprocess support for multiple cores.
cobbal
I heard that the global interpreter lock made the multi-core support almost a wash, but I don't remember where I heard this, so take it with a grain of salt.
Bill K
The multi-core process support can be used with the Parallel Python module, which also allows you to push processes to other machines in a cluster. Very neat and easy.
whatnick
A:

With Python you'll probably develop faster, and with Java your code will definitely run faster.

Search for the "Computer Language Benchmarks Game" (formerly the "language shootout") if you want to see some very accurate speed comparisons between all the popular languages, but if I recall correctly, you're talking about Java being 3-5x faster.

That said, few things are processor-bound these days, so if you feel you'd develop better with Python, have at it!
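One rough way to check whether your own job is processor-bound, sketched under the assumption that you can call the relevant step as a Python function: compare CPU time with wall-clock time. The helper `cpu_share` and the sample workload `busy` are hypothetical names for this illustration.

```python
import time


def cpu_share(fn, *args):
    """Return CPU time / wall-clock time for one call of fn(*args).

    Close to 1.0 suggests a CPU-bound call, where raw language speed matters;
    close to 0.0 suggests the call mostly waits on I/O, sleeps, or the network.
    """
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    fn(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall if wall > 0 else 0.0


def busy(n):
    # Pure computation that keeps one core busy.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print(f"busy loop: {cpu_share(busy, 2_000_000):.2f}")
    print(f"sleep:     {cpu_share(time.sleep, 0.2):.2f}")
```

If your real workload lands near the sleep end of that range, a 3-5x difference in interpreter speed will barely show up in the total run time.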


In reply to comment (how can Java be faster than Python?):

All languages are processed differently. Java is about the fastest after C and C++ (which can be as fast as Java or up to 5x faster, but seem to average around 2x faster). The rest are from 2 to 5+ times slower. Python is one of the faster ones after Java. I'm guessing that C# is about as fast as Java or maybe faster, but the shootout only had Mono (which was a tad slower) because they don't run it on Windows.

Most of these claims are based on the Computer Language Benchmarks Game (the shootout), which tends to be pretty fair because advocates of, and experts in, each language tweak the test programs written in their language to ensure the code is well targeted.

For example, this shows all tests with Java vs C++, and you can see the speed ranges from about equal to Java being 3x slower (the first column is between 1 and 3), and Java uses much more memory!

Now this page shows Java vs Python (from the point of view of Python). The speeds range from Python being 2x slower than Java to 174x slower, though Python generally beats Java in code size and memory usage.

Another interesting point: in tests that allocated a lot of memory, Java actually used significantly less memory than Python as well. I'm pretty sure Java usually loses on memory because of the overhead of the VM, but once that factors out, Java is probably more efficient than most (again, except the C's).
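To look at the allocation side on the Python end for your own data, a minimal sketch with the standard-library `tracemalloc` module is below. It reports only memory allocated through Python during the call, not the interpreter's baseline footprint, which loosely mirrors the "once the VM overhead factors out" comparison above. The helper names are hypothetical.

```python
import tracemalloc


def peak_bytes(fn, *args):
    """Return the peak number of bytes Python allocated while running fn(*args)."""
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    finally:
        tracemalloc.stop()
    return peak


def build_list(n):
    # Allocation-heavy: n separate int objects plus the list that holds them.
    return [i * 3 for i in range(n)]


if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        print(f"n={n:>9,}: peak {peak_bytes(build_list, n) / 1024:,.0f} KiB")
```

Because every Python int is a separate heap object, the per-element cost here is considerably larger than a Java `int[]` of the same length would need, which is one plausible reason allocation-heavy benchmarks favor Java.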

This is Python 3, by the way; the other Python platform tested (just called Python) fared much worse.

If you really want to know how it can be faster: the VM is amazingly intelligent. It compiles code to machine language only AFTER running it for a while, so it knows what the most likely code paths are and optimizes for them. Its memory allocation is an art, which is really useful in an OO language. It can perform some amazing run-time optimizations that no non-VM language can do. It can also run in a pretty small memory footprint when forced to, and it is a language of choice for embedded devices along with C/C++.
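One practical consequence of compiling only after the code has run: naive timings of JIT-compiled code include the slow early calls. A common benchmarking practice is therefore to discard warm-up runs before measuring (on the Java side, the JMH harness has warm-up iterations built in). Below is a minimal Python sketch of that harness pattern; `benchmark` and its `warmup`/`runs` defaults are assumptions for illustration, and standard CPython itself has no JIT to warm up.

```python
import statistics
import time


def benchmark(fn, *args, warmup=5, runs=20):
    """Time fn(*args) after `warmup` untimed calls; return the median seconds per call.

    Under a JIT runtime the early calls may run interpreted code or include
    compilation work, so they are excluded from the reported figure.
    """
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to one-off pauses (GC, scheduling).
    return statistics.median(samples)


if __name__ == "__main__":
    data = list(range(100_000, 0, -1))
    print(f"median sort time: {benchmark(sorted, data) * 1e3:.3f} ms")
```

Comparing Java and Python numbers gathered this way is fairer than timing a single cold run of each.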

I worked on a signal analyzer for Agilent (think expensive o-scope) where nearly the entire thing, aside from the sampling, was done in Java. This included drawing the screen, including the trace (AWT), and interacting with the controls.

Currently I'm working on a project for all future cable boxes. The Guide along with most other apps will be written in Java.

Why wouldn't it be faster than Python?

Bill K
How can Java be faster than Python? Is there any reference on that? Thanks.
jpartogi
Without taking away from your summary, keep in mind that more of the Java programs may have been converted to use quad-core, so also look at the one-core measurements: http://shootout.alioth.debian.org/u32/index.php
igouy
Interesting. I had looked at the worst-performing Java program (the tree one) and noticed it wasn't multi-threaded, but you are right: many other languages make a surprising showing in single-threaded mode. Free Pascal??? Ada??? Hmph
Bill K