I'm currently participating in a programming contest (http://contest.github.com) whose goal is to create a recommendation engine. I started coding in Ruby, but soon realised it wasn't fast enough for the algorithms I had in mind. So I switched to C, which is the only non-scripting language I know. It was fast, of course, but I cringed every time I had to write a for loop to go through the elements of an array (which was very often).

That's when it dawned on me: I wish I knew a fast yet high-level language, so I could program all these intensive computations with ease!

So I looked at my options, but there are a lot of options these days! Here are the best candidates I've found over the months, each with something that bothers me about it (which hopefully you can clear up):

  • Clojure: I'm not sure I want to get into the whole Lisp thing; I like my syntax and cruft. I could be convinced, though.
  • Haskell: Too academic? I don't really care for pure functional, I just want something which works. But it has nice syntax, and I don't mind static typing.
  • Scala: Weird language. I tried it out but it feels messy/inconsistent to me.
  • OCaml: Also wondering if this is too academic? The poor concurrency support also bothers me.
  • Arc: Paul Graham's Lisp. Too obscure, and again, I'm not sure I want to learn a Lisp. But I trust this man!

Any advice? I really like the functional languages for their ability to manipulate lists with ease, but I'm open to other options too. I'd like something about as fast as Java.

The kind of thing I want to be able to do with lists looks like this (Ruby):

([1, 2, 3, 4] - [2, 3]).map {|i| i * 2 } # which results in [2, 8]

I would also prefer an open-source language.

Thanks

+2  A: 

D might fit the bill? Compiles to machine code but allows for programming using higher-level concepts.

Andy Balaam
Ah, I forgot about this one! The only problem I would say is that it isn't particularly good at handling data-structures and doing list magic.
cloudhead
D does look cool - I like how it has more high-level features than C# or Java.
Sam Schutte
+1, D sounds like it fits the bill most here.
Pavel Minaev
D has some quite nice support for functional programming and immutable state etc.
Andy Balaam
+6  A: 

Why not simply Java or C#? They should be faster than Ruby, are more high-level than C, and have a huge userbase.

Jens Schauder
Java and C# are faster than Ruby, but still (IMO) much too slow for code which needs to run fast. Haskell, OCaml, and Common LISP are the only high-level languages I know which can match C.
John Millikin
I'm not a fan of Java, and I don't use Windows; I should have mentioned that.
cloudhead
The tests I've seen show C# having 98% of the speed of C++. I don't know about C though. Since you're not a Windows user, you could try Mono, though I don't know how it performs.
Sam Schutte
@John "fast" is extremely relative. Only a specific problem can show what kind of performance the OP needs. Maybe he needs hand-optimized assembler, although this might not qualify as high level.
Jens Schauder
Haskell definitely cannot match C performance-wise. If you've got that idea from the Language Shootout, then you should know that the methodology used to measure it there was extremely flawed (in practice, all performance gains were from misunderstanding the lazy deferred evaluation nature of Haskell expressions).
Pavel Minaev
C is not high level? News to me.
Hooked
I don't know in what universe it is that people think that Haskell, OCaml or Common Lisp(!) can match the performance of well written C with a production quality compiler and optimizer, but it's not this one. C is a mid-level (not high-level) language that can frequently match the speed of hand-written assembly code. The only other non-assembler languages that I have seen that could truly match it were FORTRAN and Bliss and good luck finding a current quality version of either of them.
RBarryYoung
@Jens - Assembly "might not" qualify as high-level?
Chris Lutz
+1  A: 

Python can be made to run fast, especially using the NumPy package. Relevant links below:

http://www.scipy.org/PerformancePython

http://stackoverflow.com/questions/1199972/cython-and-numpy-speed

klochner
Interesting, will look into it, but I doubt it will match even Java in speed.
cloudhead
igouy
+2  A: 

Haskell is my current preference as a performant, high-level language. I've also heard very good things about OCaml, but haven't personally used it much.

Scala and Clojure will have similar performance to Java -- slow, slow, slow! Sure, they'll be faster than Ruby, but what isn't?

Arc is a set of macros for MzScheme, and is not particularly fast. If you want a performant LISP, try Common LISP -- it can be compiled to machine code.

John Millikin
So Haskell really is that much faster than Java? Also, do you find that its academic nature can sometimes be a barrier to getting code out quickly?
cloudhead
In the benchmarks I've seen, Haskell's performance is roughly comparable to C. It depends on how the code's written -- there's no way to avoid the low-level bit twiddling required for high-performance code, but the rest of the code can be written to a higher level. I don't consider Haskell particularly "academic", but the library situation is not as healthy as Python.
John Millikin
Chuck
Haskell cannot reasonably be anywhere close to C, or, in fact, even Java, because of all the thunking it has to do for lazy evaluation.
Pavel Minaev
@Chuck: The 30% or so overhead typically seen moving to Haskell is nothing compared to the 200% or so overhead of C# or Java.
John Millikin
@Pavel: If that were the case, then Haskell would not regularly exceed Java's performance. Laziness is optional, and many high-performance applications use strict evaluation in inner loops for better performance.
John Millikin
@John, where do you get your numbers from? There's no 200% overhead from either Java or C#; in fact, tight inner loops coded in either often get translated to the _exact same asm opcodes_ that a C/C++ compiler emits.
Pavel Minaev
Also, laziness is optional in the sense that you don't have to make _your_ data structures lazy; but then you'd also have to avoid virtually all types from the standard library. And what about the perf hit from polymorphism? Again, you can avoid that if you code your own functions non-polymorphic, and don't use any polymorphic functions from the standard library - but at that point, what benefit do you get from using Haskell at all?
Pavel Minaev
cloudhead
@cloudhead: Python and Ruby are not interpreted. Both are compiled to bytecode and run in a virtual machine, like C# or Java. The difference in speed is that Python and Ruby are dynamically typed, which greatly reduces the opportunities for optimization available in a statically-typed language.
John Millikin
@John: Python/Ruby bytecode is still interpreted by the VM, so it's correct to call either a "bytecode interpreter". In contrast, Java/C# bytecode is JITted to native code, which then executes - there's no interpreter there (well, it's a bit more complex when Java is involved, but the end result is the same). The difference in performance because of this technique can easily reach 20x in favor of JIT (read the early UnrealScript papers by Tim Sweeney, he explores it there).
Pavel Minaev
@Pavel: that depends on which VMs you use. There are non-JIT VMs for Java and C#, and JIT'ing VMs for Python (and perhaps Ruby?). The difference between JIT and no JIT is not significant compared to the overhead of dynamic dispatch.
John Millikin
I'm not aware of any non-JIT VMs for C#. In any case, primary VMs for both Java (Sun JRE) and C# (.NET) are JITting. Overhead of dynamic dispatch - sorry, you're plain wrong. ObjC has dynamic dispatch, but it's a compiled language, and is still orders of magnitude faster than, say, Python. And I have already given the reference on the _20x_ speed difference between native (and thus JIT) and bytecode on the _same_ kind of language earlier.
Pavel Minaev
@John, where do you get your numbers from?
igouy
@Pavel: Mono can be run in non-JIT mode, and still performs better than Python. ObjC has dynamic dispatch for messages, which is very slow, but still uses static dispatch for procedures à la C.
John Millikin
@Pavel: Your claims of Haskell's slowness due to the thunking penalty don't ring true either. Can you link to any benchmarks that show this slowness? The Shootout has Haskell looking slightly slower than Java, and actually a bit faster than C#, and I don't believe I've seen a non-contrived example where it's as slow as you make it out to be.
Chuck
@John, again, there is no surprise that a statically typed language will run faster than dynamically typed language, with everything else equal. But JIT gives _another_ significant perf benefit. Meanwhile, ObjC message dispatch is still significantly faster than Python/Ruby dispatch. In any case, the starting point of this discussion was why Java/C# aren't really much slower than C++. Both C++ and Java/C# are statically typed, and can (and implementations do) use vtables for method dispatch, so there's no difference there. And JIT covers the rest of it.
Pavel Minaev
@Chuck, Language Shootout compares GHC to Mono. Mono is known as a fairly mediocre implementation of both JIT and GC, and is lagging severely behind both JVM and .NET. On the other hand, GHC is truly a state of the art compiler doing very aggressive optimizations to mitigate the high-level nature of the language. If you want to see a true comparison, then comparing GHC to Sun JVM or .NET would give it - and, as you say yourself, it shows that Haskell is still slower than Java on the whole.
Pavel Minaev
A: 

You might consider python; it supports writing modules in C or C++, so you can get it working in a high-level language, profile it, rework the algorithms, and if it still isn't fast enough, translate the hotspots to C or C++ for speed.
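
As a rough sketch of the "profile it, rework the algorithms" step, using only the standard library's cProfile and pstats modules: the two `overlap_score` functions and the `profile` helper below are hypothetical names made up for illustration, not anything from the contest.

```python
import cProfile
import io
import pstats


def overlap_score(a, b):
    """Hypothetical hotspot: count shared items with a naive nested scan."""
    return sum(1 for x in a for y in b if x == y)


def overlap_score_fast(a, b):
    """Reworked version: same result when b has no duplicates, via set lookup."""
    b_set = set(b)
    return sum(1 for x in a if x in b_set)


def profile(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # show the top 5 entries
    return result, buffer.getvalue()


if __name__ == "__main__":
    a = list(range(0, 2000, 2))
    b = list(range(0, 2000, 3))
    for func in (overlap_score, overlap_score_fast):
        result, report = profile(func, a, b)
        print(func.__name__, result)
        print(report)
```

If the reworked version still shows up at the top of the report, that function is the candidate for a C or C++ extension module.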

retracile
I can do all of this with Ruby + C though, which kind of brings me back to my initial problem.
cloudhead
A: 

Consider Tcl, combined with C. Do the really compute-intensive stuff in C since that's what you know how to do, then use Tcl as the glue to combine the high level code with your C-based code.

I make this recommendation not because Tcl is necessarily the best language for the job (there really is no "best" for something like this) but because you'll learn a lot about the concept of combining the strengths of two different languages. It's an important technique that could serve you well in your career whether it's Tcl/C, Lua/C, Groovy/Java, Python/C, etc.

Bryan Oakley
Again, I can use C with Ruby, but what I'm looking for is the ability to write fast _and_ high-level code, on large datasets.
cloudhead
Ok, I don't see where you mentioned combining the two in your original question though. You said you tried Ruby and then you tried C, you didn't say you tried them together.
Bryan Oakley
+5  A: 

Your criticism of pretty much everything seems to be that it's "weird" or "too academic." But what does that mean? It's the sort of vague criticism that you can throw at any unfamiliar language that isn't totally mainstream (i.e., not C, C++, Objective-C, Java, Ruby, Python or PHP). There's nothing about all those languages that's inherently good for academia and bad for anything else. Try to break down your analysis a little further: Specifically, what is it that troubles you about those languages? You might find that your brain is just instinctively pushing away something unfamiliar. If that's the case, learning one of those languages might be a good way to expand your mind.

Alternatively: It sounds like you're looking for a functional language, so you might look at F#. It's a first-class CLR language created by Microsoft, so it doesn't carry any "academic" mental baggage, and it's very similar to OCaml.

Chuck
When I say 'too academic', I mean that it seems like the language wasn't designed to solve a problem, but rather as a learning tool or an experiment, and thus lacks certain characteristics that would make it useful for everyday use. It's a mystery to me why fast and high-level languages like Haskell or Lisp aren't more widespread. I really don't have anything against academia; I just worry that some of these languages might have little real-world use, and that's why we don't see many projects implemented in them.
cloudhead
A: 

Python with pyrex or psyco may be a better fit? Probably not as fast as C, but you can see some significant speedups from regular Python.

retracile
+4  A: 

newLISP is fast, small, integrates extremely easily with C, and it has quite a few statistical functions built-in.

Jeff Ober
Hear, hear, newLISP is one of the smartest scripting languages out there.
cbo
+1  A: 

You seem uncomfortable with any language that doesn't look like one you already use. That's going to limit you, so I'd suggest one you won't be comfortable with if you're interested in expanding your horizons. I'm not saying you'll want to continue with any particular language (I have a definite preference never to touch Tcl again), but you should try it sometime.

There are nice fast implementations of Common Lisp, and that's an easy language to write functional programs in. Besides, if you can get along with it, you'll find a lot of neat things you can do with it.

David Thornley
A: 

If you want something that's "about as fast as Java," the obvious solution is JRuby.

If you install NetBeans (use the download button under the Ruby column), JRuby is the default interpreter. It doesn't get much easier!

Sarah Mei
Making something run on a JVM doesn't make it "about as fast as Java", unfortunately.
Pavel Minaev
Depends on your definition of about, I suppose. I did find this interesting "microbenchmark" while poking around for JRuby info. I guess the OP is stuck with Java or C++! http://blog.dhananjaynene.com/2008/07/performance-comparison-c-java-python-ruby-jython-jruby-groovy
Sarah Mei
igouy
+1  A: 

Computation? Fortran. Beats the pants off of anything else.

xcramps
+3  A: 

How about Delphi / FreePascal? They're native code & fast. I do a lot of real-time graphics & processing with them. They don't require that you work 'low level', but you can if you need to. Plus you can embed assembler if needed for extra performance. FreePascal is cross-platform if you want to stay off Windows.

GrandmasterB
Delphi isn't really noticeably higher-level than C++, it just has cleaner syntax and stricter type rules.
Pavel Minaev
A: 

If your problem is C's clunky loops, I'd suggest looking at Ada. It allows you to loop through a whole array with a simple statement like so:

for I in array_name'range loop
   -- Code goes here
end loop;

For AI projects, I'd also suggest you look into using CLIPS, which is a freely available inference engine.

T.E.D.
+1  A: 

If you don't mind .NET...

  1. F# - based on OCaml, a multiparadigm language with full access to the .NET Framework. Included officially in .NET FW 4.0
  2. Nemerle - see F#, and add to that POWERFUL metaprogramming capabilities.
Ray
A: 

Rather than OCaml, you might consider F# -- it's source compatible with OCaml (or you can use a lighter-weight syntax) and it supports actor-style concurrency with what it calls asynchronous workflows (which are really an almost-monad for applying asynchronous execution).

Not that -- as Scala shows -- you need to have actor style concurrency baked into the language, if you build it into a library. The rest is just syntactic sugar.

Steve Gilham
+1  A: 

C++ or, alternatively, C# and Mono.

Honestly, to accomplish much in the world of software engineering, you will likely have to wrap your head around these languages you find distasteful. Java, C, C++, C#, etc. are likely to come up in a career that involves programming.

Looks like you've done some interesting work. I encourage you to push your technical skills harder. It will be worth the effort.

Alternatively, Python might be good, given your interests. You might find Smalltalk interesting, or even ATS.
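
For a sense of how the list manipulation from the question might look in Python, here is a minimal sketch; the function name `double_remaining` is made up for illustration.

```python
def double_remaining(items, excluded):
    """Drop every item found in `excluded`, then double what is left.

    Mirrors the Ruby expression ([1, 2, 3, 4] - [2, 3]).map {|i| i * 2 }.
    """
    excluded_set = set(excluded)  # set membership checks are O(1) on average
    return [i * 2 for i in items if i not in excluded_set]


if __name__ == "__main__":
    print(double_remaining([1, 2, 3, 4], [2, 3]))  # prints [2, 8]
```

This only shows the syntax; it says nothing about whether plain Python is fast enough for the contest data.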

For some ideas, look at the Language Shootout and analysis by Oscar Boykin. You have already discovered this, but comparing Ruby to C we see that Ruby is between 14 and 600 times slower (several tests are more than 100 times slower). He also points out that Python is faster than Ruby. The benchmarks for all languages is interesting.

Also interesting are benchmarks from Dan Corlan.

bill weaver
"The benchmarks for all languages is interesting." Please choose the up-to-date measurements instead of these!
igouy
Sorry. Didn't see that. Updated.
bill weaver
+7  A: 

Out of the languages that you've listed, neither Haskell nor Arc matches your "fast" requirement - both are slower than Java. Your idea that Haskell is faster than Java and approaches C is most likely coming from one well-known flawed test that tried to measure performance by implementing sort. One thing that they missed is that Haskell is lazy, and thus you need to use the results of the sort for it to actually be performed; and they measured performance simply by recording the current time, "calling" the sort function, and checking the time delta. The C version of the test faithfully performed the sort, while the Haskell version simply returned a thunk for lazy evaluation which was never forced.

In practice, there are a number of reasons why Haskell cannot be that fast even in theory. First, because of pervasive lazy evaluation, it often cannot pass around raw values, and has to generate thunks for expressions - the optimizer can trim down on those in trivial cases, but not for more complicated ones. Second, polymorphic Haskell functions are implemented as runtime-polymorphic, and not like C++ templates where every new type parameter instantiates a new version of code that is optimally compiled. Obviously, this necessitates extra boxing/unboxing. In the end, Haskell will struggle to beat any decent VM (such as HotSpot JVM, or CLR in .NET 2.0+), much less C/C++.

Now that that's settled, let's move on to the rest. Scala uses the JVM as a backend, and thus is not going to be any faster than Java - and if you use higher-level abstractions, it will most likely be somewhat slower, but probably in the same ballpark. Clojure also runs on the JVM, but it's also dynamically typed, and that carries an unavoidable performance penalty (I heard it does clever tricks to mitigate that to some extent, but some of it really is unavoidable no matter what).

That leaves OCaml, and out of your list, it is the only language that has actually been conclusively shown to reach the performance of C/C++ compilers on valid tests. It should be noted, however, that this would not be typical of idiomatic OCaml code - for example, its polymorphism is also runtime, similar to Haskell, and that carries the appropriate penalty; also, its OOP system is structural rather than nominal, which precludes an optimal vtable-based implementation; so that is going to be slower than C++, too (I'd expect a perf penalty close to that of Objective-C dispatch compared to C++ dispatch, but I don't have any numbers to back that up). So you can beat C++ in OCaml if you steer away from certain language features, but unfortunately, it's those features that make OCaml so attractive in the first place.

My advice would be this: if you really need speed, go with C++. It can be fairly high-level if you use high-level libraries such as STL and Boost. It doesn't have some high-level language abstractions you might be used to, but libraries can compensate for that - sometimes fully, sometimes in part. For example, you don't have to write a for-loop to iterate over an array - you can use std::for_each, std::copy_if, std::transform, std::accumulate and similar algorithms (which are mostly analogous to map, filter, fold, and similar traditional FP primitives), and also Boost.Lambda to cut down on boilerplate.

Pavel Minaev
+1: thank goodness, there's at least one sane person with an answer grounded in reality.
RBarryYoung
timday
Boost.Lambda? Are you kidding? C++ is already bloated; adding lambda with awful syntax does not solve its problems. Face it - C++'s time IS over. For high performance there is pure C, and for applied programming there are lots of great high-level languages in which you don't have to think about shooting yourself in the foot.
freiksenet
C++ is better than plain C by at least as much as RAII is better than manual resource management, even if you completely ignore the rest of the language. Also, for a language whose time is over, it sure is strange that the vast majority of all desktop software is still written in C++ (not C, and definitely not anything else).
Pavel Minaev
People, and especially managers, are very slow to adopt new technology. Java is already more popular than C++, and I believe that C# will soon be more popular too, and then, with the growing popularity of Linux and webapps, Python as well.
freiksenet
A: 

Learn C++ and familiarize yourself with its standard library. It won't be that hard to learn as you already 'speak' C, but keep in mind that C++ is not just a better C, it's another language with its own concepts and methods.

milan1612
+1  A: 

After your update:

If you want to manipulate lists easily you should go with Common Lisp. It is only 2 times slower than C on average (and actually faster in some things), it is great for list processing, and it is multi-paradigm (imperative, functional and OO) - so you don't have to stick to functional-only programming. SBCL is a good Common Lisp to try first, IMO.

And don't be bothered by strange "lispy" things like parentheses. Not only is it quite stupid to judge a language by its syntax rather than its semantics, but parentheses are also one of the greatest strengths of LISP, because they eliminate the differences between data and expressions, and you can manipulate the language itself to make it fit your needs.

Don't listen to people who advise C++/C#/Java. Java's functional part is nonexistent. C++'s functional part is terrible. C# delegates make me sick because of their complexity. They are not REAL multi-paradigm imperative/functional languages; they are imperative/OO languages that have some small functional bits, and you can't do real functional programming in them.

freiksenet
I have no idea why people are recommending me C++/C#/Java, as I'm very aware of these languages, and also very aware of their inability to do list comprehensions, functional programming etc. Would you advise Common Lisp over Clojure? I know it's faster, but I'm not going to be doing OOP with it anyway, and I don't think the speed difference is that big, whereas Clojure comes with many advantages like concurrency orientation.
cloudhead
I can't say if it is slower than Common Lisp; in the average ideal situation it should be faster, because it uses the JVM, and Java on it is faster than Common Lisp. I would advise Common Lisp over Clojure if you need a "general-purpose" language. But if you need a powerful functional language with a focus on concurrency - then choose Clojure.
freiksenet
A: 

Why not Erlang?

  • It's not too much like the languages you already know, so you can learn new concepts
  • It has some interesting capabilities for multiprocessing
  • It's not out of academia. Erlang was a commercial language first.
  • There are at least two significant open source applications written in it: CouchDB and Wings 3D
Theran
That's a language I definitely would like to learn, but I'm not sure it's the best for 'general purpose' programming.
cloudhead