I am wondering if I should continue to learn OCaml or switch to F# or Haskell.

Here are the criteria I am most interested in:

  • Longevity

    • Which language will last longer? I don't want to learn something that might be abandoned in a couple years by users and developers.
    • Will INRIA, Microsoft, and the University of Glasgow continue to support their respective compilers for the long run?
  • Practicality

    • Articles like this make me afraid to use Haskell. A hash table is the best structure for fast retrieval. Haskell proponents there suggest using Data.Map, which is a balanced binary tree.
    • I don't like being tied to a bulky .NET framework unless the benefits are large.
    • I want to be able to develop more than just parsers and math programs.
  • Well Designed

    • I like my languages to be consistent.

Please support your opinion with logical arguments and citations from articles. Thank you.

+3  A: 

F# and OCaml are very similar in syntax, though F# obviously integrates better with .NET.

Which one you learn or use should depend on which platform you are targeting.

In VS2010, F# is going to be included, and since it compiles to .NET bytecode, it can be used on any Windows OS that supports the .NET version you target. That gives you broad reach, but F# currently has limits that OCaml doesn't: it appears not to take advantage of all the processors on a machine. That is probably due to F# still being developed, and this may be a feature that just isn't a priority yet.

There are other functional languages, such as Erlang, that you could look at, but basically, if you are strong in one FP language you should be able to pick up another fairly quickly. So just pick one that you like and try to develop interesting and challenging applications in it.

Eventually language writers will find a way to get OO languages to work well with multi-cores, and FP may fall by the wayside again, but that doesn't appear to be happening anytime soon.

James Black
F# has asynchronous workflows, which is admittedly pretty nice for doing parallelisation. Nonetheless, with Parallel Extensions appearing in the .NET Framework 4.0, I would argue that relatively easy OOP on multiple cores is just around the corner.
Noldorin
Not sure what you mean by F# not using all the processors on a machine. I have no trouble getting F# code to load to 100%. Depends on what I'm doing.
sblom
There was an article I read a few months ago where the author mentioned that, unlike OCaml, F# didn't properly use all eight of his processors. My laptop doesn't have that many, so I can't verify it, but I expect it was due to the version being used, and I figured it would be fixed when it became a high enough priority.
James Black
The Haskell people have just released some stunning results on parallel performance with 8 cores. See 'Runtime Support for Multicore Haskell' at http://www.haskell.org/~simonmar/bib/bib.html
Norman Ramsey
@sblom: I think James is referring to compilation. OCaml compiles in parallel but F# does not.
Jon Harrop
As compiler writers get better at using multi-core CPUs, I would expect some improvement, but I also expect that people will start to take lessons from FP to try to get a speed-up. If, in C# or Java, I don't use any global variables, for example, then there is no reason why the compiler couldn't take advantage of that to help parallelize the application.
James Black
+5  A: 

There's no simple answer to that question, but here are some things to consider:

Haskell and OCaml are both mature languages with strong implementations. Actually, there are multiple good implementations of Haskell, but I don't think that's a major point in its favor for your purpose.

F# is much younger, and who can predict where Microsoft will decide to take it? How you feel about that depends more on how you feel about Microsoft than anything anyone can tell you about programming languages.

OCaml (or ML in general), is a good practical language choice that supports doing cool functional stuff without forcing you to work in a way that might be uncomfortable. You get the full benefit of things like algebraic data types, pattern matching, type inference, and everybody else's favorite stuff. Oh, and objects.
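
As a small, concrete taste of those features (the expression type below is invented purely for illustration), here they are working together:

    (* An algebraic data type for arithmetic expressions. *)
    type expr =
      | Num of int
      | Add of expr * expr
      | Mul of expr * expr

    (* Pattern matching over the constructors; the type of eval,
       expr -> int, is inferred with no annotations. *)
    let rec eval = function
      | Num n -> n
      | Add (a, b) -> eval a + eval b
      | Mul (a, b) -> eval a * eval b

    let () =
      (* (1 + 2) * 3 = 9 *)
      assert (eval (Mul (Add (Num 1, Num 2), Num 3)) = 9)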

Haskell gives you all that (except objects, pretty much), but also more or less forces you to rethink everything you think you know about programming. This might be a very good thing, if you're looking to learn something new, but it might be more than you want to bite off. I say this as someone who is only maybe halfway along the path to being a productive, happy Haskell programmer.

Both OCaml and Haskell are being used to write lots of different kinds of programs, not just compilers and AI or whatever. Google is your friend.

One last note: OCaml gives you hash tables, but it's hardly sensible to use them if you really want to embrace functional programming. Persistent trees (like Data.Map) are really the right solution for Haskell, and they have lots of nice properties, which is one of the cool things to learn about when you pick up Haskell.
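
To see the distinction concretely, here is a minimal OCaml sketch (OCaml only because the asker already knows it; SMap is built with the standard library's Map.Make functor). A Hashtbl is updated in place, while every version of a persistent Map remains usable:

    module SMap = Map.Make (String)

    let () =
      (* Mutable hash table: O(1) average lookup, updated in place;
         the old binding for "one" is overwritten. *)
      let h = Hashtbl.create 16 in
      Hashtbl.replace h "one" 1;
      Hashtbl.replace h "one" 42;
      assert (Hashtbl.find h "one" = 42);

      (* Persistent map: O(log n) lookup, but adding a binding returns
         a new map and every earlier version survives unchanged. *)
      let m1 = SMap.add "one" 1 SMap.empty in
      let m2 = SMap.add "one" 42 m1 in
      assert (SMap.find "one" m1 = 1);
      assert (SMap.find "one" m2 = 42)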

Moss Prescott
But the problem is that I would rather have O(1) access. O(log n) access for a large set is ridiculous. It doesn't matter if you have a persistent tree, because you could just as well have a persistent hash table.
Unknown
If you just want O(1) because it sounds good, then as another poster said, feel free to use a hashtable in Haskell, even though it's not a great fit. My point was that a language with lazy evaluation and no side effects really begs for persistent data structures (http://en.wikipedia.org/wiki/Persistent_data_structure). This is a powerful idea and worth some study and reflection. In my opinion this is more important than O(1) vs. O(log n), and it's certainly more important than somebody's random timing exercise. Say it with me: "micro-benchmarks are evil!"
Moss Prescott
@Moss, unfortunately, I have run into plenty of situations where the difference between O(k) and O(log n) matters, with k from 1 to 20 and n in the hundreds of thousands. I quite seriously doubt that "persistent data structure" is a legitimate excuse for not having a decent hash table. If you can have persistent trees, why not persistent hash tables?
Unknown
Try using IntMap (or the hashtable libraries on hackage -- there's a judy arrays binding), and report problems if you encounter them. Probably you won't have issues, is my guess.
Don Stewart
@Unknown: the problem is not so much the difference in asymptotic algorithmic complexity but the cache coherence of the data structures. The trees require many indirections and, consequently, are over an order of magnitude (10x) slower for most practical applications. Maybe you will never care but at least OCaml and F# give you the choice...
Jon Harrop
+12  A: 

Longevity

No one can predict the future, but

  • OCaml and Haskell have been surviving well for a number of years, which bodes well for their future
  • when F# ships with VS2010, MS will have legal obligations to support it for at least 5 years

Practicality

Perf: I don't have enough first-hand experience with Haskell, but based on second-hand and third-hand info, I think OCaml or F# are more pragmatic, in the sense that it is unlikely you'll be able to get the same run-time perf in Haskell that you do in OCaml or F#.

Libraries: Easy access to the .Net Framework is a huge benefit of F#. You can view it as being "tied to this bulky thing" if you like, but don't forget that you have access to a huge library of often incredibly useful stuff. The 'connectivity' to .Net is one of the big selling points for F#. F# is younger and so has fewer third-party libraries, but there are already, e.g., FsCheck, FParsec, Fake, and a bunch of others, in addition to the libraries "in the box" on .Net.

Tooling: I don't have enough personal experience to compare, but I think the VS integration with F# is superior to anything you'll find for OCaml/Haskell today (and F# will continue to improve a bit here over the next year).

Change: F# is still changing as it approaches its first supported release in VS2010, so there are some breaking changes to language/library you may have to endure in the near future.

Well Designed

Haskell is definitely beautiful and consistent. I don't know enough OCaml, but my hunch is that it is similarly attractive. I think that F# is 'bigger' than either of those, which means more dusty corners and inconsistencies (largely as a result of mediating the impedance mismatch between FP and .Net), but overall F# still feels 'clean' to me, and the inconsistencies that do exist are at least well-reasoned/intentioned.

Overall

In my opinion you will be in 'good shape' knowing any of these three languages well. If you know a big long-term project you want to use it for, one may stand out, but I think many of the skills will be transferable (more easily between F# and OCaml than to/from Haskell, but also more easily among any of these three than with, say, Java).

Brian
Well, the thing about being tied to the big bulky .NET is that it forces you to bundle the whole framework, instead of selectively including only the libraries you use.
Unknown
+42  A: 

Longevity

  • Haskell is de facto the dominant language of functional-programming research. Haskell 98 will last for many more years in stable form, and something called Haskell may last 10 to 30 years, although the language will continue to evolve. The community has a major investment in Haskell and even if the main GHC developers are hit by a bus tomorrow (the famous "bus error in Cambridge" problem), there are plenty of others who can step up to the plate. There are also other, less elaborate compilers.

  • Caml is controlled by a small group at INRIA, the French national laboratory. They also have a significant investment, others are invested in Caml as well, and the code is open source; the compiler is not too complicated, so it too will be maintained for a long time. I predict Caml will be much more stable than Haskell, as the INRIA folks appear to be no longer using it as a vehicle for exploring new language ideas (or at least they are doing so at a smaller rate than in the past).

  • Who knows what a company will do? If F# is successful, Microsoft could support it for 20 years. If it is not successful, they could pull the plug in 2012. I can't guess and won't try.

Practicality

A hash table is the best structure for fast retrieval. Haskell proponents there suggest using Data.Map, which is a balanced binary tree.

It depends on what you are searching. When your keys are strings, ternary search trees are often faster than hash tables. When your keys are integers, Okasaki and Gill's binary Patricia trees are competitive with hashing. If you really want to, you can build a hash table in Haskell using the IO monad, but it's rare to need to.
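
To give a flavor of the first of those structures, here is a minimal persistent ternary search tree in OCaml (OCaml for continuity with the question; an illustrative sketch, not a tuned implementation, and keys must be non-empty strings):

    type 'a tst =
      | Leaf
      | Node of char * 'a tst * 'a tst * 'a tst * 'a option
        (* split char, lower, equal, higher, value if a key ends here *)

    (* Insert value v under key, examining key.[i]; call with i = 0. *)
    let rec insert key i v t =
      match t with
      | Leaf ->
          let c = key.[i] in
          if i = String.length key - 1
          then Node (c, Leaf, Leaf, Leaf, Some v)
          else Node (c, Leaf, insert key (i + 1) v Leaf, Leaf, None)
      | Node (c, lo, eq, hi, value) ->
          let k = key.[i] in
          if k < c then Node (c, insert key i v lo, eq, hi, value)
          else if k > c then Node (c, lo, eq, insert key i v hi, value)
          else if i = String.length key - 1 then Node (c, lo, eq, hi, Some v)
          else Node (c, lo, insert key (i + 1) v eq, hi, value)

    let rec lookup key i t =
      match t with
      | Leaf -> None
      | Node (c, lo, eq, hi, value) ->
          let k = key.[i] in
          if k < c then lookup key i lo
          else if k > c then lookup key i hi
          else if i = String.length key - 1 then value
          else lookup key (i + 1) eq

    let () =
      let t = insert "cat" 0 1 (insert "car" 0 2 Leaf) in
      assert (lookup "cat" 0 t = Some 1);
      assert (lookup "car" 0 t = Some 2);
      assert (lookup "dog" 0 t = None)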

I think there will always be a performance penalty for lazy evaluation. But "practical" is not the same as "as fast as possible". The following are true about performance:

  • It is easiest to predict the time and space behavior of a Caml program.

  • F# is in the middle (who really knows what .NET and the JIT will do?).

  • It is hardest to predict the time and space behavior of Haskell programs.

  • Haskell has the best profiling tools, and in the long run, this is what yields the best performance.

I want to be able to develop more than just parsers and math programs.

For an idea of the range of what's possible in Haskell, check out the xmonad window manager and the vast array of packages at hackage.haskell.org.

I don't like being tied to a bulky .NET framework unless the benefits are large.

I can't comment.

Well Designed

I like my languages to be consistent.

Some points on which to evaluate consistency:

  • Haskell's concrete syntax is extremely well designed; I'm continually impressed at the good job done by the Haskell committee. OCaml syntax is OK but suffers by comparison. F# started from Caml core syntax and has many similarities.

  • Haskell and OCaml both have very consistent stories about operator overloading. Haskell has a consistent and powerful mechanism you can extend yourself. OCaml has no overloading of any kind.

  • OCaml has the simplest type system, especially if you don't write objects and functors (which many Caml programmers don't, although it seems crazy to me not to write functors if you're writing ML). Haskell's type system is ambitious and powerful, but it is continually being improved, which means there is some inconsistency as a result of history. F# essentially uses the .NET type system, plus ML-like Hindley-Milner polymorphism (See question "What is Hindley-Milner".)

  • OCaml is not quite consistent on whether it thinks variants should be statically typed or dynamically typed, so it provides both ("algebraic data types" and "polymorphic variants"). The resulting language has a lot of expressive power, which is great for experts, but which construct to use is not always obvious to the amateur.

  • OCaml's order of evaluation is officially undefined, which is a poor design choice in a language with side effects. Worse, the implementations are inconsistent: the bytecode virtual machine uses one order and the native-code compiler uses the other. (This point, along with the functor and variant points above, is sketched in code after this list.)
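
Three quick OCaml sketches of the points above (illustrative code only; all names are invented). First, since the functor remark may be opaque to non-ML readers, a tiny functor, i.e. a module parameterized by another module:

    module type SHOW = sig type t val show : t -> string end

    module Logger (S : SHOW) = struct
      let log x = print_endline ("value: " ^ S.show x)
    end

    module IntLogger = Logger (struct type t = int let show = string_of_int end)
    (* IntLogger.log 42 prints "value: 42" *)

Second, the same small enumeration as an ordinary algebraic data type and as polymorphic variants; the polymorphic-variant function needs no prior type declaration, and its inferred type lists exactly the cases it handles:

    (* Nominal algebraic data type: must be declared before use. *)
    type color = Red | Green | Blue

    let is_warm = function Red -> true | Green | Blue -> false

    (* Polymorphic variants: structural tags, no declaration needed.
       Inferred type: [< `Green | `Red ] -> string *)
    let to_hex = function
      | `Red -> "#ff0000"
      | `Green -> "#00ff00"

    let () =
      assert (is_warm Red);
      assert (to_hex `Green = "#00ff00")

Third, the evaluation-order point: the language definition does not fix the order in which the two effectful subexpressions below run, so the output differs by implementation (the native-code compiler typically evaluates right to left):

    let tell name = print_endline name; name

    let () =
      (* May print "right" then "left", or the reverse;
         OCaml does not specify which. *)
      ignore (tell "left", tell "right")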

Norman Ramsey
+1 Very interesting points.
Unknown
@Norman, I've been reading your comments that a ternary search is often faster than hashing. The worst-case performance is O(log s + k), which depends on the number of strings and the key length. Now I see that the only way this could be faster is if you often test for elements that are not present in the trie, which means it will probably take fewer than k operations. However, if you are frequently accessing keys that exist, the hash should be faster because it is O(k). What do you say about this?
Unknown
Some good points but: your advice about the performance of trees vs hash tables is misleading. Your assertion that Haskell has better profiling support than .NET is absurd. Your assertion that many OCaml programmers ignore objects and functors is wrong (you cannot even use Set and Map otherwise and several of OCaml's most popular libraries rely heavily upon objects) and you've neglected polymorphic variants, which are one of OCaml's greatest advantages over other statically-typed FPLs. Your assertion about OCaml's evaluation order being "inconsistent" is wrong: it has always been undefined.
Jon Harrop
For example, (in OCaml) adding 1,000,000 ints to a patricia tree is 6x slower than a hash table if you start from empty and 18x slower than a hash table if you reserve space for the number of elements before adding them.
Jon Harrop
I've started learning a little more about Haskell, and found out that the syntax is __so_much_better__ than OCaml's. But I still don't like lazy evaluation by default, and the hash table stuff still bothers me. So as a result, I am still torn between these languages.
Unknown
@Unknown: I'm relying on Bentley and Sedgewick's results from the web site I pointed to. I did replicate these results in the late 1990s, I think, but of course those results may no longer hold on today's hardware.
Norman Ramsey
@Jon Harrop (Trees vs hash tables): as noted, I'm referring to other people's results, which I had replicated once upon a time.
Norman Ramsey
@Jon Harrop (profiling support in GHC and .NET): what does .NET offer that is as good as GHC's cost centers and GHC's heap profiler?
Norman Ramsey
@Jon Harrop (do many OCaml programmers ignore objects and functors?): while I myself love functors and hate objects, as you'll learn if you read my ML papers, when I attend the International Conference on Functional Programming I routinely hear comments like "I never saw an object or a functor I liked".
Norman Ramsey
@Jon (Patricia tree): are you using the Okasaki and Gill Patricia tree or some generic Patricia tree?
Norman Ramsey
@Jon Harrop (other criticisms): you've made some good points and I've updated my answer accordingly.
Norman Ramsey
@Unknown: It took me about 5 years to stop worrying and learn to like lazy evaluation by default, and it wasn't until I got a few big wins in large programs, where I hadn't planned in advance for structures to be lazy, that I came around. So I sympathize. For the hash table, I think you'll be hard pressed to find an app where it matters.
Norman Ramsey
+11  A: 

This wasn't one of your criteria, but have you considered job availability? Haskell currently lists 144 jobs on Indeed, OCaml lists 12, and C# lists 26,000. These numbers are not perfect, but I bet you that once F# ships it won't be long before it blows past Haskell and OCaml in the number of job listings.

So far every programming language included in Visual Studio has thousands of job listings for it. It seems to me that if you want the best chance to use a functional programming language as your day job, then F# will soon be it.

gradbot
From a practical perspective, this about nails it.
Darren Oster
I'd recommend looking at the jobs themselves. The last time I checked, many non-Haskell jobs were using Haskell as a buzzword because of the current fad. I personally prefer to read about the success stories built upon those languages. Haskell is way way behind in that respect, of course.
Jon Harrop
I'm surprised I have 2 down votes and 3 up votes for this. I thought the point of down voting was to remove irrelevant or wrong information not to disagree.
gradbot
The downvotes are coming from the "Microsoft is an evil empire" crowd.
Robert Harvey
According to itjobswatch.co.uk, F# already overtook Haskell and is growing much faster even though it won't be properly released until 22nd March 2010!
Jon Harrop
+11  A: 

Should you learn F# or Haskell if you know OCaml?

I believe the answer is certainly yes: ideally you should learn all three languages, because each one has something to offer. But F# is the only one with a significant future, so if you can only feasibly learn one language, learn F# by reading my Visual F# 2010 for Technical Computing book or subscribing to our The F#.NET Journal.

Longevity

Microsoft committed to supporting F# when they released it as part of Visual Studio 2010 in April. So F# is guaranteed a rosy future for at least a few years. With a powerful combination of practically important features, like a high-performance native-code REPL, high-level constructs for parallelism built into .NET 4, and a production-quality IDE mode, F# is a long way ahead of any other functional programming language in terms of real-world applicability now. Frankly, nobody is even working on anything that might be able to compete with F# in the near future. My own open source HLVM project is an attempt to do so, but it is far from ready.

In contrast, both OCaml and Haskell are being developed in extremely unproductive directions. This has been killing OCaml for several years now and I expect Haskell to follow suit over the next few years. Most former professional OCaml and Haskell programmers already moved on to F# (e.g. Credit Suisse, Flying Frog Consultancy) and most of the rest will doubtless migrate to more practical alternatives such as Clojure and Scala in the near future.

Specifically, OCaml's QPL license prevents anyone else from fixing its growing number of fundamental design flaws (poor 32-bit support, very poor shared-memory parallelism, no value types, slow parametric polymorphism, interpreted REPL, cumbersome FFI etc.) because they must distribute derivative works only in the form of patches to the original and the Debian package maintainers refuse to acknowledge an alternative upstream. The new features being added to the language, such as first-class modules in OCaml 3.12, are nowhere near as valuable as multicore capability would have been.

Some projects were started in an attempt to save OCaml but they proved to be too little too late. The parallel GC is practically useless and David Teller quit the batteries included project (although it has been picked up and released in a cut-down form). Consequently, OCaml has gone from being the most popular functional language in 2007 to severe decline today, with caml-list traffic down over 50% since 2007.

Haskell has fewer industrial users than OCaml and, although it does have multicore support, it is still being developed in a very unproductive direction. Haskell is developed almost entirely by two people at Microsoft Research in Cambridge (UK). Despite the fact that purely functional programming is bad for performance by design, they are continuing to try to develop solutions for parallel Haskell aimed at multicores, even though the massive amounts of unnecessary copying it incurs hit the memory wall and destroy any hope of scalable parallelism on a multicore.

The only major user of Haskell in industry is Galois with around 30 full-time Haskell programmers. I doubt they will let Haskell die completely but that does not mean they will develop it into a more generally-useful language.

Practicality

I wrote the article you cited about hash tables. They are a good data structure. Other people have referred to purely functional alternatives like ternary trees and Patricia trees but these are usually ~10× slower than hash tables in practice. The reason is simply that cache misses dominate performance concerns today and trees incur an extra O(log n) pointer indirections.
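
For what it is worth, here is a rough sketch of the kind of micro-benchmark behind numbers like these (illustrative only: absolute timings depend on the machine, GC settings, and key distribution, so the ratio, not the figures, is the interesting part):

    (* Insert n int keys into a mutable Hashtbl and into a persistent Map. *)
    module IntMap = Map.Make (struct type t = int let compare = compare end)

    let time label f =
      let t0 = Sys.time () in
      let r = f () in
      Printf.printf "%-8s %.3fs\n" label (Sys.time () -. t0);
      r

    let () =
      let n = 1_000_000 in
      let _h =
        time "Hashtbl" (fun () ->
          let h = Hashtbl.create n in      (* capacity reserved up front *)
          for i = 0 to n - 1 do Hashtbl.replace h i i done;
          h)
      in
      let _m =
        time "Map" (fun () ->
          let rec go m i = if i = n then m else go (IntMap.add i i m) (i + 1) in
          go IntMap.empty 0)
      in
      ()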

My personal preference is for optional laziness and optional purity because both are generally counterproductive in the real world (e.g. laziness makes performance and memory consumption wildly unpredictable, and purity severely degrades average-case performance and makes interoperability a nightmare). I am one of the only people earning a living entirely from functional programming through my own company. Suffice to say, if I thought Haskell were viable I would have diversified into it years ago, but I keep choosing not to because I do not believe it is commercially viable.

You said "I don't like being tied to a bulky .NET framework unless the benefits are large". The benefits are huge. You get a production-quality IDE, a production-quality JIT compiler that performs hugely-effective optimizations like type-specializing generics, production-quality libraries for everything from GUI programming (see Game of Life in 32 lines of F#) to number crunching. But the real benefit of .NET, at least for me, is that you can sell the libraries that you write in F# and earn lots of money. Nobody has ever succeeded selling libraries to OCaml and Haskell programmers (and I am one of the few people to have tried) but F# libraries already sell in significant quantities. So the bulky .NET framework is well worth it if you want to earn a living by writing software.

Well designed

These languages are all well designed but for different purposes. OCaml is specifically designed for writing theorem provers and Haskell is specifically designed for researching Haskell. F# was designed to address all of the most serious practical problems with OCaml and Haskell such as poor interoperability, lack of concurrent garbage collection and lack of mature modern libraries like WPF in order to bring a productive modern language to a large audience.

Jon Harrop
Hi Jon, thanks for the input. However, realistically, I won't be able to learn all 3 in a reasonable amount of time. I stumbled upon some of your posts on Reddit, LtU and Lisp, and while you do seem like a salesman for F# and OCaml, I can't help but agree with your conclusions about the hash table. It was also interesting how everyone else dismissed it and suggested tries. To try to get more input on ternary tries vs hash tables, I created this question http://stackoverflow.com/questions/823744/ternary-tree-vs-hash-table/
Unknown
Well, I came from OCaml and started learning both F# and Haskell at the same time. I quickly ditched Haskell not just because the language and its implementation are riddled with bugs and serious practical limitations but also because I did not like the community. They refuse to accept responsibility for problems like the hash table thing, and the mailing list is full of nonsense like Don Stewart trying to pretend that IRC chat is more important than generating real software, and someone else telling me that something I had done under contract for Wolfram Research was impossible (!).
Jon Harrop
Voting up because I don't believe this answer deserves (uncommented) downvotes
Benjol
Voting down because Jon changed what used to be a reasonable answer.
Reid Barton
Reid is right about this. Examine the history of this post and you'll see a sharp about-face from an even-handed approach to a combination of boosterism and FUD. True, F# has a strong built-in base being a .NET language, but the original version of this post had it right -- there's no evidence that OCaml or Haskell, both of which have growing userbases and popularity, are going anywhere.
sclv
@sclv: caml-list traffic, Google trends, the job market, the book market and our own sales are all strong indicators that OCaml is dying.
Jon Harrop
@sclv: OCaml has now declined so much that we have pulled out of it: http://flyingfrogblog.blogspot.com/2010/08/rise-and-fall-of-ocaml.html http://flyingfrogblog.blogspot.com/2010/08/more-ocaml-trends.html
Jon Harrop
Voting down. Amongst other inaccuracies, I don't know of any evidence for these remarks: ".. the massive amounts of unnecessary copying it incurs hits the memory wall and destroys any hope of scalable parallelism on a multicore." Garbage collectors can be built that perform very well on multicores, and it's been known how to do so for some time (see e.g. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.52.9494). The fact is, to take advantage of designs like this you need a majority of immutable data, i.e. a functional programming language.
Simon Marlow
@Simon: As I'm sure you know, the paper you cited never resulted in a working garbage collector (http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2cacb73338fb15.en.html). Indeed, even your own garbage collector in GHC copies their basic architecture but you re-introduced what they called the "clearly inadequate" "naive stop-the-world approach" that, as I said, destroys scalability. But the problem does not stop there, even the industrial strength GC in .NET 4 does not scale *at all*.
Jon Harrop
@Jon you missed the point. The paper describes a GC architecture that does local per-CPU GC in the young generation, which is an effective design for multicore. A variation of this design has been shown to work in Manticore. I didn't say we did it in GHC, though it would be a logical next step for us (we don't "copy their basic architecture", have you read the papers?). Note how immutability becomes really important for doing local GC, much more so than in generational GC: it's my guess this is why the "industrial strength" GCs in .NET and JVMs haven't implemented it.
Simon Marlow
@Simon: Why do you believe that the Doligez-Leroy GC architecture is "an effective design for multicore"? Manticore also uses a stop-the-world variant and its authors have also described its performance as "poor" (http://www.cse.unsw.edu.au/~pls/damp09/damp09-reppy-keynote.pdf). By "copy their basic architecture" I meant using local young generations (figure 2 in their paper) which is what you use in GHC, right?
Jon Harrop
@Jon The key idea is having local heaps that can be collected independently, and a shared global heap. This is what Manticore uses, I think you must have misinterpreted the comments in that paper, I've talked to the authors and they're pleased with how this design has worked out. In GHC we have local nurseries, but beyond that there's not much similarity - not being able to collect them independently is a big difference, so I wouldn't say the basic architecture is the same.
Simon Marlow
@Simon: There are certainly subtle differences in the synchronization used in each of the designs beyond the basic architecture but the fact remains that none of these constitute an "effective design for multicore" based upon the results obtained with these projects.
Jon Harrop
@Jon I'm baffled as to why you would even question this. Surely using local heaps and not doing stop-the-world is an effective way to improve scaling? You yourself have attacked us for doing stop-the-world in the past and in this very thread (which is silly incidentally, we've always admitted that stop-the-world is a temporary compromise, see our ICFP'09 paper, and blog posts on this). It's pretty clear that concurrent GC isn't going to fix the stop-the-world problem: think of the cache effects, for instance. Concurrent GC is for pause-time reduction, not scaling.
Simon Marlow
@Simon: Redundantly computing the Mandelbrot set is also an effective way to improve scaling. The problem is that scaling is only half of the story. The real issue is absolute performance on interesting numbers of cores. Have any benchmarks ever indicated that the Doligez-Leroy or Manticore GC designs can beat your current design with any number of cores? Not that I know of.
Jon Harrop
@Simon: And I'm certainly not attacking your design, I think you made the right (pragmatic) choice. Indeed, the fact that you created a working implementation when so many others have failed proves it. However, I do think you are now well into diminishing returns. These GC designs add horrendous complexity for little practical benefit. My impression is that a much simpler design that gets rid of the stop-the-world on local heaps is the way to go. Maybe using something like VCGC: http://doc.cat-v.org/inferno/concurrent_gc/concurrent_gc.pdf
Jon Harrop
@Jon I think Manticore beats us, but they have a number of simplifying assumptions that makes their design more tractable. My worry about concurrent GC is that the GC threads will be fighting with the mutator threads for the same memory - I don't think this is going to deliver good scaling, even if you can build a concurrent GC that allows N GC threads with N mutator threads. Local independently-collectable heaps on the other hand give you good locality. Also, to cope with fast allocation you need generational GC, and concurrent GC doesn't work well for the young generation.
Simon Marlow
@Simon: I'd love to see those benchmarks! Have you considered trying to simulate different designs for the pure case, e.g. how much data would get promoted into a global heap with different GC designs for different kinds of programs? I agree that Dijkstra-style concurrent collectors will not scale as you say because there is so much fine-grained synchronization but Yuasa-style snapshot collectors (like VCGC) look like they should at least let the mutators scale well.
Jon Harrop
A: 

This is not directly related to the OP's question of whether or not to learn F#, but rather an example of real-world OCaml usage in the financial sector: http://ocaml.janestreet.com/?q=node/61

Very interesting talk.

moog