views: 560
answers: 8

I compared the languages at the language shootout game by their code size only. Here is a summary of what I got (shortest first, grouped by similar score).

  1. Python, Ruby, JavaScript, Perl, Lua, PHP, Mozart/OZ
  2. OCaml, Erlang, Racket, Go, Scala, F#, Smalltalk
  3. Pascal, Clean, Haskell, Common Lisp, C#, Java, C
  4. C++, Ada, ATS

I wonder why. The winners seem to be plain old dynamic languages. Erlang, Racket (née PLT Scheme), and F# are doing OK. Haskell and Common Lisp don't look more concise than claimed-to-be-verbose Java.

UPDATE:

I've found an insightful post on this topic with charts. I also found a similar comparison of languages for a larger program (a simple ray tracer). Altogether, I wouldn't say I got "the" answer, but I got some food for thought.

+1  A: 

It must have something to do with the expansive OOP libraries available to most of the languages in your group 1, and plain old hacks like backticks for shell calls and Perl's regex syntax. Going off of Python:

pw = file("/etc/passwd")
for l in pw:
    print l.split(':')[0]

Printing all the usernames on a system would take a lot more code if it weren't for the abstractions that OO languages are littered with. I'm not saying it can't be done in other paradigms, but the trend is that every type has a lot of member functions that make tedious tasks simple. Personally, I find purely functional languages useful only for academic purposes (but then again, what do I know).

Novikov
That could explain why dynamic languages score better. But it doesn't answer the question why functional languages score so poorly.
Adam Schmideg
This doesn't explain anything. If a functional language has a rich library, we could write something like `file("/etc/passwd") map l.split(':')[0]`
Andrey
Functional languages are niche languages aimed at academia, and they don't have an extensive library community, which means you have to do more yourself. And, being aimed at academia, functional languages solve completely different problems anyway (and do so far more concisely than e.g. dynamic languages).
You
In haskell it's `readFile "/etc/passwd" >>= mapM (putStrLn . takeWhile (/=':')) . lines`, which is 7 characters longer (3 if you minimize the spaces in both the python and the haskell sample). Four of that are because it's called `readFile` in haskell and `file` in python.
sepp2k
@Novikov: the task you describe is a system scripting task, which is what languages like Python were primarily designed for. So it's not surprising that it's good at it. Functional languages will do better at other tasks (such as manipulating data with complex structure).
Gilles
+6  A: 
  1. No language is always superior to another (well, there are a few exceptions... ;) ), so the same applies to a group of broadly categorized languages. The benchmarks cover a broad array of topics, and X may be less suited to one of them than Y.
  2. The source code is gzipped, so we don't really know how many lines of what length the programs were (yes, the gzipped size is still an indicator).
  3. A considerable number of functional languages still do much better than the widespread imperative, static languages - it's not that functional languages aren't succinct, it's that the dynamic languages allow even more succinct programs.
  4. At least in Haskell, much of the potential for conciseness comes from abstractions you can build yourself - but you do have to build them yourself and include them in your solution. A clever Haskell hacker may implement a monad in 20 lines that allows solving a small problem in 20 lines instead of 30 - the abstraction doesn't pay off for a small program, but it could save many lines in a larger one (e.g. 200 lines instead of 300). I guess the same applies to Lisps (only with a macro instead of a monad).
  5. Don't take the fanboys too seriously. FP is awesome and worth looking into, but it doesn't cure cancer and doesn't magically shorten all code by 25%.
  6. They can still beat dynamic languages in some areas: for example, tree-like data structures and their processing are expressed extremely naturally in many functional languages, thanks to algebraic data types and pattern matching.
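
As a rough illustration of point 6, a sketch in Python (used here as a lingua franca; a real ADT language would be terser still). The `Leaf`/`Node` names are hypothetical, invented just for this example:

```python
from dataclasses import dataclass
from typing import Union

# A binary tree as an algebraic data type: a tree is either
# a Leaf holding a value, or a Node with two subtrees.
@dataclass
class Leaf:
    value: int

@dataclass
class Node:
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def tree_sum(t: Tree) -> int:
    # Poor man's pattern matching: dispatch on the constructor.
    if isinstance(t, Leaf):
        return t.value
    return tree_sum(t.left) + tree_sum(t.right)

print(tree_sum(Node(Leaf(1), Node(Leaf(2), Leaf(3)))))  # 6
```

In Haskell the same thing collapses to a two-line `data` declaration and a two-line function with real pattern matching.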
delnan
>> what length the programs were (yes, it's an indicator) << It is an indicator of formatting style - for example, Ada programs seem to be long and thin, and Haskell programs seem to be shorter but wider.
igouy
If the source is gzipped, formatting doesn't matter.
Daniel Velkov
@djv - that's why the source is gzipped - but delnan seems to think LoC would be better.
igouy
I think LoC *and* number of characters *and* possibly more metrics would be better.
delnan
@delnan - Now find some way to demonstrate that, rather than just asserting an opinion.
igouy
@igouy: Simple: with more data supplied, you can focus on e.g. the gzipped size in bytes while those who care can look at e.g. the character count. Or do you ask why I think LoC matters at all? Again, simple: I consider 50 lines à 50 chars slightly more readable (all other things being equal) than 100 lines à 25 chars, because more code fits on one screen.
delnan
@delnan >> while those who care can look at << All you've said is that with different kinds of data people can look at different kinds of data - you haven't even demonstrated that those different kinds of data would change which programs we thought were smaller/bigger.
igouy
@delnan >> 50 lines à 50 chars slightly more readable (all other things being equal) than 100 lines à 25 chars << Do you understand why newspapers and magazines are printed in narrow columns?
igouy
@igouy: That's not really the same. A much better example would be mathematical equations, where one line = one statement. We are not as prepared to read code that flows across multiple lines (which, with a 25 character width, will most definitely happen.)
David Liu
@David Liu - I'd like to see our opinions backed by measurements showing how effective our reading and code-editing is in each circumstance ;-)
igouy
Yes, and nobody has mentioned J. For scientific computing, its code size is virtually unbeatable!
Sanjay Manohar
+1  A: 

Because some algorithms are more concise when expressed functionally and other algorithms are more concise when expressed iteratively. Generally it depends on how much you're concerned with relationships between data versus the current state of data.

Also keep in mind that you can do functional programming in languages like python, and programmers will use FP in those languages for portions of their program where it is more concise or efficient to do so, often without even knowing they are doing FP. Whereas with pure functional languages, you have no option to do iterative programming in the cases where it is more concise or efficient.
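
A small sketch of that point: both versions below compute the same list in Python, and many programmers write the second form without ever thinking of it as functional programming.

```python
words = ["foo", "bar", "baz", "qux"]

# Imperative style: mutate an accumulator in a loop.
lengths = []
for w in words:
    if w.startswith("b"):
        lengths.append(len(w))

# Functional style: the same computation as a single expression,
# with no mutable state.
lengths_fp = [len(w) for w in words if w.startswith("b")]

print(lengths, lengths_fp)  # [3, 3] [3, 3]
```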

Karl Bielefeldt
>> some algorithms are more concise when expressed << Don't you need to show that the algorithms in the specific examples the question referred to actually are more concise when expressed iteratively? Otherwise your generalization may be true but not relevant in this case.
igouy
“with pure functional languages, you have no option to do iterative programming”: technically true, but severely misleading: pure functional languages are extremely rare. Haskell isn't one, for example. It allows side effects, its “pure” aspect is that the type of the code has to mention the side effects.
Gilles
+14  A: 

If functional languages are really concise...

1 - Programming in the large is different than programming in the small.

Nothing about those tiny benchmarks game programs should be taken as an example of how the abstractions and modularisation provided by each language would apply to programming in the large.

2 - Most of what you see in the benchmarks game summary pages only refers to the fastest programs contributed for each language implementation (slower programs are usually removed from the website after a while - when and which slower programs are removed is mostly arbitrary).

{edit: Adam, as you don't wish to take my word for it that the summary pages only refer to the fastest programs - look at the script that filters data rows for the "Which programming language is best?" page. Look at line 80 and line 82 in function ValidRowsAndMins in lib_scorecard.php - Alioth issue their own security certificate so your browser will complain.}

So to take Haskell as an example, you're looking at code size of the fastest Haskell programs that have been contributed.

3 - None of the meteor-contest programs have been removed, and meteor-contest is a programming contest without restriction - the smallest meteor-contest Haskell program is the slowest meteor-contest Haskell program.

igouy
+1 because I can't do +10. Shootouts don't test concision in any useful sense. 10 lines vs 20 is not important, the important thing is 10000 vs 20000 or 10000000 vs 20000000. Oh, and *understanding* those 10M lines.
Gilles
1 - That Wikipedia link doesn't confirm your claim. If a language has good means of abstraction, it should have a better API which is simpler to call. 2 - I couldn't find any reference on the site to their bias towards faster programs.
Adam Schmideg
@Adam Schmideg 1 Wikipedia explains the basic difference between tiny programs and large programs. These tiny programs don't provide the opportunities to create or use abstractions, that large programs provide. 2 There isn't a bias, it's all about the faster programs! Help page "Where can I see previous programs?" http://shootout.alioth.debian.org/help.php#previous
igouy
@igouy - thank you for the help link, I didn't know about it
Adam Schmideg
+1  A: 

You should consider the fact that your group 1 languages (scripting) are 30 to 100 times slower than C/C++, while for the functional languages the same ratio is between 2 and 7. The programs in the list are optimized for speed; measuring anything else is a secondary issue and isn't really a good indicator of the real state of the language.
It is more interesting to look at the table where code size and running time each have weight 1. This way you get a comparison of the speed/maintainability ratio which seems a better metric than just code size.
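A toy sketch of what such an equal-weight combination could look like (the numbers below are made up for illustration, not taken from the site):

```python
# Hypothetical normalized measurements: (relative running time,
# relative gzipped code size), where 1.0 is the best in each category.
measurements = {
    "C":       (1.0, 3.0),
    "Haskell": (3.0, 2.0),
    "Python":  (40.0, 1.0),
}

# One plausible equal-weight score: the product of the two ratios,
# so being 10x slower costs as much as being 10x bigger.
combined = {lang: t * s for lang, (t, s) in measurements.items()}

for lang, score in sorted(combined.items(), key=lambda kv: kv[1]):
    print(lang, score)
```

With these made-up numbers, the scripting language's speed penalty dominates its code-size advantage.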

Daniel Velkov
+2  A: 

The programs in the Alioth game are not really representative of programs in those languages in general. For one, the implementations there are highly optimized toward the Shootout's specific infrastructure, which can lead to less idiomatic and more bloated code in a functional language. It's similar to how some Ruby libraries will write performance-critical code in C — to look at that C code and declare Ruby to be bloated and low-level would really not be giving the language a fair shake.

For another, a big part of why functional languages are touted as being so terse is that they're good at making abstractions. This tends to help more in big programs than in one-function wonders, so languages specifically engineered for easy brevity win big there. Python, Perl and Ruby are specifically designed to make short programs short, whereas most functional languages have other goals (though that's not to say they just ignore code size either).

Chuck
What program could ever be "really representative of programs in those languages in general"? - Your application is the ultimate benchmark http://shootout.alioth.debian.org/flawed-benchmarks.php#app
igouy
+4  A: 

This seems like an opportunity to whip out:

http://stackoverflow.com/questions/354124/are-there-statistical-studies-that-indicates-that-python-is-more-productive/354249#354249

The point being, the original question is trying to use some (paltry, inappropriate) data to make a generalization about comparisons among programming languages. But in fact, it's nigh impossible to use any data to make any kind of reasonable general quantitative comparisons about programming languages.

Here is some food for thought, though:

  • All things being equal, dynamically-typed languages are likely to be more concise, since they don't need to spend time describing data types
  • All things being equal, among statically-typed languages, type-inferred languages are likely to be more concise, since they don't need to declare types all over the place
  • All things being equal, among statically-typed languages, those with generics/templates are more likely to be concise, since languages without them require repeated code or casts and indirection
  • All things being equal, languages with a concise lambda syntax are likely to be more concise, since lambda is perhaps the most important abstraction in programming for avoiding repetition and boilerplate

That said, all things are not equal, not by a long shot.
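
To make the lambda bullet concrete, a small Python sketch; the point generalizes to any language where an anonymous function is one expression rather than a named declaration:

```python
data = [3, 1, 2]

# With a concise lambda, the sort key is a one-expression inline function.
descending = sorted(data, key=lambda x: -x)

# Without lambda syntax you are forced into the moral equivalent of a
# named helper (or, in some languages, a whole single-method class).
def negate(x):
    return -x

descending_verbose = sorted(data, key=negate)

print(descending)  # [3, 2, 1]
```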

Brian
A: 

The winners seem to be plain old dynamic languages.

Lisp is an obvious counterexample, being a plain old dynamic language that is extremely verbose. On the other hand, APL/J/K would probably be much more concise than any of the other languages, and they are dynamic. Also Mathematica...

Haskell and Common Lisp don't look more concise than claimed-to-be-verbose Java.

Your data are for tiny programs that have been optimized for performance, and the measure is code size after compression with the GZIP algorithm on a particular setting, so you cannot possibly draw general conclusions from them alone.

Perhaps a more valid conclusion would be that you are observing the bloat that results from performance optimization, so the most concise languages in your data are those that cannot be optimized because they are fundamentally inefficient (Python, Ruby, JavaScript, Perl, Lua, PHP). Conversely, Haskell can be optimized with enough effort to create fast but verbose programs. Is that really a disadvantage of Haskell vs. Python?

Another equally valid conclusion is that Python, Ruby, Perl, Lua and PHP simply compress better under the GZIP algorithm on that setting. If you repeated the experiment using run-length encoding, arithmetic coding, or LZ77/78 (perhaps with BWT preconditioning), you might well get completely different results.
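
To see how sensitive a "gzipped bytes" metric is to the compressor's setting alone, here is a quick sketch (the source text is made up; only the mechanism matters):

```python
import gzip

# A repetitive made-up "program" - the kind of text gzip loves.
src = b"def f(x):\n    return x + 1\n" * 20

raw = len(src)
fast = len(gzip.compress(src, compresslevel=1))
best = len(gzip.compress(src, compresslevel=9))

# Even with the same algorithm, the setting changes the measured "size".
print(raw, fast, best)
```

A different algorithm would move the numbers again, which is the point: the ranking partly reflects the compressor, not the language.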

There is also a huge amount of worthless cruft in the code on that site. Look at this snippet of OCaml code that is only necessary if your OCaml install is two generations out of date:

(* This module is a workaround for a bug in the Str library from the Ocaml
 * distribution used in the Computer Language Benchmarks Game. It can be removed
 * altogether when using OCaml 3.11 *)
module Str =
struct
  include Str

  let substitute_first expr repl_fun text =
    try
      let pos = Str.search_forward expr text 0 in
      String.concat "" [Str.string_before text pos;
                        repl_fun text;
                        Str.string_after text (Str.match_end())]
    with Not_found ->
      text

  let opt_search_forward re s pos =
    try Some(Str.search_forward re s pos) with Not_found -> None

  let global_substitute expr repl_fun text =
    let rec replace accu start last_was_empty =
      let startpos = if last_was_empty then start + 1 else start in
      if startpos > String.length text then
        Str.string_after text start :: accu
      else
        match opt_search_forward expr text startpos with
        | None ->
            Str.string_after text start :: accu
        | Some pos ->
            let end_pos = Str.match_end() in
            let repl_text = repl_fun text in
            replace (repl_text :: String.sub text start (pos-start) :: accu)
                    end_pos (end_pos = pos)
    in
      String.concat "" (List.rev (replace [] 0 false))

  let global_replace expr repl text =
    global_substitute expr (Str.replace_matched repl) text
  and replace_first expr repl text =
    substitute_first expr (Str.replace_matched repl) text
end

The single-core versions often contain lots of code for parallelism, e.g. regex-dna in OCaml. Look at the monstrosity that is fasta in OCaml: the entire program is duplicated, switching on the word size! I have an old OCaml version of fasta on disk here that is less than a fifth the size of that one...

Finally, I should note that I have contributed code to this site only to have it rejected because it was too good. Politics aside, the OCaml binary-trees used to contain the statement "de-optimized by Isaac Gouy" (although the comment has been removed, the deoptimization is still there making the OCaml code longer and slower) so you can assume that all of the results have been subjectively doctored specifically to introduce bias.

Basically, with such poor quality data you cannot hope to draw any insightful conclusions. You'd be much better off trying to find more significant programs that have been ported between languages but, even then, your results will be domain specific. I recommend forgetting about the shootout entirely...

Jon Harrop
>> It can be removed altogether when using OCaml 3.11 << Pity no OCaml partisan has contributed up-to-date code - The Objective Caml native-code compiler, version 3.11.1
igouy
>> only to have it rejected because << it didn't do what was required.
igouy
@igouy: "Pity no OCaml partisan has contributed up-to-date code". You need an OCaml partisan to remove that block of code from your website for you?
Jon Harrop
When an OCaml expert notices a simple improvement they can make to an OCaml program, would you guess that they go ahead and contribute a tested program with that change, or that they spend that 5 minutes in public forums complaining about the code? Go figure.
igouy
When the author of a website is given a simple instruction explaining how they can improve the quality of their content, they refuse to do anything until an expert comes along and does it for them. Go figure.
Jon Harrop
I think it's better that people who consider themselves OCaml programmers contribute the OCaml programs, rather than people who make no claim to be OCaml programmers.
igouy
We're not talking about contributing a program. We're talking about deleting the block of code clearly marked for deletion by the OCaml programmer who already contributed it.
Jon Harrop