views:

1115

answers:

6

Why is it that functions in F# and Ocaml (and possibly other languages) are not by default recursive?

In other words, why did the language designers decide it was a good idea to explicitly make you type rec in a declaration like:

let rec foo ... = ...

and not give the function recursive capability by default? Why the need for an explicit rec construct?

+4  A: 

Some guesses:

  • let is not only used to bind functions, but also other regular values. Most forms of values are not allowed to be recursive. Certain forms of recursive values are allowed (e.g. functions, lazy expressions, etc.), so it needs an explicit syntax to indicate this.
  • It might be easier to optimize non-recursive functions
  • The closure created when you create a recursive function needs to include an entry that points to the function itself (so the function can recursively call itself), which makes recursive closures more complicated than non-recursive closures. So it might be nice to be able to create simpler non-recursive closures when you don't need recursion
  • It allows you to define a function in terms of a previously-defined function or value of the same name; although I think this is bad practice
  • Extra safety? Makes sure that you are doing what you intended. e.g. If you don't intend it to be recursive but you accidentally used a name inside the function with the same name as the function itself, it will most likely complain (unless the name has been defined before)
  • The let construct is similar to the let construct in Lisp and Scheme; which are non-recursive. There is a separate letrec construct in Scheme for recursive let's
newacct
+5  A: 

There are two key reasons this is a good idea:

First, if you enable recursive definitions then you can't refer to a previous binding of a value of the same name. This is often a useful idiom when you are doing something like extending an existing module.

Second, recursive values, and especially sets of mutually recursive values, are much harder to reason about then are definitions that proceed in order, each new definition building on top of what has been already defined. It is nice when reading such code to have the guarantee that, except for definitions explicitly marked as recursive, new definitions can only refer to previous definitions.

zrr
+1  A: 

See the F# Spec at Microsoft.com. Recursive definitions are handled differently than non-recursive definitions. If you define something as a recursive binding definition, then additional constraints are applied to uses of that expression.

If you required each binding definition to be recursion-capable (which you would have to do to remove the rec keyword from the language), you would limit other uses of the language. The page I linked to above gives an example.

Eddie
To whomever voted this down: Please comment about what in my answer is inaccurate or misleading.
Eddie
it wasn't my -1 but I would assume the OP was asking *why* the distinction was required, your explanation is circular (it is because they are different) ratehr than saying what that difference enables/allows
ShuggyCoUk
-1. You're talking non-generalized type variables but they are irrelevant to this question about the default of "rec" in the CAML branch of the ML family of languages. For example, SML is subject to the same constraints you mention yet it does make "fun"-defined functions recursive by default.
Jon Harrop
+27  A: 

One crucial reason for the explicit use of rec is to do with Hindley-Milner type inference, which underlies all staticly typed functional programming languages (albeit changed and extended in various ways).

If you have a definition let f x = x, you'd expect it to have type 'a -> 'a and to be applicable on different 'a types at different points. But equally, if you write let g x = (x + 1) + ..., you'd expect x to be treated as an int in the rest of the body of g.

The way that Hindley-Milner inference deals with this distinction is through an explicit generalisation step. At certain points when processing your program, the type system stops and says "ok, the types of these definitions will be generalised at this point, so that when someone uses them, any free type variables in their type will be freshly instantiated, and thus won't interfere with any other uses of this definition."

It turns out that the sensible place to do this generalisation is after checking a mutually recursive set of functions. Any earlier, and you'll generalise too much, leading to situations where types could actually collide. Any later, and you'll generalise too little, making definitions that can't be used with multiple type instantiations.

So, given that the type checker needs to know about which sets of definitions are mutually recursive, what can it do? One possibility is to simply do a dependency analysis on all the definitions in a scope, and reorder them into the smallest possible groups. Haskell actually does this, but in languages like F# (and OCaml and SML) which have unrestricted side-effects, this is a bad idea because it might reorder the side-effects too. So instead it asks the user to explicitly mark which definitions are mutually recursive, and thus by extension where generalisation should occur.

Ganesh Sittampalam
Great answer Ganesh!
impulse3d
Err, no. Your first paragraph is wrong (you're talking about the explicit use of "and" and not "rec") and, consequently, the rest is irrelevant.
Jon Harrop
I'd consider "rec" as a special case of "rec...and...and...", i.e. one with zero "and"s. This makes single recursion a special case of mutual recursion. As you say in your answer, SML doesn't use "rec" but does have "and", so an alternative view is to consider them to be orthogonal.
Ganesh Sittampalam
I was never pleased with this requirement. Thanks for the explanation. Another reason why Haskell is superior in design.
Bent Rasmussen
@Bent: Don't let correctness get in the way of the conclusions you set out to draw.
Jon Harrop
How exactly can side-effects be reordered during dependency analysis?
Shooshpanchick
+2  A: 

A big part of it is that it gives the programmer more control over the complexity of their local scopes. The spectrum of let, let* and let rec offer an increasing level of both power and cost. let* and let rec are in essence nested versions of the simple let, so using either one is more expensive. This grading allows you to micromanage the optimization of your program as you can choose which level of let you need for the task at hand. If you don't need recursion or the ability to refer to previous bindings, then you can fall back on a simple let to save a bit of performance.

It's similar to the graded equality predicates in Scheme. (i.e. eq?, eqv? and equal?)

Evan Meagher
+7  A: 

The French and British descendants of the original ML made different choices and their choices have been inherited through the decades to the modern variants. So this is just legacy but it does affect idioms in these languages.

Functions are not recursive by default in the French CAML family of languages (including OCaml). This choice makes it easy to supercede function definitions using let in those languages because you can refer to the previous definition inside the body of a new definition. F# inherited this syntax from OCaml.

For example, superceding the function p when computing the Shannon entropy of a sequence in OCaml:

let shannon fold p =
  let p x = p x *. log(p x) /. log 2.0 in
  let p t x = t +. p x in
  -. fold p 0.0

Note how the argument p to the higher-order shannon function is superceded by another p in the first line of the body and then another p in the second line of the body.

Conversely, the British SML branch of the ML family of languages took the other choice and SML's fun-bound functions are recursive by default. When most function definitions do not need access to previous bindings of their function name, this results in simpler code. However, superceded functions are made to use different names (f1, f2 etc.) which pollutes the scope and makes it possible to accidentally invoke the wrong "version" of a function.

Note that the answers given by Ganesh and Eddie are red herrings. They explained why groups of functions cannot be placed inside a giant let rec ... and ... because it affects the generalization of type variables due to the (potential) presence of side effects. This has nothing to do with rec being default in SML but not OCaml.

Jon Harrop
I don't think they are red herrings: if it wasn't for the restrictions on inference, it's likely that entire programs or modules would be automatically treated as mutually recursive like most other languages do. That would make the specific design decision of whether or not "rec" should be required moot.
Ganesh Sittampalam
Historically, yes, but that would break all of the code that relies upon it and it is useful in OCaml. Assuming you wanted to keep this useful OCaml idiom, you'd end up reintroducing "rec".
Jon Harrop