ansaurus

Question

Closure conversion and separate compilation of higher-order function calls

Answer 1

+8 A:

This is a pretty deep question with a lot of ramifications, and I don't want to write a scholarly article here. I will just scratch the surface and will point you to more information elsewhere. I am basing my response on personal experience with the Glorious Glasgow Haskell Compiler and with Standard ML of New Jersey, as well as scholarly papers written about those systems.

The key distinction made in an ambitious compiler is the distinction between known calls and unknown calls. For languages with higher-order functions, a secondary but still important distinction is whether the call is fully saturated (which we can decide only at a known call site).

A known call means a call site where the compiler knows exactly what function is being called and how many parameters it expects.
An unknown call means the compiler can't figure out what function might be called.
A known call is fully saturated if the function being called is getting all the parameters it expects, so the call goes straight to the function's code. If the function is getting fewer arguments than it expects, the function is partially applied and the call results only in the allocation of a closure.

For example, if I write the Haskell functions

mapints :: (Integer -> a) -> [a]
mapints f = map f [1..]

then the call to map is known and fully saturated.
If I write

inclist :: [Integer] -> [Integer]
inclist = map (1+)

then the call to map is known and partially applied.
Finally, if I write

compose :: (b -> c) -> (a -> b) -> (a -> c)
compose f g x = f (g x)

then the calls to f and g are both unknown.

The main thing mature compilers do is optimize known calls. In your classification above this strategy falls mostly under #2.

If all call sites to a function are known, a good compiler will create a special-purpose calling convention just for that function, e.g., passing arguments in just the right registers to make things work out nicely.
If some but not all call sites of a function are known, the compiler may decide it is worthwhile to create a special-purpose calling convention for the known calls, which will either be inlined or will use a special name known only to the compiler. The function exported under the name in the source code will use a standard calling convention, and its implementation is typically a thin layer that makes an optimized tail call to the specialized version.
If a known call is not fully saturated, the compiler just generates code to allocate the closure right there in the caller.
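To make the three cases above a bit more concrete, here is a rough source-level imitation of what the compiler does behind the scenes. This is only a sketch: all the names (sumToWorker, sumTo, knownCall, applyUnknown, partial) are made up for illustration, and a real compiler does this on its intermediate code with real register conventions, not in Haskell source.

```haskell
-- Hand-written imitation of a compiler transformation; every name here is
-- hypothetical and chosen only for this illustration.

-- Specialized version. In a real compiler this would get a custom calling
-- convention and a name known only to the compiler.
sumToWorker :: Int -> Int -> Int
sumToWorker acc n
  | n <= 0    = acc
  | otherwise = sumToWorker (acc + n) (n - 1)

-- Version exported under the source-level name. It uses the standard
-- convention and is just a thin layer that tail-calls the worker.
sumTo :: Int -> Int
sumTo n = sumToWorker 0 n

-- Known, fully saturated call: the compiler can see that sumTo is being
-- called with its one argument, so it may jump to the worker directly.
knownCall :: Int
knownCall = sumTo 10

-- Unknown call: f could be any function, so the generic convention is used.
applyUnknown :: (Int -> Int) -> Int -> Int
applyUnknown f x = f x

-- Known but not saturated: sumToWorker expects two arguments and gets one,
-- so this amounts to allocating a closure that captures 5.
partial :: Int -> Int
partial = sumToWorker 5
```

Passing sumTo to applyUnknown is the situation where the standard-convention entry point is needed, because the callee cannot be identified at that call site.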

The representation of closures (or whether first-class functions are handled by some other technique such as lambda lifting or defunctionalization) is largely orthogonal to the handling of known vs unknown calls.
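For readers who haven't seen defunctionalization, here is a minimal sketch of the idea, assuming a tiny program in which only two kinds of function ever flow to the higher-order function. The names (Fn, AddN, TimesTwo, apply, applyTwiceD) are invented for this example and are not taken from any particular compiler.

```haskell
-- Higher-order original: the two calls to f are unknown calls.
applyTwice :: (Int -> Int) -> Int -> Int
applyTwice f x = f (f x)

-- Defunctionalized version (hypothetical names). Each function that can
-- reach applyTwice becomes a constructor holding its free variables.
data Fn
  = AddN Int   -- stands for \x -> x + n, with n captured
  | TimesTwo   -- stands for \x -> 2 * x, nothing captured

-- A single first-order dispatcher replaces every unknown call.
apply :: Fn -> Int -> Int
apply (AddN n) x = x + n
apply TimesTwo x = 2 * x

-- First-order replacement for applyTwice.
applyTwiceD :: Fn -> Int -> Int
applyTwiceD f x = apply f (apply f x)
```

In this version the calls to apply are themselves known and saturated, and the uncertainty about which function runs has moved into the case analysis inside apply.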

(It may be worth mentioning an alternative approach, used by MLton: it is a whole-program compiler; it gets to see all the source code; it reduces all functions to first order using a technique I've forgotten. There are still unknown calls because general control-flow analysis in higher-order languages is intractable.)

Regarding your final questions:

I think this issue is just one facet of the messy problem called "how to compile first-class functions". I've never heard a special name for just this issue.
Yes, there are other approaches. I've sketched one and mentioned another.
I'm not sure if there are any great, broad studies on tradeoffs, but the best one I know of, which I recommend very highly, is Making a Fast Curry: Push/Enter vs. Eval/Apply for Higher-Order Languages by Simon Marlow and Simon Peyton Jones. One of the many great things about this paper is that it explains why the type of a function does not tell you whether a call to that function is fully saturated.
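Here is a small illustration of that last point, in my own words rather than the paper's. The names addTwo, addVia, and callBoth are hypothetical. The two functions have exactly the same type, yet as written one takes two parameters and the other takes one and returns a lambda, so a call with two arguments is saturated for the first and over-saturated for the second (unless the compiler chooses to eta-expand it).

```haskell
-- Same type, two parameters in the definition.
addTwo :: Int -> Int -> Int
addTwo x y = x + y

-- Same type, but only one parameter in the definition: the body does some
-- work on x and then returns a function awaiting y.
addVia :: Int -> Int -> Int
addVia x = let k = x * x in \y -> k + y

-- Inside callBoth the call to f is unknown. The type Int -> Int -> Int says
-- nothing about whether f wants one argument at a time or both at once.
callBoth :: (Int -> Int -> Int) -> Int
callBoth f = f 3 4
```

Both callBoth addTwo and callBoth addVia type-check, which is why the runtime has to handle the argument-count mismatch one way or another; the paper compares two ways of doing so.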

To wrap up your numbered alternatives: number 1 is a nonstarter. Popular compilers use a hybrid strategy related to numbers 2 and 3. I've never heard of anything resembling number 4; the distinction between known and unknown calls seems more useful than distinguishing top-level functions from arguments of function type.

Norman Ramsey 2010-02-20 00:56:42
