views: 297
answers: 3

I'm trying to get my head around how type inference is implemented. In particular, I don't quite see where or why the heavy lifting of "unification" comes into play.

I'll give an example in "pseudo C#" to help clarify:

The naive way to do it would be something like this:

Suppose you "parse" your program into an expression tree such that it can be executed with:

interface IEnvironment
{
    object lookup(string name);
}

interface IExpression
{
    // Evaluate this program in this environment
    object Evaluate(IEnvironment e);
}

So something like "Multiplication" might be implemented with:

class Multiply : IExpression
{
    IExpression lhs;
    IExpression rhs;
    // etc.
    public object Evaluate(IEnvironment e)
    {
        // assume for the moment C# has polymorphic multiplication
        return lhs.Evaluate(e) * rhs.Evaluate(e);
    }
}

Then to "implement" type inference, you could just do something like:

interface ITypeEnvironment
{
    Type getType(string name);
}

interface IExpression
{
    //as before
    object Evaluate(IEnvironment e);
    // infer type
    Type inferType(ITypeEnvironment typeEnvironment);
}

Then "Multiplication"'s type inference might just be something like:

class Multiply : IExpression
{
    IExpression lhs;
    IExpression rhs;

    // ... 
    public Type inferType(ITypeEnvironment typeEnvironment)
    {
        Type lhsType = lhs.inferType(typeEnvironment);
        Type rhsType = rhs.inferType(typeEnvironment);
        if (lhsType != rhsType)
            throw new Exception("lhs and rhs types do not match");

        // you could also check here that lhs/rhs are one of double/int/float etc.
        return lhsType;
    }
}

lhs and rhs might be simple constants, or "variables" which are looked up in the environment:

class Constant : IExpression
{
    object value;
    public Type inferType(ITypeEnvironment typeEnvironment)
    {
        return value.GetType(); // the type of the value
    }
    public object Evaluate(IEnvironment environment)
    {
        return value;
    }
}

class Variable : IExpression
{
    string name;
    public Type inferType(ITypeEnvironment typeEnvironment)
    {
        return typeEnvironment.getType(name);
    }
    public object Evaluate(IEnvironment environment)
    {
        return environment.lookup(name);
    }
}

But nowhere in this do we end up with the need for a "unification" algorithm.

So, clearly, my example is not complex enough. Does it need higher order functions? Do we need "parametric polymorphism"?

What is the simplest possible example where "unification" is actually needed to correctly infer the type of an expression?

An example in Scheme would be ideal (i.e. an example of a very small Scheme program where you require unification to correctly infer the type of the s-expression).

+2  A: 

Suppose we have a function

f(x, y)

where f requires x and y to have the same type, which might be e.g. FunctionOfTwoFunctionsOfInteger or FunctionOfInteger.

suppose we pass in

f(g(u, 2), g(1, z))

Now with unification, the type of u is unified with the type of 1, and the type of z with the type of 2, so both u and z are inferred to be integers, and the first argument is a FunctionOfTwoFunctionsOfInteger.

So we infer that the types of x and y are both FunctionOfTwoFunctionsOfInteger.

I'm not too familiar with C#, but with lambda expressions (or, equivalently, delegates) this should be possible to express.
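The binding described above can be sketched with a tiny Robinson-style unifier. This is Python rather than the question's pseudo C#, and the term representation (tuples for compound types, lowercase strings for type variables) is an assumption made just for the sketch:

```python
# A minimal Robinson-style unifier. Compound types are tuples whose head
# is a capitalized constructor name; lowercase strings are type variables.

def is_var(t):
    return isinstance(t, str) and t[0].islower()

def resolve(t, subst):
    """Follow variable bindings to the current representative term."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst):
    """Unify two type terms, extending the substitution, or raise."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if is_var(a):
        subst[a] = b
        return subst
    if is_var(b):
        subst[b] = a
        return subst
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
        return subst
    raise TypeError(f"cannot unify {a} with {b}")

# f(g(u, 2), g(1, z)): if f needs both arguments to have the same type,
# we unify the type of g(u, 2) with the type of g(1, z):
subst = unify(('G', 'u', 'Int'), ('G', 'Int', 'z'), {})
print(subst)  # {'u': 'Int', 'z': 'Int'}
```

Unifying the two argument types positionally matches u against Int (from the 1) and z against Int (from the 2), which is exactly the binding the answer describes.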

For an example of where type inference is very useful in improving the speed of theorem proving, take a look at "Schubert's Steamroller":

http://www.rpi.edu/~brings/PAI/schub.steam/node1.html

There's an issue of the Journal of Automated Reasoning devoted to solutions and formulations of this problem, most of which involve type inference in theorem-proving systems:

http://www.springerlink.com/content/k1425x112w6671g0/

Larry Watanabe
I don't quite follow this... I wanted a *simple* example! ;-) But thanks for the links, I'll take a look.
Paul Hollingsworth
+2  A: 
public Type inferType(ITypeEnvironment typeEnvironment)
{
    return typeEnvironment.getType(name);
}

What if you just don't know the type of the variable? That's the whole point of type inference, right? Something very simple like this (in some pseudocode language):

function foo(x) { return x + 5; }

You don't know the type of x, until you infer the addition, and realize that it must be an integer because it's added to an integer, or something like that.

What if you have another function like this:

function bar(x) { return foo(x); }

You can't figure out the type of x until you have figured out the type of foo, and so on.

So when you first see a variable, you have to just assign some placeholder type for a variable, and then later, when that variable is passed to some kind of function or something, you have to "unify" it with the argument type of the function.
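A toy version of this "placeholder type" idea, in Python rather than the pseudocode above (TypeVar and the one-variable unify are illustrative names, not a full algorithm):

```python
# Assign a placeholder type to a variable, then unify it with what the
# surrounding expression demands.

class TypeVar:
    """A placeholder for a type we do not know yet."""
    def __init__(self, name):
        self.name = name
        self.bound_to = None  # filled in by unification

def unify(t, concrete):
    """Bind a placeholder to a concrete type, or check an existing binding."""
    if isinstance(t, TypeVar):
        if t.bound_to is None:
            t.bound_to = concrete  # first piece of evidence: record it
        elif t.bound_to != concrete:
            raise TypeError(f"{t.name} cannot be both {t.bound_to} and {concrete}")
    elif t != concrete:
        raise TypeError(f"expected {concrete}, got {t}")

# function foo(x) { return x + 5; }
x = TypeVar("x")   # we know nothing about x yet
unify(x, "Int")    # x + 5: adding to an integer forces x : Int
print(x.bound_to)  # Int

# function bar(x) { return foo(x); }
# bar's x gets its own placeholder; unifying it with foo's parameter
# type (now known to be Int) binds it too.
y = TypeVar("x in bar")
unify(y, x.bound_to)
print(y.bound_to)  # Int
```

A second, conflicting use of the same variable (say, `x && true`) would make unify raise, which is how inference reports a type error.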

newacct
Ah I see - functions is enough. You don't know the type of x, but then "x+5" implies that x must be 5...
Paul Hollingsworth
implies that x must be an int, meant to say
Paul Hollingsworth
Not quite - it just implies that x must be something for which there is a binary operator + for which the rhs is an int (or something for which there is an implicit conversion from int, if the language has those). With this much flexibility, you might just define bar as a generic method, `bar<T>(T x)` where `T` has a constraint that it must support operator `+` with an integer (the compiler infers the constraint from how you used it inside `bar`). So there is no need to tie down the type of `x` when you compile `bar`.
Daniel Earwicker
+3  A: 

Let me completely ignore your example and give you an example of where we do method type inference in C#. (If this topic interests you then I encourage you to read the "type inference" archive of my blog.)

Consider:

void M<T>(IDictionary<string, List<T>> x) {}

Here we have a generic method M that takes a dictionary that maps strings onto lists of T. Suppose you have a variable:

var d = new Dictionary<string, List<int>>() { ...initializers here... };
M(d);

We have called M<T> without providing a type argument, so the compiler must infer it.

The compiler does so by "unifying" Dictionary<string, List<int>> with IDictionary<string, List<T>>.

First it determines that Dictionary<K, V> implements IDictionary<K, V>.

From that we deduce that Dictionary<string, List<int>> implements IDictionary<string, List<int>>.

Now we have a match on the IDictionary part. We unify string with string, which is obviously all good but we learn nothing from it.

Then we unify List with List, and realize that we have to recurse again.

Then we unify int with T, and realize that int is a bound on T.

The type inference algorithm chugs away until it can make no more progress, and then it starts making further deductions from its inferences. The only bound on T is int, so we deduce that the caller must have wanted T to be int. So we call M<int>.
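The recursion described above can be sketched as follows (Python, not the actual C# compiler algorithm; the tuple encoding of generic types and the all-uppercase convention for type parameters are assumptions made for the sketch):

```python
# Structural unification for generic types. A generic type is a tuple
# (constructor, arg1, arg2, ...); an all-uppercase string like 'T' is a
# type parameter to be inferred.

def infer(parameter, argument, bounds):
    """Walk both type trees in parallel, collecting bounds on type parameters."""
    if isinstance(parameter, str) and parameter.isupper():
        bounds.setdefault(parameter, argument)  # e.g. T -> int
    elif (isinstance(parameter, tuple) and isinstance(argument, tuple)
          and parameter[0] == argument[0] and len(parameter) == len(argument)):
        for p, a in zip(parameter[1:], argument[1:]):
            infer(p, a, bounds)  # recurse into the type arguments
    elif parameter != argument:
        raise TypeError(f"{argument} does not match {parameter}")
    return bounds

# After noting that Dictionary<K, V> implements IDictionary<K, V>, the
# compiler is effectively unifying:
#   IDictionary<string, List<T>>   (the parameter)
#   IDictionary<string, List<int>> (the argument)
parameter = ("IDictionary", "string", ("List", "T"))
argument = ("IDictionary", "string", ("List", "int"))
print(infer(parameter, argument, {}))  # {'T': 'int'}
```

The string-with-string match contributes nothing, the List-with-List match triggers the recursion, and the int-with-T match records the single bound that fixes T to int.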

Is that clear?

Eric Lippert
Yes, I already subscribe to your blog - glad to see you on Stack Overflow! So I gather that the unification comes from types being composed of other types... so perhaps higher-order functions are sufficient in a "scheme-like" language to force the need for unification... hmm...
Paul Hollingsworth
To give more context - we wrote a domain-specific language where I work, whose interpreter is written in C#... we're looking at adding type inference to it to provide better up-front checking. Actually, it's not really that domain-specific - it's essentially a Scheme interpreter that is very well integrated with our application. We've already added full support for continuations, so we figured some sort of type inference would be nice too! But first we have to figure out how to do it!
Paul Hollingsworth
Oh, and while you might be reading this: the single biggest win for C# would be better support for detecting IDisposable usage errors. I would jump up and down and cheer for joy if you added this!
Paul Hollingsworth