views:

133

answers:

3

Parametrized types such as C++ templates are a nice thing, but most of the time they can only be parametrized by other types.

However there is a special case in C++ where it is possible to parametrize a template by an integer. For example, fixed-length arrays are a typical use case:

template<typename T, int SIZE> class FixedArray 
{ 
    T m_values[SIZE]; 

public: 
    int getElementCount() const { return SIZE; }

    T operator[] (int i) const { 
        if (i<0 || i>=SIZE) 
            throw;
        else 
            return m_values[i]; 
    }  
};

void f()
{
    FixedArray<float, 5> float_array; // Declares a fixed array of 5 floats. 
    //The size is known at compile time, and values are allocated on the stack.
}

Only constant integers and pointers are allowed in C++, but I think it could be interesting to use any value for parametrization (floats, class instances, etc.). This could allow expressing preconditions at compile-time (usually informally specified in the documentation), and check them automatically at runtime. For example, here is an "Interval" template in an hypothetical C++ dialect :

// Interval of type T between TMin and TMax. 
template<typename T, T TMin, T TMax> class Interval
{
    T m_value;

public:
    Interval(int val) { *this = val; }

    Interval& operator = (T val) { 
        //check that 'val is in [TMin, TMax] and set the value if so, throw error if not
        if (val < TMin || val > TMax) 
            throw;
        else
            m_value = val; 

        return *this;
    };  

    operator T() const { return m_value; }
}   

// Now define a f function which is only allowing a float between O and 1
// Usually, that kind of function is taking a float parameter with the doc saying "the float is in 0-1 range". But with our Interval template we have
// that constraint expressed in the type directly.
float f(Interval<float, 0, 1> bounded_value)
{
    // No need to check for boundaries here, since the template asserts at compile-time that the interval is valid
    return ...;
}

// Example usage
void main();
{
    float val = 0.5;

    Interval<float, 0, 1> sample_interval = val;    // OK. The boundary check is done once at runtime.

    f(sample_interval);             // OK. No check, because it is asserted by the type constraint at compile-time.
                                // This can prevent endless precondition testing within an algorithm always using the same interval

    sample_interval = 3*val;            // Exception at runtime because the constraint is no longer satisfied

    f(sample_interval);             // If I am here, I am sure that the interval is valid. But it is not the case in that example.
}   

What could be interesting then would be to express relationships between these types. For example, expressing the rule to assign an Interval A to another Interval B with other boundaries, or simply a rule to assign a value to an Interval, with everything checked at compile-time.

Is there any language with this kind of parametrization (or a similar approach), or one must still be invented? Any useful research papers?

+3  A: 

I think you're basically describing Dependent Types. There exist a number of (mostly research) languages that implement these, linked from the article. It tends to become intractable to automatically prove type inhabitation in the general case (i.e. type checking gets very hard, or is in the general case not decidable), but there have been some practical examples of their use.

Gian
+1  A: 

Ada95 supports generic formal parameters that are values. In the example on this page, Size is a generic formal parameter whose value is required to be a positive integer.

Stephen C
Are these statically checkable somehow or do they end up as runtime assertions?
Gian
I don't know ada, but at first glance this looks to be the same as Gabriel's template example, i.e. the value of Size needs to be known at compile time.
sepp2k
Any type conversion involving constrained types can throw Constraint_Error, and I don't see any reason why conversion of values for generic formal parameters would be different. There is no mention in the (draft 4.0) Ada95 spec that expressions providing formal parameter values need to be compile-time evaluable.
Stephen C
+4  A: 

Types that are parametrized by values are called dependent types.¹ There has been a lot of research on the topic of dependent types, but little of it has reached “mainstream language”.

The big problem with dependent types is that if your types contains expressions, i.e., bits of code, then the type checker must be able to execute code. This can't be done in full generality: what if the code has side effects? what if the code contains an infinite loop? For example, consider the following program in a C-like syntax (error checking omitted):

int a, b;
scanf("%d %d", &a, &b);
int u[a], v[b];

How could the compiler know whether the arrays u and v have the same size? It depends on the numbers the user enters! One solution is to forbid side effects in expressions that appear in types. But that doesn't take care of everything:

int f(int x) { while (1); }
int u[f(a)], v[f(b)];

the compiler will go into an infinite loop trying to decide whether u and v have the same size.

<expanded>
So let's forbid side effects inside types, and limit recursion and looping to provably terminating cases. Does it make type checking decidable? From a theoretic point of view, yes, it can. What you have might be something like a Coq proof term. The problem is that type checking is then easily decidable if you have enough type annotations (type annotations are the typing information that the programmer supplies). And here enough means a lot. An awful lot. As in, type annotations at every single language constructs, not just variable declarations but also function calls, operators and all the rest. And the types would represent 99.9999% of the program size. It would often be faster to write the whole thing in C++ and debug it than writing the whole program with all required type annotations.

Hence the difficulty here is to have a type system that requires only a reasonable amount of type annotations. From a theoretical point of view, as soon as you allow leaving off some of the type annotations, it becomes a type inference problem rather than a pure type checking problem. And type inference is undecidable for even relatively simple type systems. You can't easily have a decidable (guaranteed to terminate) static (operating at compile time) reasonable (not requiring an insane amount of type annotations) dependent type system.
</expanded>

Dependent types do sometimes appear in a limited form in mainstream languages. For example, C99 allows arrays whose size is not a constant expression; the type of such an array is a dependent type. Unsurprisingly for C, the compiler is not required to check bounds on such an array, even when it would be required to check bounds for an array of constant size.

More usefully, Dependent ML is a dialect of ML with types that can be indexed by simple integer expressions. This allows the type checker to check most array bounds statically.

Another example of dependent type appears in module systems for ML. Modules and their signatures (also called interfaces) are similar to expressions and types, but rather than describing computations, they describe the structure of the program.

Dependent types turn up very often in languages that are not programming languages in the sense most programmers would recognize, but rather languages for proving mathematical properties of programs (or just mathematical theorems). Most of the examples in the wikipedia page are of that nature.

¹ More generally, type theorists classify type systems according to whether they have Higher-order types (types parametrized by types), polymorphism (expressions parametrized by types), and dependent types (types parametrized by expressions). This classification is called Barendregt's cube or the lambda cube. In fact it is a hypercube, but usually the fourth dimension (expressions parametrized by expressions, i.e., functions) goes without saying.

Gilles
Interesting! Maybe if the type checker could be limited to constant expressions and literals only, possibly with a non-turing complete subset of the language it could be an interesting addition to a language (such as enforcing basic preconditions and invariants). I'll investigate on this Dependent Types thing.
Gabriel Cuvillier
@Gabriel: I see I went a bit fast in the middle there. Even with a fairly restricted subset of expressions available inside types, it can be hard to have a usable programming language with dependent types (see my edit). You're right about enforcing assertions, that's the big motivation behind dependent types. (Mind you, it's the big motivation behind types of any kind.)
Gilles