views:

54

answers:

2

What I'm doing: I'm writing a small interpreter system that can parse a file, turn it into a sequence of operations, and then feed thousands of data sets into that sequence to extract some final value from each. A compiled interpreter consists of a list of pure functions that take two arguments: a data set, and an execution context. Each function returns the modified execution context:

type ('data, 'context) interpreter = ('data -> 'context -> 'context) list

The compiler is essentially a tokenizer with a final token-to-instruction mapping step that uses a map description defined as follows:

type ('data, 'context) map = (string * ('data -> 'context -> 'context)) list

Typical interpreter usage looks like this:

let pocket_calc = 
  let map = [ "add", (fun d c -> c # add d) ;
              "sub", (fun d c -> c # sub d) ;
              "mul", (fun d c -> c # mul d) ]
  in 
  Interpreter.parse map "path/to/file.txt"

let new_context = Interpreter.run pocket_calc data old_context

The problem: I'd like my pocket_calc interpreter to work with any class that supports add, sub and mul methods, and the corresponding data type (could be integers for one context class and floating-point numbers for another).

However, pocket_calc is defined as a value and not a function, so the type system does not make its type generic: the first time it's used, the 'data and 'context types are bound to the types of whatever data and context I first provide, and the interpreter becomes forever incompatible with any other data and context types.

A viable solution is to eta-expand the definition of the interpreter to allow its type parameters to be generic:

let pocket_calc data context = 
  let map = [ "add", (fun d c -> c # add d) ;
              "sub", (fun d c -> c # sub d) ;
              "mul", (fun d c -> c # mul d) ]
  in 
  let interpreter = Interpreter.parse map "path/to/file.txt" in
  Interpreter.run interpreter data context

However, this solution is unacceptable for several reasons:

  • It re-compiles the interpreter every time it's called, which significantly degrades performance. Even the mapping step (turning a token list into a interpreter using the map list) causes a noticeable slowdown.

  • My design relies on all interpreters being loaded at initialization time, because the compiler issues warnings whenever a token in the loaded file does not match a line in the map list, and I want to see all those warnings when the software launches (not when individual interpreters are eventually run).

  • I sometimes want to reuse a given map list in several interpreters, whether on its own or by prepending additional instructions (for instance, "div").

The questions: is there any way to make the type parametric other than eta-expansion? Maybe some clever trick involving module signatures or inheritance? If that's impossible, is there any way to alleviate the three issues I have mentioned above in order to make eta-expansion an acceptable solution? Thank you!

+3  A: 

I think your problem lies in a lack of polymorphism in your operations, which you would like to have a closed parametric type (works for all data supporting the following arithmetic primitives) instead of having a type parameter representing a fixed data type. However, it's a bit difficult to ensure it's exactly this, because your code is not self-contained enough to test it.

Assuming the given type for primitives :

type 'a primitives = <
  add : 'a -> 'a;
  mul : 'a -> 'a; 
  sub : 'a -> 'a;
>

You can use the first-order polymorphism provided by structures and objects :

type op = { op : 'a . 'a -> 'a primitives -> 'a }

let map = [ "add", { op = fun d c -> c # add d } ;
            "sub", { op = fun d c -> c # sub d } ;
            "mul", { op = fun d c -> c # mul d } ];;

You get back the following data-agnostic type :

 val map : (string * op) list

Edit: regarding your comment about different operation types, I'm not sure which level of flexibility you want. I don't think you could mix operations over different primitives in the same list, and still benefit from the specifities of each : at best, you could only transform an "operation over add/sub/mul" into an "operation over add/sub/mul/div" (as we're contravariant in the primitives type), but certainly not much.

On a more pragmatic level, it's true that, with that design, you need a different "operation" type for each primitives type. You could easily, however, build a functor parametrized by the primitives type and returning the operation type.

I don't know how one would expose a direct subtyping relation between different primitive types. The problem is that this would need a subtyping relation at the functor level, which I don't think we have in Caml. You could, however, using a simpler form of explicit subtyping (instead of casting a :> b, use a function a -> b), build second functor, contravariant, that, given a map from a primitive type to the other, would build a map from one operation type to the other.

It's entirely possible that, with a different and clever representation of the type evolved, a much simpler solution is possible. First-class modules of 3.12 might also come in play, but they tend to be helpful for first-class existential types, whereas here we rhater use universal types.

Interpretive overhead and operation reifications

Besides your local typing problem, I'm not sure you're heading the right way. You're trying to eliminate interpretive overhead by building, "ahead of time" (before using the operations), a closure corresponding to a in-language representation of your operation.

In my experience, this approach doesn't generally get rid of interpretive overhead, it rather moves it to another layer. If you create your closures naïvely, you will have the parsing flow of control reproduced at the closure layer : the closure will call other closures, etc., as your parsing code "interpreted" the input when creating the closure. You eliminated the cost of parsing, but the possibly suboptimal flow of control is still the same. Additionnaly, closures tend to be a pain to manipulate directly : you have to be very careful about comparison operations for example, serialization, etc.

I think you may be interested in the long term in an intermediate "reified" language representing your operations : a simple algebraic data type for arithmetic operations, that you would build from your textual representation. You can still try to build closures "ahead of time" from it, though I'm not sure the performances are much better than directly interpreting it, if the in-memory representation is decent. Moreover, it will be much easier to plug in intermediary analyzers/transformers to optimize your operations, for example going from an "associative binary operations" model to a "n-ary operations" model, which could be more efficiently evaluated.

gasche
Thank you. This would nicely solve the problem for the data type, though I'm not certain if it would support a map with add/sub/mul and another with add/sub/mul/div without having to make the two maps completely independent (and using a different primitive type each time).
Victor Nicollet
+3  A: 

A viable solution is to eta-expand the definition of the interpreter to allow its type parameters to be generic:

 let pocket_calc data context = 
   let map = [ "add", (fun d c -> c # add d) ;
               "sub", (fun d c -> c # sub d) ;
               "mul", (fun d c -> c # mul d) ]
   in 
   let interpreter = Interpreter.parse map "path/to/file.txt" in
   Interpreter.run interpreter data context

However, this solution is unacceptable for several reasons:

  • It re-compiles the interpreter every time it's called, which significantly degrades performance. Even the mapping step (turning a token list into a interpreter using the map list) causes a noticeable slowdown.

It recompiles the interpreter every time because you are doing it wrong. The proper form is more something like this (and technically, if the partial interpretation of Interpreter.run to interpreter can do some computations, you should move it out of the fun too).

 let pocket_calc = 
   let map = [ "add", (fun d c -> c # add d) ;
               "sub", (fun d c -> c # sub d) ;
               "mul", (fun d c -> c # mul d) ]
   in 
   let interpreter = Interpreter.parse map "path/to/file.txt" in
   fun data context -> Interpreter.run interpreter data context
Pascal Cuoq