views:

31

answers:

1

The idea is somewhat similar to what Apple has done in the OpenGL stack. I want to have that a bit more general.

Basically, I want to have specialised and optimised variants of some code for some specific cases.

In other words: I have given an algorithm/code for a function (let B = {0,1})

f : B^n -> B^m

Now, I special a specific case by a function (which predefines part of the input of f)

preset : {1..n} -> {0,1,unset}

The amount of predefinitions (∈ {0..n}) is then given by

pn := |preset⁻¹({0,1})|

Canonically, we now get a specialised function

f_preset : B^(n-pn) -> B^m

Also canonically, we get the code/algorithm for this specialised function. Naturally, the code for f_preset will be somewhat more fast than f with pn > 0. Then, you also can optimise this code further (there might be some dead code now, some loops can be unpacked now, some calculations can be precalculated, etc). In some cases, it can have noteable improvements.

Apple does roughly this for their OpenGL stack (from what I have read / know): They try to find a good preset at runtime after everything is setup for variables which will not change anymore, then make an optimised version of the specialised function and only use that one instead of the original function.

Initially, I thought about a way to optimise the physics simulation of some own game. There I have a lot of particle objects and a set of particle types (which is unknown at compile time). A particle type is a set of attributes. The particle types are fixed and constant once they are loaded. Each particle object is of one of theye particle types. The physic simulation for a particle object is some very heavy peace of code with many many branches and very heavily depends on the particle type. My idea was now to have an optimised physics simulation function for each particle type.

After thinking a bit about this, I wanted to go a bit further:

I want to automatically calculate a set of such presets at runtime and maintain the optimised code for each. And I want to automatically add or remove presets when the circumstances change.

There are several questions now:

  • Is there an easy way to calculate a good preset? How do I know what variables are constant for a given situation?
  • Is there an easy way to check how good a preset is? 'Good' refers to the performance of the resulting optimised code.
  • How to compare two algorithms/codes for performance? Via some heuristic? Or by testing with random input?
  • How many presets (and optimised code variants) should there be for a function? A fixed limit for all functions? Or is this different for every function? Is it maybe even depending on the current computer state?
  • How to maintain the different optimised code variants? A wrapper function around f which chooses automatically the best optimised variant doesn't seem to be very nice as this maybe not so easy check would be needed for every single call. A solution to this problem might also be deeply related to the question about how to find the set/amount of good presets. (In the particle type case, the optimised code would be attached to / saved together with the particle type. The amount of particle types also define the amount of presets.)

For my initial case, most of these questions are kind of obsolete but am really interested now in how to do this in a more general way. Of course, most/all of these questions are also uncalculateable but I wonder to what degree you may still get good results.

This whole topic is also very important for optimisations in JIT compilers. Are they doing these kind of optimisations already? To what degree?

Are there good recent research works which answers some of my questions? Or maybe also some results which say that it is just too hard to do this in such a general way?

A: 

It seems to me you are asking about partial evaluation.

I actually have a bit of a problem with that concept, because it is usually couched in terms that are over-academic and over-difficult.

The way it is usually expressed is that you have some general function F(Islow, Ifast) having arguments that can take different values at different times. The Islow arguments change seldom, and the Ifast arguments can be different every time it is called.

Then the problem is to write some kind of partial-evaluator function G(F, Islow) -> F1(Ifast) that takes function F and the Islow arguments, and generates a new (simpler) function F1 that only takes the Ifast arguments.

The problem with this is 1) somebody has to write the general function F, and 2) somebody has to write the general partial evaluator G.

What makes more sense to me is to write from scratch a function H(Islow) -> F1(Ifast), that is, write a code-generator specifically for F1, rather than writing two functions F and G, especially where G is very difficult to write.

H is usually much easier to write than F, and G need not be written at all! The result function F1 usually is smaller and has much higher performance than F, so it's a win-win situation.

When people write code generators, that is what they are doing, and it is a very effective programming technique.

Mike Dunlavey
Thanks for the info! Though, that is not exactly what I asked for. I don't want to be forced to generate the specialised version first to call the function. I also don't want to do the decision by hand if I should keep the generated code or not. I am mainly interested in how to automatically determine what specialised version I should generate. It should work without changing a single line of code. Anyway, the approach you are presenting is still interesting.
Albert
@Albert: I know, you want it to be a general G. The point I'm trying to make is that that's not necessarily a good thing to want. Even if you could find a good G, you would have to apply it to an F. (This is hard to explain to my ivory-tower friends.) In actual realistic cases a general F is much much harder to write than an H that prints out specific F1s. I can give you an example where the development effort was decreased by an order of magnitude and, of course, the performance was no contest.
Mike Dunlavey
Ah, I see what you mean. The thing is, I want to apply it to already existing code. So I have F already. My application already works for all the cases it was designed for. I just want to have it optimised automatically for special cases. -- Basically the same thing what Apple is doing in their OpenGL stack.
Albert
@Albert: OK. Good luck.
Mike Dunlavey