I'd like to implement a class type for my own little language, but what I thought wouldn't be too hard has me stumped. I have the parser in place; it's the code generation side of things I'm having problems with. Can anyone shed any light on the best or correct way to go about this? Specifically, I'd like to do this in LLVM, so while I need to understand the general approach, any specific LLVM code I should be working with would be fantastic.

Thanks T.


N.B. My experience with LLVM is basically what comes from the Kaleidoscope tutorial plus a little extra from playing around with it, so I am far from having a full understanding of the LLVM APIs.

+4  A: 

A very, very incomplete overview:

A class is a structure (you know C/C++, don't you?).

Methods are ordinary functions except that they receive an extra implicit argument: the object itself. This argument is usually called 'this' or 'self' within the function. Class-scope symbols may (C++, Java) or may not (PHP, Python, JavaScript) be accessible without qualification within methods.
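A minimal sketch of those two points, written as Python helpers of the kind a compiler front end might contain. The names `StructType`, `lower_method` and `lower_call`, the `Class_method` mangling scheme and the type strings are all made up for illustration; none of this is LLVM API. The field index computed here is the sort of number you would later hand to a struct element access in the backend.

```python
from dataclasses import dataclass


@dataclass
class StructType:
    """Lowered form of a class: a name plus an ordered list of (field, type)."""
    name: str
    fields: list

    def index_of(self, field_name):
        # The position of a field in the struct is all codegen needs
        # in order to address it.
        for i, (name, _ty) in enumerate(self.fields):
            if name == field_name:
                return i
        raise KeyError(field_name)


def lower_method(class_name, method_name, params):
    """Turn a method into a plain function with 'this' as its first parameter."""
    mangled = f"{class_name}_{method_name}"  # hypothetical mangling scheme
    return mangled, [("this", f"{class_name}*")] + list(params)


def lower_call(class_name, receiver, method_name, args):
    """Rewrite 'receiver.method(args)' as 'Class_method(receiver, args)'."""
    return f"{class_name}_{method_name}", [receiver] + list(args)
```

For example, `lower_method("Point", "move", [("dx", "i32")])` yields a function named `Point_move` whose first parameter is `this`, and `lower_call("Point", "p", "move", ["1"])` yields a call to that function with `p` passed first.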

Inheritance essentially glues the structures together and may also merge the symbol tables, since symbols of the base class are normally accessible by default from within the methods of the class you are parsing. When you encounter a symbol (field or method) within a method, you need to do an ascending lookup, starting from the current class and going up the hierarchy. Alternatively, you can look it up in a single symbol table that is the result of merging.
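Both lookup strategies, plus the "gluing" of structures, can be sketched roughly like this. `ClassInfo` and its methods are hypothetical names for a front-end symbol table, and the sketch assumes single inheritance only.

```python
class ClassInfo:
    """Hypothetical per-class record kept by the compiler front end."""

    def __init__(self, name, fields, methods, base=None):
        self.name = name
        self.own_fields = list(fields)
        self.own_methods = set(methods)
        self.base = base

    def layout(self):
        # Base fields come first, so a pointer to a derived object can be
        # used wherever a pointer to the base is expected.
        inherited = self.base.layout() if self.base else []
        return inherited + self.own_fields

    def lookup(self, symbol):
        # Ascending lookup: current class first, then up the hierarchy.
        cls = self
        while cls is not None:
            if symbol in cls.own_fields or symbol in cls.own_methods:
                return cls.name
            cls = cls.base
        return None

    def merged_symbols(self):
        # Alternative: one flat table where derived entries shadow base ones.
        table = self.base.merged_symbols() if self.base else {}
        for symbol in self.own_fields + sorted(self.own_methods):
            table[symbol] = self.name
        return table
```

With `animal = ClassInfo("Animal", ["name"], ["speak"])` and `dog = ClassInfo("Dog", ["tricks"], ["speak", "fetch"], base=animal)`, `dog.lookup("name")` resolves to `Animal` while `dog.lookup("speak")` resolves to `Dog`. The merged table trades a little memory per class for a single dictionary lookup.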

Virtual methods are called indirectly. In some languages all methods are virtual by default. The implementation depends on the kind of language: in a fully dynamic language you always look up a function name within a class at run-time, so all your methods become virtual automatically; in static languages, compilers usually build so-called virtual method tables. I'm not sure if you need this at all, so I won't go into details here.
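For the static case, here is a rough sketch of how a virtual method table could be built and used. It assumes single inheritance and one table per class; `build_vtable`, `virtual_call` and the `Class_method` naming are hypothetical. The key property is that an override keeps the slot index of the method it replaces, so a call site compiled against the base class still works on a derived object.

```python
def build_vtable(base_vtable, class_name, virtual_methods):
    """Return a list of (method name, implementing function name) slots."""
    vtable = list(base_vtable)  # start from a copy of the base class table
    slot_of = {name: i for i, (name, _impl) in enumerate(vtable)}
    for name in virtual_methods:
        impl = f"{class_name}_{name}"
        if name in slot_of:
            # Override: reuse the slot the base class already assigned.
            vtable[slot_of[name]] = (name, impl)
        else:
            # New virtual method: append a fresh slot at the end.
            slot_of[name] = len(vtable)
            vtable.append((name, impl))
    return vtable


def virtual_call(vtable, functions, slot, this, *args):
    """Indirect call: fetch the function from the slot, pass 'this' first."""
    _name, impl = vtable[slot]
    return functions[impl](this, *args)
```

In generated code the slot number is a compile-time constant, and the object carries a pointer to its class's table, typically as a hidden first field.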

Constructors are special methods that are called either when constructing a new object (usually with 'new') or as part of the constructor call chain from within descendant constructors. Many different implementations are possible here, one being that a constructor takes an implicit 'this' argument, which may be NULL if an object hasn't been created yet, and returns it as well.
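That last scheme might look roughly like this, with a Python dict standing in for the allocated object and `None` standing in for NULL. The function names and fields are invented for the example.

```python
def animal_ctor(this, name):
    # 'this' is None when called directly via 'new': allocate here.
    if this is None:
        this = {}
    this["name"] = name
    return this


def dog_ctor(this, name, tricks):
    # The most derived constructor allocates, because only it knows the
    # full size of the object; the dict hides that detail in this sketch.
    if this is None:
        this = {}
    # Chain to the base constructor with the already allocated object,
    # so the base initializes its part instead of allocating again.
    animal_ctor(this, name)
    this["tricks"] = tricks
    return this
```

A `new Dog("Rex", 3)` expression would then lower to something like `dog_ctor(None, "Rex", 3)`.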

Destructors are ordinary methods that are normally called implicitly when an object goes out of scope. Again, you need to take into account the possibility of an ascending call chain for destructors.
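A small sketch of how a compiler might work out that ascending chain when an object goes out of scope. `ClassInfo`, `destructor_chain` and the `_dtor` suffix are hypothetical, and the most-derived-first order shown is the one C++ uses; your language may choose differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassInfo:
    name: str
    has_destructor: bool
    base: Optional["ClassInfo"] = None


def destructor_chain(cls):
    """List the destructor functions to call, most derived class first."""
    chain = []
    while cls is not None:
        if cls.has_destructor:
            chain.append(f"{cls.name}_dtor")  # hypothetical naming scheme
        cls = cls.base
    return chain
```

Classes without a destructor of their own are skipped, but the walk still continues to their base classes.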

Interfaces are tricky unless, again, your language is fully dynamic.

mojuba
+1 Nice summary, I generally agree. Except that interfaces are not any harder than inheritance in general.
delnan
@delnan thanks. In general, in static languages interfaces are trickier than ordinary virtual calls: for each interface method call an extra step is involved to find the VMT of that particular interface. A caller may or may not know the exact type of an object through which this call is performed. But as I said in dynamic languages you basically do nothing except maybe some run-time compatibility checks.
mojuba
@mojuba: As the famous quote states, "All problems in computer science can be solved by another level of indirection" ;) Of course it takes an additional step, but it's not really complicated.
delnan
@delnan: right :) Of course, the complication here is rather in making an interface method call as efficient as possible. A few implementations in static languages that I've seen used linear search to find an interface VMT by an internal ID. And that's for each call! I don't know if better implementations exist, to be honest.
mojuba
@mojuba: I suppose since the VMT won't change after compilation, a balanced binary tree might be possible. Also, a polymorphic inline cache could do wonders here. But anyway: first make the prototype compiler work correctly, *then* wonder about the best implementation.
delnan
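To make the interface lookup from these comments concrete, here is a hedged sketch of a per-class table mapping interface IDs to VMTs. It shows the linear search mojuba describes next to a binary search over a sorted array, which is swapped in here as a simpler stand-in for the balanced tree delnan suggests (the table is fixed after compilation, so sorting once is enough). `InterfaceTables` and the integer IDs are assumptions for illustration; an inline cache is not shown.

```python
from bisect import bisect_left


class InterfaceTables:
    """Per-class map from interface ID to that interface's VMT."""

    def __init__(self, entries):
        # Sorted once, at "compile time"; never changes afterwards.
        pairs = sorted(entries.items())
        self.ids = [iid for iid, _vmt in pairs]
        self.vmts = [vmt for _iid, vmt in pairs]

    def find_linear(self, interface_id):
        # O(n) scan on every interface call.
        for iid, vmt in zip(self.ids, self.vmts):
            if iid == interface_id:
                return vmt
        return None

    def find_binary(self, interface_id):
        # O(log n) search over the sorted ID array.
        i = bisect_left(self.ids, interface_id)
        if i < len(self.ids) and self.ids[i] == interface_id:
            return self.vmts[i]
        return None
```

An interface method call then becomes: find the VMT for the interface ID, then do an ordinary indexed call through it. Both functions return `None` when the class does not implement the interface, which is where a run-time compatibility check would fail.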