views: 114
answers: 5

Let's say you have a dynamic language (similar to common scripting languages) and want to write a compiler for it. Is it a good idea to reserve some bits of each machine word for something like type tags, especially on microcontrollers and similar devices?

I can think of some advantages, like reduced storage for type information, somewhat easier memory management (maybe even GC), and easier debugging. But would these justify the overhead on common arithmetic operations, or on other operations that need the full word? Note that a bytecode VM would be much worse in this regard, since it carries far more performance overhead, so don't suggest it ;)

It's not as though anyone would write numerically intensive code for microcontroller-class hardware anyway, but still...

+2  A: 

The added complexity of ensuring your type bits don't leak into any calculations would probably far outstrip any storage savings. You could instead allocate a type field adjacent to each primitive field to hold any needed metadata/flags. Then every value has a known storage size of n+1 words, assuming a single word is enough for your type and status info.

caskey
A: 

No, there is no way this is a good idea for a general-purpose compiler. The overhead of handling the type-tag bits in arithmetic operations would be severe.

By their very nature, dynamically-typed languages require extra space to store the type information for each value. If you have to store lots and lots of homogeneously-typed data, the right way to do it is usually with a native-code module designed to do that in C!

For example, when you want to store an array of 5 integers, a Python list is okay (it can store arbitrarily complex mixed types). But if you want to store an array of 5 million integers, you should use the array module which stores them as a homogeneous C array, or NumPy which does something similar but optimized for doing a lot of math on them.

Dan
+1  A: 

It might depend just how "micro" your microcontroller is.

For example I'd guess (without having ever tried it) that the barrel shifter on an ARM core, and/or the free masks you can have on load/store, would keep the cost of dealing with these flags fairly reasonable. Obviously that's at the top end of the class, but ARM does get everywhere these days.

Lisp uses type tags, which means a fixnum is smaller than a full word. So you could look at Lisp implementations (if you can find them for the processors you care about) to see how they minimise the cost, and whether their best efforts would meet your requirements.

Steve Jessop
+1  A: 

SPARC chips have tagged-arithmetic facilities directly in hardware, designed explicitly for this sort of application. I've also seen references to other architectures with this feature. Quite how widely they're used in practice is another question: most dynamic languages, such as Python, are built for portability, so you don't really have the option of relying on such architecture-specific support.

I think older Smalltalks used to do this with small integers: up to a certain value a word was an integer, and over the threshold it was an object pointer.

ConcernedOfTunbridgeWells
A: 

On the one hand, if you want your compiled code to straightforwardly test the flags for each operation, it will be slow for the same kind of reason that interpreted bytecode is slow. Maybe not as slow as bytecode, but Amdahl's law will be working against you.

On the other hand, a simple compiler for a fully dynamic language will need to do some form of type checking anyway. Universal use of dynamic dispatch causes serious performance penalties on modern processors (though maybe not so severe on microcontrollers?).

If your compiler can optimize away most of this run-time type checking, you can get your performance back. However, implementing that may be neither straightforward nor simple. My guess is that you'd want to avoid bit flags in that case, as the required masking would just be unnecessary work.

comingstorm