Does anyone have any experience with doing this? I'm working on a Java decompiler in C++ right now, but would like a higher-level language to do the actual transformations of the internal trees. I'm curious whether the overhead of marshaling data between languages is worth the benefit of a more expressive language (like Haskell) for articulating what I'm trying to accomplish. Is this actually done in the "real world", or do people usually pick a language at the beginning of a project and stick with it? Any tips from those who have attempted it?
I'm a big advocate of always choosing the right programming language for each challenge. If there is another language which handles some otherwise tricky task easily, I'd say go for it.
Does it happen in the real world? Yes. I am currently working on a project made up of both PHP and Objective-C code.
The trick is, as you pointed out, the communication between the two languages. If at all possible, let each language stick to its own domain, and have the two sections communicate in the simplest way possible. In my case, it was XML documents sent via HTTP. In your case, some kind of formatted text file might be the answer.
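To make the "formatted text file" idea concrete, here is a rough sketch of the receiving side, assuming the C++ front end dumps each tree as a one-line s-expression. The format, the node names (assign, add, load, const) and the function names are all made up for illustration; nothing here comes from an actual decompiler.

```python
def parse_sexpr(text):
    """Parse one s-expression into nested Python lists of strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = []
            while pos < len(tokens) and tokens[pos] != ")":
                node.append(read())
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1  # skip the closing ")"
            return node
        if tok == ")":
            raise ValueError("unexpected ')'")
        return tok  # an atom: kept as a plain string

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing input after expression")
    return tree


def dump_sexpr(node):
    """Serialize nested lists back to the same text format."""
    if isinstance(node, list):
        return "(" + " ".join(dump_sexpr(child) for child in node) + ")"
    return node


# Hypothetical line the C++ side might write for `x = a + 0`.
line = "(assign x (add (load a) (const 0)))"
tree = parse_sexpr(line)
```

Because both directions go through plain text, either side can be swapped out or tested in isolation, at the cost of re-parsing on every hand-off.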
Marshalling costs depend on the languages and architecture you're working with. For example, if you're on the CLR or JVM, there are low-cost interop solutions available, though I gather you are probably working with unmanaged C++.
Another avenue is an embedded domain-specific language. Tree transformations are often expressible via pattern matching and application of a relatively small number of functions. You could consider writing a simple tree pattern-matcher - e.g. something that looks like Lisp s-exprs but uses placeholders to capture variables - with associated actions that are functions that transform the matched subtree.
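A minimal sketch of such a pattern-matcher, written in Python only to keep it short. Trees are nested tuples, strings starting with "?" are placeholders, and each rule pairs a pattern with an action function. The node names and the two simplification rules are invented for the example.

```python
def match(pattern, tree, bindings=None):
    """Return a dict of placeholder bindings, or None if there is no match."""
    bindings = {} if bindings is None else bindings
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            # A repeated placeholder must match the same subtree again.
            return bindings if bindings[pattern] == tree else None
        bindings[pattern] = tree
        return bindings
    if isinstance(pattern, tuple):
        if not isinstance(tree, tuple) or len(pattern) != len(tree):
            return None
        for sub_pattern, sub_tree in zip(pattern, tree):
            if match(sub_pattern, sub_tree, bindings) is None:
                return None
        return bindings
    return bindings if pattern == tree else None


# Each rule is (pattern, action); the action receives the bindings.
RULES = [
    (("add", "?x", ("const", 0)), lambda b: b["?x"]),  # x + 0 -> x
    (("mul", "?x", ("const", 1)), lambda b: b["?x"]),  # x * 1 -> x
]


def rewrite(tree):
    """Rewrite children first, then apply the first rule that matches here."""
    if isinstance(tree, tuple):
        tree = tuple(rewrite(child) for child in tree)
    for pattern, action in RULES:
        bindings = match(pattern, tree)
        if bindings is not None:
            return rewrite(action(bindings))
    return tree


expr = ("mul", ("add", ("var", "a"), ("const", 0)), ("const", 1))
simplified = rewrite(expr)
```

The same structure carries over to C++ with a variant-style node type; the point is that the rules stay declarative data while the matching engine is written once.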
John Ousterhout, the inventor of Tcl/Tk, was a strong advocate of multi-language programming and wrote quite extensively about it. To do it, you need a clean interface mechanism between the languages you are using. There are quite a few such mechanisms. Some examples:
SWIG (Simplified Wrapper and Interface Generator) can take a C or C++ header file and generate an interface for a high-level language such as Perl or Python (or several other languages) that allows you to access the API. There are other systems that use this approach.
Java supports JNI, and various other systems, such as Python's ctypes and VisualWorks' DLL/C connect, provide native mechanisms that let you explicitly construct the call to the lower-level subsystem.
Tcl/Tk was designed explicitly to be embedded, and has a native API that lets a C library add hooks into the language. The constructs for this resemble argv[] structures in C, and were designed to make it relatively easy to interface a command-line based C program to Tcl. This is similar to the previous mechanism, but comes from the opposite direction. Many scripting languages such as Python, Lua and Tcl support this type of mechanism.
Explicit glue mechanisms such as Pyrex are similar to a wrapper generator, but have their own language for defining the interface. Pyrex is actually a complete programming language.
Middleware such as COM or CORBA allows a generic interface definition to be built externally to the application in an interface definition language, with language bindings that let each of the languages concerned use the common interface mechanism.
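To give a feel for the "explicitly construct the call" style mentioned for ctypes above, here is a small sketch that calls strlen from the C runtime. It assumes a Unix-like system where the C library can be located. The wrapper name c_strlen is just for illustration, and a real project would load its own shared library instead of libc.

```python
import ctypes
import ctypes.util

# Locate the C runtime. If find_library returns None, CDLL(None) falls
# back to the symbols already loaded into the process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature explicitly: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call the C library's strlen on a Python bytes object."""
    return libc.strlen(data)
```

Declaring argtypes and restype by hand is the price of this approach: nothing checks them against the real header, which is the gap that generators like SWIG try to close.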