Implementing a raw thunk in the style of v-table thunks is a last resort. Whatever you need to accomplish can most likely be achieved with a wrapper function, and it will be much less painful.
In general, a thunk does the following:
- Fix up the input parameters (e.g., convert to a different format)
- Call the real implementation
- Clean up step 1 / fix the output parameters
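As a rough illustration of those three steps in wrapper form, here is a minimal sketch. Both `LegacySum` and `Sum` are hypothetical names made up for this example; the "real implementation" stands in for whatever function you actually need to reach.

```cpp
#include <vector>

// Hypothetical "real implementation" with a C-style signature:
// returns 0 on success and writes the result through `out`.
int LegacySum(const int* data, int count, long* out) {
    long total = 0;
    for (int i = 0; i < count; ++i) {
        total += data[i];
    }
    *out = total;
    return 0;
}

// Wrapper that plays the role of a thunk (hypothetical name).
long Sum(const std::vector<int>& values) {
    // 1. Fix up the input parameters: vector -> pointer + count.
    const int* data = values.data();
    int count = static_cast<int>(values.size());

    // 2. Call the real implementation.
    long result = 0;
    int status = LegacySum(data, count, &result);

    // 3. Fix the output parameters: out-parameter -> return value.
    return status == 0 ? result : -1;
}
```

A caller just writes `Sum(values)` and never sees the pointer-and-count signature underneath.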
To see an example of how it works, let's turn to our good friend Raymond Chen and his discussion of adjuster thunks:
http://blogs.msdn.com/oldnewthing/archive/2004/02/06/68695.aspx
The thunk he used was as follows:
[thunk]:CSample::QueryInterface`adjustor{4}':
sub DWORD PTR [esp+4], 4 ; this -= sizeof(lpVtbl)
jmp CSample::QueryInterface
As he describes, you have a class that implements the same methods through multiple interfaces, so it has multiple v-tables. (If you don't know COM, all you need to know is that it works with v-tables directly, so a pointer to a specific interface must contain function pointers to all the methods of that interface in order.)
If you implement two interfaces with different methods in a particular slot, you need multiple v-tables. But you only write the overlapping methods once, so that one method has to work with both "this" pointers. To make that possible, the compiler generates a small stub (the thunk) that performs the necessary fixup and then transfers control to the original implementation.
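The same situation can be reproduced in plain C++ with multiple inheritance. The sketch below uses made-up names (`IFirst`, `ISecond`, `Sample`, `Self`) rather than real COM interfaces; the point is only that one method body serves both bases, and a call through the second base still arrives with a "this" that points at the full object. How the compiler gets there (typically an adjustor thunk like the one above) depends on the compiler and ABI.

```cpp
// Two hypothetical interfaces that both declare the same method.
struct IFirst {
    virtual const void* Self() = 0;
    virtual ~IFirst() = default;
};

struct ISecond {
    virtual const void* Self() = 0;
    virtual ~ISecond() = default;
};

// One class, two v-tables, and a single implementation of Self().
struct Sample : IFirst, ISecond {
    // Written once; overrides both IFirst::Self and ISecond::Self.
    const void* Self() override { return this; }
};

// Call the method through each interface pointer.
const void* CallThroughFirst(IFirst* p) { return p->Self(); }

// On common ABIs the ISecond* passed here points into the middle of the
// Sample object, so the v-table slot goes through a compiler-generated
// stub that adjusts "this" before reaching Sample::Self.
const void* CallThroughSecond(ISecond* p) { return p->Self(); }
```

Given a `Sample s;`, both `CallThroughFirst(&s)` and `CallThroughSecond(&s)` return the address of `s`, even though the `ISecond*` handed to the second function usually holds a different address.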
So, this thunk does the following steps:
- Fix up the input parameters, namely the hidden "this" pointer, in the first line.
- Call the real implementation in the second line.
- Cleanup: none required (see below)
This is where the jmp instruction comes in. Normally, if you were to invoke the function using call, it would return to you, and you'd have to ret back to your caller. Since there is no cleanup to do, the compiler performs an optimization: it transfers execution straight to the real implementation and lets the real implementation's ret return to your caller. This is only an optimization, not a fundamental part of thunking. For example, 16/32-bit thunks convert the input and output parameters between 16 and 32 bits as necessary, so they can't skip the cleanup step; they have to use call, not jmp.
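The difference shows up in ordinary wrappers too. In this sketch (all names are hypothetical), the first wrapper has nothing left to do after the call, so an optimizing compiler may turn the call into a jmp; the second has to convert the result afterwards, so it needs a real call. Whether the tail call is actually emitted depends on the compiler, optimization level, and calling convention.

```cpp
#include <cstdint>

// Hypothetical real implementation that works in 32 bits.
std::int32_t RealImpl(std::int32_t x) { return x * 2; }

// Fix up the input, then hand off. Nothing happens after the call, so
// the compiler is free to emit "jmp RealImpl" instead of "call" + "ret".
std::int32_t Forward(std::int32_t x) { return RealImpl(x + 1); }

// Widen the input, call, then narrow the output. The conversion after
// the call is the cleanup step, so control must come back here first.
std::int16_t ForwardNarrow(std::int16_t x) {
    std::int32_t wide = RealImpl(static_cast<std::int32_t>(x));
    return static_cast<std::int16_t>(wide);
}
```

Compiling both with optimizations enabled and reading the generated assembly is the quickest way to see which form your compiler picked.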
The moral of the story is: if you need to do something, such as the jmp optimization, that you can't write directly in C++ or your other high-level language of choice, go ahead and write an assembly language thunk. Otherwise, just write a wrapper and be done with it.
Honestly, it sounds like you're after the performance optimization, and most of the time (1) the compiler is better at optimizing than we think and (2) a hand-written thunk won't give you as big an improvement as you expect.