views:

711

answers:

12

For very low level optimization purposes it would be useful to me if I could store a compiled function inside a variable directly, not a pointer to a function. That is, if I have a function foo, I want to make a char buffer large enough to hold the machine instructions generated for foo, and then be able to in effect call foo by somehow telling C to jump execution to the contents of that buffer (assume that I have a compiler intrinsic to make sure the char buffer is aligned properly for my architecture). Ideally, I'd like to do this keeping assembly usage to a minimum (I realize some might be required).

My best solution so far would be to compile a program that has just the function I want to assembly with GCC, then compile to machine code, then use the addresses from the outputted assembly to extract the desired machine code from the executable, then manually fill the buffer with that in my program, then use inline assembly to jump to the starting address of the buffer. That's more hackish and manual work than I would like though.

I don't need to compile new functions at runtime, just have the buffer contain instructions corresponding to different already compiled functions during runtime. E.g. I might have 3 compiled functions, and 1 buffer. The 3 functions are known at compile time, but at runtime the buffer may correspond to any one of the 3 at different times.

Edit: To clarify what there would be to gain: I have a struct of which this buffer will be a member, and various pointers to instances of that struct. Each struct's buffer may contain a different compiled function. If I were to use a function pointer instead of the buffer, I would have to load the struct's function pointer, then deref the function pointer. With the buffer, I can just jump the program counter to an offset (the buffer's relative location) of the base of the struct. It's one less level of indirection. For very tiny functions, this can be a savings.

Edit 2: Further clarifying:

With a function pointer:

  1. Load pointer from &struct+offsetof(pointer)
  2. Jump to location contained in pointer

With a buffer that contains machine code:

  1. Jump to &struct+offsetof(buffer)

The second one is less steps.

A: 

Sounds like a job for a few good goto statements and inline assembly.

Just beware.

samoz
I thought that too, but then it doesn't really meet his requirement of storing the entire function's code in a variable.
T.E.D.
+2  A: 

Why not declare the functions in question inline ? That would get your function's code in-place, without sacrificing the readability. (Albeit, with some impact when it comes to debugging on a low level of the compiled code)

(As Tyler has correctly pointed out, the compiler might ignore this - I was writing under assumption that the optimization flags to trigger the compiler to inline the code were already being used).

Depending on the optimization flags and your code, the compiler might do some other optimizations besides inlining the code.

Let's say we have a (totally meaningless, fwiw) code:

#include <stdio.h>

static int test = 0;

int inline foo(int a) {
  test = 0x1234;
  return a + 0x1234;
}

int inline bar(int a) {
  test = 0x5678;
  return a + 0x5678;
}

int inline baz(int a) {
  test = 0x1357;
  return a + 0x1357;
}

int main(int argc, char*argv[]) {
  int i = getchar();
  int j = 0;
  switch(i) {
    case 'a': j = foo(i);
        break;
    case 'b': j = bar(i);
        break;
    case 'c': j = baz(i);
        break;
    default:
        j = 0;
  }
  printf("j: %d, test: %d\n", j, test);
}

Compiling it on gcc 4.1.1 with -O2 and doing the objdump -d on the resulting executable we get the following snippet for the main():

08048480 <main>:
 8048480:   8d 4c 24 04             lea    0x4(%esp),%ecx
 8048484:   83 e4 f0                and    $0xfffffff0,%esp
 8048487:   ff 71 fc                pushl  -0x4(%ecx)
 804848a:   55                      push   %ebp
 804848b:   89 e5                   mov    %esp,%ebp
 804848d:   51                      push   %ecx
 804848e:   83 ec 14                sub    $0x14,%esp
 8048491:   a1 1c a0 04 08          mov    0x804a01c,%eax
 8048496:   89 04 24                mov    %eax,(%esp)
 8048499:   e8 b2 fe ff ff          call   8048350 <_IO_getc@plt>
 804849e:   83 f8 62                cmp    $0x62,%eax
 80484a1:   74 31                   je     80484d4 <main+0x54>
 80484a3:   83 f8 63                cmp    $0x63,%eax
 80484a6:   74 3d                   je     80484e5 <main+0x65>
 80484a8:   31 d2                   xor    %edx,%edx
 80484aa:   83 f8 61                cmp    $0x61,%eax
 80484ad:   8d 76 00                lea    0x0(%esi),%esi
 80484b0:   74 44                   je     80484f6 <main+0x76>
 80484b2:   a1 24 a0 04 08          mov    0x804a024,%eax
 80484b7:   89 54 24 04             mov    %edx,0x4(%esp)
 80484bb:   c7 04 24 e8 85 04 08    movl   $0x80485e8,(%esp)
 80484c2:   89 44 24 08             mov    %eax,0x8(%esp)
 80484c6:   e8 95 fe ff ff          call   8048360 <printf@plt>
 80484cb:   83 c4 14                add    $0x14,%esp
 80484ce:   59                      pop    %ecx
 80484cf:   5d                      pop    %ebp
 80484d0:   8d 61 fc                lea    -0x4(%ecx),%esp
 80484d3:   c3                      ret
 80484d4:   b8 78 56 00 00          mov    $0x5678,%eax
 80484d9:   ba da 56 00 00          mov    $0x56da,%edx
 80484de:   a3 24 a0 04 08          mov    %eax,0x804a024
 80484e3:   eb cd                   jmp    80484b2 <main+0x32>
 80484e5:   b8 57 13 00 00          mov    $0x1357,%eax
 80484ea:   ba ba 13 00 00          mov    $0x13ba,%edx
 80484ef:   a3 24 a0 04 08          mov    %eax,0x804a024
 80484f4:   eb bc                   jmp    80484b2 <main+0x32>
 80484f6:   b8 34 12 00 00          mov    $0x1234,%eax
 80484fb:   ba 95 12 00 00          mov    $0x1295,%edx
 8048500:   a3 24 a0 04 08          mov    %eax,0x804a024
 8048505:   eb ab                   jmp    80484b2 <main+0x32>
 8048507:   90                      nop

So, the compiler already inserted the "needed goto()s". As a side effect, in this case it also already did the math of adding the 0x1234/0x5678/0x1357 as needed - so the results are simple moves.

Andrew Y
Inline is only a suggestion. The compiler may ingore it, and I would suspect that that is not acceptable here.
Tyler McHenry
Yes, thanks for the correction!
Andrew Y
Because inline won't help turn a struct's contained function pointer into the function's code.
Joseph Garvin
P.S. See my edit ;)
Joseph Garvin
Hm. If you are up to this kind of low-level things, maybe you can use TCC: http://bellard.org/tcc/ ? (though, beware the dragons:) And I wonder if you already had a chance to profile the code with and without the function pointers - I would think in case of frequent use, the pointer will be in the L1 cache anyway, so the overhead will be negligible. (of course this is all x86-centric)
Andrew Y
+13  A: 

I'm not sure why you are excluding function pointers - if you aren't modifying the code at runtime, then I can't think of anything that a buffer containing a function could do that a function pointer couldn't.

That being said, one you have a pointer to a buffer containing your function, just cast a pointer to that buffer to the right function pointer type and call that. Of course if your operating system / CPU supports it, you'll also have to clear the NX flag that prevent you from running non-executable code.

Also, you can't assume that you can copy buffers containing executable data around as if they contained regular data - you'll have to fix up branches and jumps and other memory accesses.

EDIT: I see what you're getting at, but I don't think you'll be able to find a standard-compliant way to get a function into a buffer - C++ says very little about memory structure and where functions are stored is a rather abstract concept. You may be stuck with a bunch of hacks, or else using separate assembly files that can be assembled at compile time and loaded from the files at run-time.

On another note - I think that you may find that it is faster to do the double dereferencing of storing the function pointer (4-8 bytes likely) than suffer the cache misses of storing a buffer capable of holding the largest function you will need. Your struct with an attached function has many similarities with virtual functions on classes. There are reasons why most (all?) C++ compilers implement virtual functions with a vtable rather than store the entire function definition for every instance of the class. That's not to say it's not possible that you'll get a performance increase out of storing the whole function in a buffer, it's just not as straight-forward as it may seem to be.

Eclipse
See edited post. There's a reason ;)Casting the buffer to a pointer indeed makes sense though, that will help automate a little bit more of it.
Joseph Garvin
I agree with your reasoning about holding the largest function, but as I said this will only be a savings if your functions are very small (I happen to know they all will be in this very constrained case).
Joseph Garvin
How small? Given cache effects, I'd be willing to guess (without any basis for my numbers) that if the functions are more than about 12-16 bytes, the function pointer version will be faster because it will result in more L1 cache hits. (IIRC some CPU's use different L1 caches for data and code so even data access to the struct won't cause a prefect into the L1 code cache)
BCS
And the size of the struct matters too. For instance if the struct already fills a cache line, then tagging even 1 byte of code on the end means an additional fetch is required, which may or may not be anticipated by the cache when the first part of the struct is fetched.
Steve Jessop
+12  A: 

Be aware that most modern architectures support no-execute protection for memory pages, and OSes with a bit of sense take advantage of it for security. That means that you can't use e.g. stack memory or random malloc'd memory to store this code; you need to adjust the permissions using e.g. VirtualProtect on Windows or mprotect on Unices.

Barry Kelly
And (depending on architecture) remember to flush the icache for an address range containing the code.
Steve Jessop
+7  A: 

If you're going to be compiling and optimizing code at runtime, I'd recommend taking a look at LLVM. ffcall is also a useful package. Note that both of these take care of problems like making sure the memory allocated for the function is executable and that the instruction cache is flushed properly for a large variety of architectures. You don't want to reinvent the wheel.

Adam Rosenfield
+1  A: 

Er, in this particular case, you are using function pointers, in effect. The start of buffer holding the different functions is the function pointer.

Your best bet is to write this whole portion in assembly, then link your C file to the assembly routines in the compilation/linking stage. C just doesn't support this kind of work without a lot of hacking in...assembly.

Paul Nathan
See my edit, it's not actually the same as a function pointer.
Joseph Garvin
+3  A: 

Since functions are really just labels, you need some way to determine the size of a function. One trick might be to take the address of the following function. Here's a little sample which uses this trick to get the bytecode of a 'multiply' function into a vector of chars, then rewrites the code to always return a constant, and then calls the code by casting it to an appropriate function pointer.

This is a dirty hack and probably breaks under a number of circumstances/compilers. However, if you're in a controlled environment and it works for you, it might help you out.

#include <iostream>
#include <vector>

using namespace std;

int multiply( int arg )
{
    return arg * 2;
}

int main()
{
    // Show that multiply apparently multiplies the given value by two.
    cout << multiply( 13 ) << endl;

    // Copy the bytes between &multiply and &main (which is hopefully the multiply code) into a vector
    vector<char> code( (char*)&multiply, (char*)&main );

    // Optimize it by making multiply always return 26
    code[0] = 0xB8;  // mov eax, 1Ah
    code[1] = 0x1A;
    code[2] = 0x00;
    code[3] = 0x00;
    code[4] = 0x00;
    code[5] = 0xC3;  // ret

    // Call the faster code, prints 26 even though '3' is given
    cout << ((int (*)(int))&code[0])( 3 ) << endl;
}
Frerich Raabe
Awesome idea for capturing the code, definitely very compiler dependent though ;)
Joseph Garvin
This breaks if the heap is non-executable, or if the instruction cache needs to be flushed; other than that, this is essentially the same as the other answers.
Adam Rosenfield
It also breaks if the compiler or linker output places `main` before `multiply`, but I suppose there's not much you can do about that without debug symbols...
ephemient
Yes, I know there are numerous situations where this breaks. As I wrote in my post: 'This is a dirty hack and probably breaks under a number of circumstances/compilers.'
Frerich Raabe
A: 

If you are only interested in C++, consider using 'function objects' - define a class with one method containing your function's logic, and operator overload the () operator on the class to invoke the method. Then having declared an instance of your class:

Foo foo;

...you can do this:

rtn_type result = foo( /* arg list */);

Syntactically it is like storing your function in a variable.

Eric M
To make different instances of the object correspond to different functions, you'll need operator() to call a virtual function, and then you're going to be paying the same overhead as a function pointer, which is what I'm trying to avoid.
Joseph Garvin
Slight addendum: After reading this I read how boost::function is implemented, and technically you don't need a virtual function. But their method uses *two* function pointers instead, which is still worse.
Joseph Garvin
+1  A: 

In order to actually get the machine code for your functions in an automated way during the build process, I suggest putting them in a seperate .c file, then compiling this to position-independent code in an object file:

gcc -Wall -c -fPIC functions.c -o functions.o

Then use objdump to dump out the dissassembly:

objdump -d functions.o

Throw away everything except the hex codes, put them into a C structure with xdd -i.

Note that you will have some pretty severe restrictions on this code if you don't want to deal with relocation - no access to variables with static storage duration, no access to external functions. It would be best to check with objdump -r to make sure there are no relocations needed.

caf
+2  A: 

I think you might be optimizing something a bit silly. You seem to hope to avoid some extra instructions, either dereferencing a function, or some other such silliness. Let me reccomend an alternative. First, declare your functions as normal, but mark them as inline: The inline specifier instructs the compiler to place the body of the function at the function call, rather than call the function normally (if its possible to).

inline int foo(int var)
{
  ...
}
inline int bar(int var)
{
  ...
}
inline int baz(int var)
{
  ...
}

Second, dispatch to these functions using a switch statement. On most hardware, most compilers are able to emit very efficient code, usually a jump table, or if it's more efficient, carefully crafted branch instructions.

switch(somestruct.callcondition)
{
case FOO_CASE: 
    foo(...);
    break;
case BAR_CASE:
    bar(...);
    break;
case BAZ_CASE:
    baz(...);
    break;
}

Finally, make sure you have your compiler set to turn optimizations on. In debug mode, most compilers do not emit the most efficient code possible, because they try to make the generated machine code as understandable as possible, for example by keeping the order of code as close to the source as it can. Often the compiler will not inline calls it could, or generate efficient jump tables, unless you specifically ask it to.

TokenMacGuy
I've already covered in comments to others why inline will not help in this case.
Joseph Garvin
edited to show use of a structure with a switch statement.
TokenMacGuy
+1  A: 

I know this is over a week old, but here goes anyway:

Firstly, I don't know if this will work--but couldn't you just write a function-call-like-macro which will expand to an inline jump instruction, to the buffer in the respective struct? That would be like a goto to the data in the struct, without the function call setup/teardown, but in the code it would look like a function call -- but there wouldn't be a performance hit because it would be handled by the preprocessor at compile-time.

But, this type of program behavior is something that C is not well suited for. As other answers have pointed out, the code you contain in your buffer will be severely limited (no access to external/static variables, no external function calls, etc), will break easily (in the case of no-execute protection on memory pages, not being able to return due to the link register not being set up, etc), and may be really non-portable (as well, it will probably be readability hell--nobody is going to be able to understand what you're trying to do without a lot of banging-head-against-the-desk).

Not to mention the fact that using a buffer to store executable data is the same as a function pointer (since a buffer is just a pointer to data anyways).

That being said, C is a pretty quick language to begin with--and compiling with optimizations will probably do a better job than low-level hacking (especially with what you're trying to do). The performance gain (if any at all) of trying to store executable code in a buffer will be very marginal--and that's only in cases where it doesn't run slower. That kind of performance is more than an easy enough trade off for the readability, portability and stability you get in exchange.

If you're writing this code for an embedded system or something, where performance is that touchy of an issue, you should consider writing this part of your application in assembly language. That way you would have more control over the behavior of the code and could circumvent calling conventions with the functions internal to that part. There's probably a reason you can't really do things like this in C: you should never need to--and the architecture usually doesn't support it very well (if at all). Function pointers are perfect for the behavior you want here, and member functions are even better. If you want to get even better than that, you'll have to write the assembly yourself, otherwise everything will fall apart and it will be next to impossible to maintain.

addendum

now this is even older, but I had another thought about this: with function pointers in a struct, to initialize them you just load them with an address which the compiler already knows.

To initialize the function buffer you want you have to copy a whole bunch of executable data into the buffer. Which is a performance hit.

Carson Myers
@Carson: Most of your points are valid, but the buffer is not the same as a function pointer when the buffer is being stored inside a struct. See my edit.
Joseph Garvin
the only difference you've pointed out is one instruction (see my point about the tradeoffs). In any case, it seems like you're just doing all the compilers work--what if you want to pass a parameter? Or the values of your variables start magically changing when you call the function? Whatever the differences between the buffer and function pointer, I still think you'd be way better off with a function pointer, or straight assembly.
Carson Myers
This is part of a smart pointer class, so saving a single instruction can matter. What if I want to pass a parameter? Nothing would change. What if my variable start changing magically? Then there's a bug. I'm not sure what magic you're fearing.
Joseph Garvin
A: 

Here is a solution in C to save a function into a buffer:

#include <stdio.h>
#define FUNC_SIZE(func) ((long)(& func##_end)-(long)(&func))
#define FUNC_COPY(func,buf) {buf=malloc(FUNC_SIZE(func));memcpy(buf,&func,FUNC_SIZE(func));}

int magic(int arg){
  arg+=1;
  arg-=1;
  arg*=arg;
  /* WHATEVER */
  return arg;
}
void magic_end(){
  return;
}
int main(int argc, char *argv[])
{
  int (*buf)();
  printf("Size of the function: %i\n",FUNC_SIZE(magic));
  FUNC_COPY(magic,buf);
  printf("Result: %i, %i\n",(*buf)(7),magic(7));
  return 0;
}

But I don't think that this is faster, because you still need a pointer.

knizz