The following code doesn't work as intended but hopefully illustrates my attempt:

long foo (int a, int b) {
  return a + b;
}

void call_foo_from_stack (void) {
  /* reserve space on the stack to store foo's code */
  char code[sizeof(*foo)];

  /* have a pointer to the beginning of the code */
  long (*fooptr)(int, int) = (long (*)(int, int)) code;

  /* copy foo's code to the stack */
  memcpy(code, foo, sizeof(*foo));

  /* execute foo from the stack */
  fooptr(3, 5);
}

Obviously, sizeof(*foo) doesn't return the size of the code of the foo() function.

I am aware that executing the stack is restricted on some CPUs (or at least if a restriction flag is set). Apart from GCC's nested functions that can eventually be stored on the stack, is there a way to do that in standard C?

A: 

On Linux, you cannot do this by default because the stack memory region is not executable.
Read up on the ELF format for the details.

Tristan Su
+6  A: 

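sizeof(*foo) isn’t the size of the function foo. In standard C, applying sizeof to an expression of function type is a constraint violation; GCC accepts it as an extension and evaluates it to 1. Even sizeof(&foo) would only give the size of a pointer to foo (which will usually be the same size as every other pointer on your platform).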

sizeof can’t measure the size of a function. The reason is that sizeof is a static operator, and the size of a function is not known at compile time.

Since the size of a function is not known at compile time, that also means that you can’t define a statically sized array that is large enough to contain a function.

You might be able to do something horrible using alloca and some nasty hacks, but the short answer is no, I don’t think you can do this with standard C.

It should also be noted that the stack is not executable on modern, secure operating systems. In some cases you might be able to make it executable, but that is a very bad idea that will leave your program wide open to stack smashing attacks and horrible bugs.

Daniel Cassidy
Since the size of the function's code cannot be known by the compiler, is there a trick to define a "padded" function that has a fixed code size? Imagine the foo() function padded with nop instructions to a given size, or something similar.
Blagovest Buyukliev
Yes, look at defining segments in your linker instruction manual. Use some platform specific `pragmas` to put the function in a separate segment. Copy the contents of the segment wherever you need to.
Thomas Matthews
I don't believe you can define this size in a C-standard way. You can place a C-style goto label at the end of the function (or even a following function) definition, and then use custom (assembly) code to compute the difference in bytes between the byte location of the function head and that last label to get the size. Whether this works depends on how much your compiler can shuffle code around the object file. GCC has a switch to prevent functions from being reordered in memory; you can use that to good effect, but fundamentally your solution will be implementation-dependent.
Ira Baxter
@Ira Baxter: a label at the end of the function is not a good idea, since it wouldn't take the function epilogue code into consideration. Better to depend on non-reordering and put a dummy function after the function you want to size... this stack execution deal is unportable anyway.
snemarch
@snemarch: I actually use the address of a dummy function before, and a dummy function after, and (unfortunately) the unpromised non-reordering of compiled functions to determine if a PC is *in* a particular function for a related activity. I don't actually copy the function body; as others have observed, it may have some nonrelocatable locations in it.
Ira Baxter
@Ira Baxter: is the dummy-before necessary?
snemarch
@snemarch: It is if you aren't sure that the address of the function is the lowest address of the function body. As an example, what if the compiler emits literal strings needed by the function body before the function body code?
Ira Baxter
+1  A: 

If you need to measure the size of a function, have the compiler/linker output a map file and you can calculate function size based off of that information.
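
Map file layouts differ between linkers, so as a rough sketch this swaps in a related source of the same information: the output of `nm -S`, which on GNU binutils prints one symbol per line as hexadecimal address, hexadecimal size, type letter, and name. The function name parse_symbol_line and the 128-byte name buffer are assumptions made for illustration.

```c
#include <stdio.h>

/* Parse one line in the style of `nm -S` output:
       <hex address> <hex size> <type letter> <symbol name>
   Returns 1 if all four fields were read, 0 otherwise.
   The caller must provide a name buffer of at least 128 bytes. */
static int parse_symbol_line(const char *line,
                             unsigned long long *addr,
                             unsigned long long *size,
                             char *type,
                             char *name) {
    return sscanf(line, "%llx %llx %c %127s", addr, size, type, name) == 4;
}
```

A program that keeps such a listing next to its executable could scan it for the symbol it cares about and read the size field, which is still toolchain-specific, as the comments below point out.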

bta
Not a super good solution - requires manual updating when function size changes a lot. Since this whole deal is a super platform-dependent thing to do, you might as well write unportable code to get function length.
snemarch
@snemarch - it doesn't have to be manual, the program can read in and parse its own map file. It would require keeping the map file around, but parsing a plain-text file is typically easier than trying to analyze the binary data from the executable itself. You could even parse the map file data as part of the build process and embed it into part of the binary. That might be more analogous to compiling with debug symbols enabled and then extracting what you need from the embedded debug info, though.
bta
Extracting info as part of build process helps a bit, but you still need per-environment build-specific code, so you don't gain a lot - and it doesn't help wrt. the other caveats.
snemarch
+1  A: 

Your OS shouldn't let you do that easily. There shouldn't be any memory with both write and execute permissions, and the stack especially has many different protections (see Exec Shield, Openwall patches, ...). IIRC, SELinux also includes stack execution restrictions. You'll have to find a way to do one or more of:

  • Disable stack protection at the OS level.
  • Allow execution from the stack on a particular executable file.
  • mprotect() the stack.
  • Maybe some other things...
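
The mprotect() option might look roughly like the sketch below, assuming a POSIX system such as Linux. mprotect() works on whole pages, so the start of the range is first rounded down to a page boundary. The helper names page_floor and make_range_executable are made up for illustration, and a hardened kernel or an SELinux policy may simply refuse the request.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round an address down to the start of its page.
   page_size must be a power of two. */
static uintptr_t page_floor(uintptr_t addr, uintptr_t page_size) {
    return addr & ~(page_size - 1);
}

/* Hypothetical helper: ask the OS to make the pages covering
   [buf, buf + len) readable, writable and executable.
   Returns 0 on success, -1 on failure. */
static int make_range_executable(void *buf, size_t len) {
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || len == 0)
        return -1;

    uintptr_t page_size = (uintptr_t)ps;
    uintptr_t start = page_floor((uintptr_t)buf, page_size);
    uintptr_t end = (uintptr_t)buf + len;   /* one past the last byte */

    return mprotect((void *)start, (size_t)(end - start),
                    PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

Called on a local array, this changes the protection of every stack page the array touches, not just the array itself, which is part of why it widens the attack surface.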
ninjalj
Among the other things you may need is a CPU-dependent signal that you are executing instructions in modified memory. See the Intel reference manuals for more details, relevant to Intel CPUs; you may need something else for other CPU types.
Ira Baxter
A: 
Michael Dorgan
+1  A: 

There are lots of ways that trying to do this can go wrong, but it can be done and has been done. This is one of the ways that buffer overflow attacks have worked -- write in a small malicious program for what is likely the architecture of the target computer, along with code and/or data that is likely to get the processor to end up executing the malicious code, and hope for the worst.

There have also been less evil uses of this, but it generally is restricted by the OS and/or CPU. Some CPUs can't allow this at all since the code and stack memory are in different address spaces.

One thing that you will need to account for if you do want to do this: the code that you write into the stack space must be position-independent (whether it is compiled or hand-written as assembly or machine code), or you will have to make sure that it ends up at a specific address and that it was written or compiled to expect that address.

I don't think that the C standard says anything about this.

nategoose
+1  A: 

Your problem is roughly similar to dynamically generated code, except that you want to execute from stack instead of a generic memory region.

You'll need to grab enough stack to fit the copy of your function. You can find out how large the foo() function is by compiling it and looking at the resulting assembly. Then hard-code the size of your code[] array to fit at least that much. Also make sure code[], or the way you copy foo() into code[], gives the copied function the correct instruction alignment for your processor architecture.

If your processor has an instruction prefetch buffer then you will need to flush it after the copy and prior to executing the function from stack, or it will almost certainly have prefetched the wrong data and you'll end up executing garbage. Managing the prefetch buffer and associated caches is the biggest stumbling block I've encountered in experimenting with dynamically generated code.

As others have mentioned, if your stack isn't executable then this is a non-starter.
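
The copy, flush, and call sequence can be sketched as follows. To keep it runnable on systems with a non-executable stack, this version swaps the stack buffer for an anonymous mmap() region, so it illustrates the general dynamically-generated-code case rather than true stack execution. It assumes a POSIX system and GCC or Clang (for __builtin___clear_cache); the names run_copied_code and demo_add are hypothetical, and the four machine-code bytes are only valid for x86-64 with the System V calling convention.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*add_fn)(int, int);

/* Copy len bytes of machine code into a fresh page-aligned mapping,
   make it executable, call it as int(int, int) and store the result.
   Returns 0 on success, -1 if the OS refused any step. */
static int run_copied_code(const unsigned char *code, size_t len,
                           int a, int b, int *out) {
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, code, len);

    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* Make sure the CPU does not execute stale instruction-cache contents. */
    __builtin___clear_cache((char *)mem, (char *)mem + len);

    /* POSIX systems allow this conversion; ISO C does not guarantee it. */
    add_fn fn = (add_fn)mem;
    *out = fn(a, b);

    munmap(mem, len);
    return 0;
}

/* Runs a tiny position-independent add routine from copied bytes.
   Returns -1 on architectures this sketch has no bytes for. */
static int demo_add(int a, int b, int *out) {
#if defined(__x86_64__)
    /* lea eax, [rdi + rsi] ; ret */
    static const unsigned char code[] = { 0x8d, 0x04, 0x37, 0xc3 };
    return run_copied_code(code, sizeof code, a, b, out);
#else
    (void)a; (void)b; (void)out;
    return -1;
#endif
}
```

The bytes are hand-written rather than copied from a compiled foo(), because a compiled function's size and relocatability are exactly the unknowns discussed in the other answers.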

Andrew Cottrell
You can write your code to heap-allocated data and change the protection on that. Check out VirtualAlloc for MS Windows; a parameter lets you specify whether the allocated space can be executed or not.
Ira Baxter
@Ira Baxter: or VirtualProtect() your stack :)
snemarch
+2  A: 

Aside from all the other problems, I don't think anyone has yet mentioned that code in its final form in memory cannot in general be relocated. Your example foo function, maybe, but consider:

int main(int argc, char **argv) {
    if (argc == 3) {
        return 1;
    } else {
        return 0;
    }
}

Part of the result:

    if (argc == 3) {
  401149:       83 3b 03                cmpl   $0x3,(%ebx)
  40114c:       75 09                   jne    401157 <_main+0x27>
        return 1;
  40114e:       c7 45 f4 01 00 00 00    movl   $0x1,-0xc(%ebp)
  401155:       eb 07                   jmp    40115e <_main+0x2e>
    } else {
        return 0;
  401157:       c7 45 f4 00 00 00 00    movl   $0x0,-0xc(%ebp)
  40115e:       8b 45 f4                mov    -0xc(%ebp),%eax
    }

Note the jne 401157 <_main+0x27>. In this case, we have an x86 conditional near jump instruction 0x75 0x09, which goes 9 bytes forward. So that's relocatable: if we copy the code elsewhere then we still want to go 9 bytes forward. But what if it was a relative jump or call, to code which isn't part of the function that you copied? You'd jump to some arbitrary location on or near your stack.
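
The arithmetic behind this can be sketched in a few lines. A short jump is 2 bytes (opcode plus 8-bit displacement) and a near call with a 32-bit displacement is 5 bytes; in both cases the displacement is added to the address of the next instruction. The function names are made up for illustration, and the call addresses in the comment are invented examples, not taken from the disassembly above.

```c
#include <stdint.h>

/* Target of an x86 short jump (opcode + rel8, 2 bytes long):
   the displacement is relative to the address of the NEXT instruction. */
static uint32_t short_jump_target(uint32_t insn_addr, int8_t rel8) {
    return insn_addr + 2u + (uint32_t)(int32_t)rel8;
}

/* Target of an x86 near call (E8 + rel32, 5 bytes long). */
static uint32_t near_call_target(uint32_t insn_addr, int32_t rel32) {
    return insn_addr + 5u + (uint32_t)rel32;
}

/* From the listing: jne at 0x40114c with displacement 0x09 lands on
   0x40114c + 2 + 9 = 0x401157. Move the whole function by 0x1000 and
   the jump lands on 0x402157, still inside the copy, as intended.

   Invented example: a call at 0x401200 to a function at 0x401000 has
   displacement -0x205. Copy that call to 0x7ffe1200 and it now targets
   0x7ffe1000, not the original function. */
```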

Not all jump and call instructions are like this (not on all architectures, and not even all on x86). Some refer to absolute addresses, by loading the address into a register and then doing a far jump/call. When the code is prepared for execution, the so-called "loader" will "fix up" the code by filling in whatever address the target ends up actually having in memory. Copying such code will (at best) result in code that jumps to or calls the same address as the original. If the target isn't in the code you're copying that's probably what you want. If the target is in the code you're copying then you're jumping to the original instead of to the copy.

The same issues of relative vs. absolute addresses apply to things other than code. For example, references to data sections (containing string literals, global variables, etc) will go wrong if they're addressed relatively and aren't part of the copied code.

Also, a function pointer doesn't necessarily contain the address of the first instruction in the function. For example, on an ARM processor in ARM/Thumb interworking mode, the address of a Thumb function is 1 greater than the address of its first instruction. In effect, the least significant bit of the value isn't part of the address; it's a flag that tells the CPU to switch to Thumb mode as part of the jump.
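
As a small illustration of that last point, and assuming the ARM/Thumb interworking convention just described, separating the mode flag from the code address is a matter of masking bit 0. The helper names are hypothetical, and the values are plain integers rather than real function pointers.

```c
#include <stdint.h>

/* Under ARM/Thumb interworking, bit 0 of a function pointer value is a
   mode flag rather than part of the address. */
static int is_thumb(uintptr_t fn_value) {
    return (int)(fn_value & 1u);
}

/* Address of the first instruction: the pointer value with bit 0 cleared. */
static uintptr_t code_address(uintptr_t fn_value) {
    return fn_value & ~(uintptr_t)1;
}
```

Any memcpy() of such a function would have to start from code_address(), and a pointer to the copy would need the flag bit set again before being called.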

Steve Jessop
If the code in its final form can't be relocated, then how does the operating system load your code into different areas? Hmmm. I don't think an OS swaps tasks by copying programs from a source location into a fixed "executable" area. This would consume too much time. Many of the compilers I use have a flag for generating Position Independent Code (PIC).
Thomas Matthews
@Thomas: I said that code in its final form cannot *in general* be relocated. Some code can, and some cannot. Furthermore, just because an entire program (or dll) is position-independent, it does not follow that each individual function can be relocated independently of the rest of the executable, as the questioner is hoping to do. Disassemble some code compiled with those flags: see whether you can find a function that refers to a relative address outside that function. Try for example writing two functions containing "the same" string literal.
Steve Jessop
@Thomas, executable formats (specifically both ELF used widely on *nix and PE used on Windows) include a section of relocation fixups. The OS loader is responsible for applying those fixups when the code is first loaded into a process. Because that is expensive and virtual memory allows all processes to have identical memory maps, those relocation tables are often nearly empty. Position independent code also helps reduce the use of relocation entries.
RBerteig
Oh yes, and of course some OSes either don't have protected memory, or else they reserve a region of virtual address space for shared libraries, so executables can be shared between processes without needing to be relocatable since they're mapped to the same address in every process. Not everything has executable remapping and ASLR.
Steve Jessop
+7  A: 

A valid use case for this kind of thing is an embedded system that is generally running out of FLASH memory, but is required to be able to reprogram itself in the field. To do this, a portion of the code must run from some other memory device (in my case the FLASH device itself could not erase and program one page while allowing reads from any other page, but there are devices that can do that), and there was enough RAM in the system to hold both the flash writer and the new application image to be written.

We wrote the necessary FLASH programming function in C, but used #pragma directives to have it placed in a distinct .text segment from the rest of the code. In the linker control file, we had the linker define global symbols for the start and end of that segment and locate it at a base address in the RAM, while placing the generated code in a load region located in the FLASH, along with the initialization data for the .data segment and the pure read-only .rodata segment; the base address in the FLASH was computed and defined as a global symbol as well.

At run time, when the application update feature was exercised, we read the new application image into its buffer (and did all the sanity checks that should be done to make sure it actually was an application image for this device). We then copied the update kernel out of its dormant location in FLASH to its linked location in RAM (using the global symbols defined by the linker), then called it just like any other function. We didn't have to do anything special at the call site (not even a function pointer) because as far as the linker was concerned it was located in RAM the whole time. The fact that during normal operation that particular piece of RAM had a very different purpose was not important to the linker.
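
The copy step might look something like the sketch below. It is not the code from that project: copy_section is a made-up helper, and the linker symbol names in the comment are hypothetical, since every toolchain spells them differently.

```c
#include <stddef.h>
#include <string.h>

/* Copy a section from its load address (e.g. in FLASH) to its run
   address (e.g. in RAM). Returns the number of bytes copied. */
static size_t copy_section(void *run_addr,
                           const void *load_start,
                           const void *load_end) {
    size_t len = (size_t)((const char *)load_end - (const char *)load_start);
    memcpy(run_addr, load_start, len);
    return len;
}

/* On a real target the three addresses would come from symbols defined
   in the linker control file, for example (hypothetical names):

       extern char __updater_load_start[];   // where the code sits in FLASH
       extern char __updater_load_end[];
       extern char __updater_run_start[];    // where it was linked to run, in RAM

       copy_section(__updater_run_start,
                    __updater_load_start, __updater_load_end);
       flash_update(image, image_len);       // ordinary call; linked at the RAM address
*/
```

Because the function was linked at its RAM address, none of the relocation problems from copying to an arbitrary location apply here.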

That said, all of the machinery that made this possible is either outside the scope of the standard or solidly implementation-defined behavior. The standard doesn't care how code gets loaded into memory before it is executed. It just says that the system can execute code.

RBerteig
+1 For an example of the typical Use Case for copying functions into another section in memory. I did something similar, but most of the code was in assembly.
Thomas Matthews
+1  A: 

As others have said, it's not possible to do this in a standard way - what you end up with will be platform-specific: CPU because of the way opcodes are structured (relative vs. absolute references), OS because you'll likely need to set page protection to be allowed to execute from stack. Furthermore, it's compiler-dependent: there's no standard-and-guaranteed way to get the size of a function.

If you really do have a good use-case, like the flash reprogramming RBerteig mentions, be prepared to mess with linker scripts, verify disassembly, and know you're writing very non-standard and unportable code :)

snemarch