This is probably not something most people would use, but it just came to mind and was bugging me.

Is it possible to have some machine code in, say, a C string, then cast its address to a function pointer and use that to run the machine code?

A:

Yes, you can absolutely do that. There's nothing stopping you unless your system or compiler prevents it somehow (for example, if you have a Harvard architecture). Just make sure your 'data' consists of valid instructions before you jump, or you risk disaster.

Carl Norum
Plus other O/S-specific details, e.g. whether the memory page in which the data resides has code-execute permission enabled.
ChrisW
In other words, yes, you can, unless you can't.
GregS
The other thing to remember on some architectures is to flush the instruction cache before you jump, which you might not even be able to do from user mode.
caf
A: 

One could also imagine a superoptimizer doing this to test small assembly sequences against the specification of the function it optimizes.

dantje
A:

In theory you can, per Carl Norum. This is closely related to what is called "self-modifying code."

In practice, what will usually stop you is the operating system. Most major modern operating systems distinguish between "readable", "read-write", and "executable" memory. When this kind of OS kernel loads a program, it puts the code into special "executable" pages that are marked read-only, so that a user application cannot modify them; conversely, jumping to an address that is not in an executable page causes a fault. This is a security measure, because many kinds of malware, viruses, and other exploits depend on making the program jump into memory they have modified. For example, an attacker might feed an application data that causes some function to write malicious code onto the stack, and then get the program to run it.

But at heart, what the operating system itself does to load a program is exactly what you describe: it loads code into memory, flags that memory as executable, and jumps into it.

In the embedded hardware world, there may be no OS to get in your way, so some platforms use this technique regularly. On the PlayStation 2 I used to do this all the time: if some code was specific to, say, the desert level and used nowhere else, I wouldn't keep it in memory all the time. Instead I'd load it along with the desert level and fix up my function pointers to point into the newly loaded code. When the user left the level, I'd dump that code from memory, set all those function pointers to an exception handler, and load the code for the next level into the same space.

Crashworks
A:

It is not possible even to attempt something like this legally in the C language, since there's no legal way to make a function pointer point to "data". Even with an explicit conversion, a function pointer in C can only be initialized or assigned from another function pointer. If you violate this rule, the behavior is undefined.

It is also possible to initialize a function pointer from an integer (by using an explicit conversion) with implementation-defined results (as opposed to undefined results in other cases). However, an attempt to execute the "data" by making a call through a pointer obtained in such a way still leads to undefined behavior.

If you are willing to ignore the fact that the behavior is undefined, then the actual manifestations of that undefined behavior will differ from platform to platform. On some platforms it might even appear to "work".

AndreyT
On many systems, constructs like casting a void* data pointer back into a function pointer do work. I've never tried converting a pointer to actual data into a function pointer, but I assume it would work on such machines because of the casting rules. Either way it is all a case of "Whoever defines the exact behaviour, it's not the C standard!" The way C works, you *can* cast between types with few exceptions. The problem is when you should or shouldn't, and how the standard defines the result. It's legal to shoot your foot off ;).
TerryP
@TerryP: Yes, "does work" means that it merely compiles. That still does not make the behavior defined.
AndreyT
If this is the case, then how can one write an operating system in C?
Crashworks
@Crashworks: No one can write an operating system in C. It is not possible. What one can do is write an operating system in a C-based/C-like language that adds/defines a considerable amount of non-standard capabilities on top of C core.
AndreyT
@Crashworks: That's actually how operating systems are normally written. It is often customary to say in such cases that it is "written in C", but everybody understands that it is just a figure of speech that stretches the truth more than a bit.
AndreyT
@AndreyT: That's true, but you're making it sound more drastic than it is. A small amount of assembly language (inline or in separate files) and a good linker are really all you need on most architectures. Other non-standard stuff that kernels use (like fiddling with output sections, packing, alignment, branch-prediction hints, crazy pointer arithmetic, undefined union behaviour) is for optimisation.
Artelius
@AndreyT IMHO, if it's defined by the common implementations in use, it's good enough even if the language standard doesn't mandate it. Personally, I document such things as assumptions where necessary, so that porters promptly become aware of any such gotcha their setup doesn't satisfy.
TerryP