views:

361

answers:

8

I had a little too much time on my hands and started wondering if I could write a self-modifying program. To that end, I wrote a "Hello World" in C, then used a hex editor to find the location of the "Hello World" string in the compiled executable. Is it possible to modify this program to open itself and overwrite the "Hello World" string?

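#include <stdio.h>

const char* str = "Hello World\n";

int main(int argc, char* argv[]) {

  printf("%s", str);

  /* argv[0] is assumed to hold a usable path to this executable */
  FILE * file = fopen(argv[0], "r+");
  if (file == NULL) {
    perror("fopen");   /* report why the executable could not be opened */
    return 1;
  }

  /* 0x1000: presumably the string's file offset found with the hex editor */
  fseek(file, 0x1000, SEEK_SET);
  fputs("Goodbyewrld\n", file);
  fclose(file);

  return 0;
}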

This doesn't work. I'm assuming there's something preventing the program from opening itself, since I can split this into two separate programs (a "Hello World" and something to modify it) and it works fine.

EDIT: My understanding is that when the program is run, it's loaded completely into RAM. So the executable on the hard drive is, for all intents and purposes, a copy. Why would it be a problem for it to modify itself?

Is there a workaround?

Thanks

A: 

If you are running on Windows, I believe it locks the file to prevent it from being modified while it's being run. That's why you often need to exit a program in order to install an update. The same is not true on a Linux system.

Winston Ewert
You can modify the code of a program while it's running. (On Windows, this would be `WriteProcessMemory()`). This is how your debugger works. That said, it's a very bad idea.
asveikau
I don't think I ever claimed it was a good one :P. But why should Windows lock it? My understanding is that when the program is run, it's loaded completely into RAM. So the executable on the hard drive is, for all intents and purposes, a copy. Why would it be a problem to modify this?
Joel
@asveikau this has to do with the file on disk, not in memory, but you are correct about what can be done in memory.
Winston Ewert
@Joel, this isn't necessarily entirely true. For a sufficiently large program, Windows may swap parts of the executable out of memory. Regardless, I don't see a really good reason for the behaviour.
Winston Ewert
@Joel - when a program is run, it is not necessarily all loaded into memory. Portions may be paged in as needed. This is especially true of large programs which may swap data in and out as necessary (actually the OS does all the paging, it's transparent to the program). By locking the program file, the OS never has to swap out code because it knows where it can get a pristine copy of the code whenever it needs it.
Ferruccio
A: 

A program cannot modify itself, think about it. It is very dangerous.

I think the technique of self-modification is called reflection, AFAIK. I don't know much about it but it might be worth looking in to.

And a program cannot be modified while it is running, because it could easily lead to unhapiness.

Alexander Rafferty
Alexander, each sentence of your answer is incorrect.
KevinDTimm
It could lead to unhapiness though!
tster
@KevinDTimm the first part of sentence 4 is ok.
Graham Lee
sorry, I missed that (also, the unhapiness part is correct)
KevinDTimm
@Graham Lee: `And a program cannot be modified while it is running`: don't you mean that this is correct for _C_? Erlang lets you modify programs that are running.
Manoj Govindan
@Manoj: I was using 1 as my counting base, so the first part of sentence 4 is "I don't know much about it".
Graham Lee
@Graham Lee: Too much programming today. I started counting from Zero =P
Manoj Govindan
also, though the first part of 4 is correct, the full sentence is dubious at best.
KevinDTimm
+2  A: 

It is very operating-system dependent. Some operating systems lock the file, so you could try to cheat by making a new copy of it somewhere, but then you're just running another copy of the program.

Other operating systems do security checks on the file, e.g. on the iPhone, so writing to it will be a lot of work, plus it resides there as a read-only file.

With other systems you might not even know where the file is.

John Smith
+1  A: 

There are non-portable ways to do this on many platforms. On Windows you can do this with WriteProcessMemory(), for example. However, in 2010 it's usually a very bad idea to do this. This isn't the days of DOS, where you code in assembly and do this to save space. It's very hard to get right, and you're basically asking for stability and security problems. Unless you are doing something very low-level like a debugger, I would say don't bother with this; the problems you will introduce are not worth whatever gain you might have.

asveikau
+21  A: 

On Windows, when a program is run the entire *.exe file is mapped into memory using the memory-mapped-file functions in Windows. This means that the file isn't necessarily all loaded at once, but instead the pages of the file are loaded on-demand as they are accessed.

When the file is mapped in this way, no application (including the program itself) can write to the same file to change it while it's running. (Also, on Windows the running executable can't be renamed either, but it can on Linux and other Unix systems with inode-based filesystems.)

It is possible to change the bits mapped into memory, but the OS handles this with "copy-on-write" semantics: the underlying file isn't changed on disk; instead, a private copy of the affected page(s) is made in memory with your modifications. Before being allowed to do this, though, you usually have to change the protection bits on the memory in question (e.g. with VirtualProtect).
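To make the copy-on-write behaviour concrete, here is a minimal sketch. It is a POSIX analogue, not the Windows loader itself: it assumes `mmap` with `MAP_PRIVATE` and `mprotect` as rough stand-ins for the image mapping and VirtualProtect, and it maps an ordinary data file rather than the running executable. The helper name `patch_private_mapping` is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not from the answer): overwrite the first `len` bytes
 * of a private, copy-on-write mapping of `path`, then compare the file on
 * disk with what it held before.
 * Returns 1 if memory changed and the file did not, 0 otherwise, -1 on error. */
int patch_private_mapping(const char *path, const char *patch, size_t len)
{
    assert(path != NULL && patch != NULL && len > 0);

    int result = -1;
    char *before = malloc(len);
    char *after = malloc(len);
    int fd = open(path, O_RDONLY);

    if (before && after && fd >= 0 &&
        pread(fd, before, len, 0) == (ssize_t)len) {
        /* Map the file read-only and private. */
        char *view = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (view != MAP_FAILED) {
            /* Change the protection bits so the page becomes writable. */
            if (mprotect(view, len, PROT_READ | PROT_WRITE) == 0) {
                /* The write lands in a private copy of the page. */
                memcpy(view, patch, len);
                if (pread(fd, after, len, 0) == (ssize_t)len)
                    result = memcmp(view, patch, len) == 0 &&
                             memcmp(before, after, len) == 0;
            }
            munmap(view, len);
        }
    }

    if (fd >= 0) close(fd);
    free(before);
    free(after);
    return result;
}
```

Called on a file that starts with "Hello World\n" and a same-length patch, the mapping should show the new text while a fresh read of the file still returns the old bytes.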

At one time, it was common for low-level assembly programs running in very constrained memory environments to use self-modifying code. However, hardly anybody does this anymore, because we're no longer running in such constrained environments, and modern processors have long pipelines that get very upset if you start changing code from underneath them.

Greg Hewgill
Note that on Unix / Linux, while you can rename or even delete a running executable on disk, it is kept intact in memory until the process dies.
ring0
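The inode behaviour mentioned in the comment above can be seen with any open file, not just an executable. The sketch below is only an analogy: an open file descriptor stands in for the kernel's reference to a running program's image, and `read_after_unlink` is a made-up helper name. It assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path`, remove its name, then read through the
 * still-open descriptor. Returns the number of bytes read, or -1 on error. */
long read_after_unlink(const char *path, char *out, size_t len)
{
    assert(path != NULL && out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    /* Drop the directory entry; the inode survives while fd stays open. */
    if (unlink(path) != 0) { close(fd); return -1; }

    /* The name is gone now... */
    if (access(path, F_OK) == 0) { close(fd); return -1; }

    /* ...but the data is still reachable through the descriptor. */
    ssize_t n = read(fd, out, len);
    close(fd);  /* last reference dropped: the storage can be reclaimed */
    return (long)n;
}
```

Deleting or renaming the file only touches the directory entry; the contents stay available to whoever already holds a reference.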
+1  A: 

Self-modifying code is used for modifications in memory, not in the file (as run-time unpackers such as UPX do). Also, the file representation of a program is more difficult to work with because of relative virtual addresses, possible relocations, and the header changes needed for most updates (e.g. if you replace "Hello world!" with a longer string, you'll need to extend the data segment in the file).

I'd suggest that you first learn to do it in memory. For file updates, the simplest and most generic approach would be to run a copy of the program and have it modify the original.
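As a rough sketch of the file-patching half of that approach (the part a copy of the program, or a second program, would run against the original), something like the following could work. `patch_file_at` is a hypothetical helper, and the offset has to be found by you, for instance with a hex editor as in the question. It assumes the replacement has exactly the same length as the original text, so no headers or segment sizes need to change.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: overwrite the bytes at `offset` in the file `path`
 * with the characters of `replacement` (the terminating NUL is not written).
 * Returns 0 on success, -1 on failure. */
int patch_file_at(const char *path, long offset, const char *replacement)
{
    assert(path != NULL && replacement != NULL && offset >= 0);

    FILE *file = fopen(path, "r+b");   /* update mode: no truncation */
    if (file == NULL) return -1;       /* e.g. missing, or locked by the OS */

    size_t len = strlen(replacement);
    int ok = fseek(file, offset, SEEK_SET) == 0 &&
             fwrite(replacement, 1, len, file) == len;

    if (fclose(file) != 0) ok = 0;     /* flush errors surface here */
    return ok ? 0 : -1;
}
```

A call such as `patch_file_at("hello.exe", 0x1000, "Goodbyewrld\n")` would mirror the question's code; both the file name and the offset there are placeholders.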

EDIT: And don't forget about the main reasons the self-modifying code is used:

1) Obfuscation, so that the code that is actually executed isn't the code you'll see with simple static analysis of the file.

2) Performance, something like JIT.

Neither of them benefits from modifying the executable.

ruslik
A: 

On newer versions of Windows CE (at least 5.x and newer), where apps run in user space (compared to earlier versions, where all apps ran in supervisor mode), an app cannot even read its own executable file.

Anon
+3  A: 

All present answers more or less revolve around the fact that today you cannot easily do self-modifying machine code anymore. I agree that that is basically true for today's PCs.

However, if you really want to see your own self-modifying code in action, you have some possibilities available:

  • Try out microcontrollers; the simpler ones do not have advanced pipelining. The cheapest and quickest choice I found is an MSP430 USB stick.

  • If an emulation is ok for you, you can run an emulator for an older non-pipelined platform.

  • If you wanted self-modifying code just for the fun of it, you can have even more fun with self-destroying code (more exactly enemy-destroying) at Corewars.

  • If you are willing to move from C to, say, a Lisp dialect, code that writes code is very natural there. I would suggest Scheme, which is intentionally kept small.

Peter G.