This also allows to hide a virus in a program which is compiled from source. Follow me.
When you write your first compiler for C, you write it in some other language. Now, you have a compiler for C in, say, assembler. Eventually, you will come to the place where you have to parse strings, specifically escape sequences. You will write code to convert \n
to the character with the decimal code 10 (and \r
to 13, etc).
After that compiler is ready, you will start to reimplement it in C. The string parsing code will become:
...
if (c == 92) { // backslash
c = getc();
if (c == 110) { // n
return 10;
} else if (c == 92) { // another backslash
return 92;
} else {
...
}
}
...
When this compiles, you have a binary which understands '\n'. This means you can change the source code:
...
if (c == '\\') {
c = getc();
if (c == 'n') {
return '\n';
} else if (c == '\\') {
return '\\';
} else {
...
}
}
...
So where is the information that '\n' is the code for 13? It's in the binary! It's like DNA: Compiling C source code with this binary will inherit this information. If the compiler compiles itself, it will pass this knowledge on to its offspring. From this point on, there is no way to see from the source alone what the compiler will do.
If you want to hide a virus in the source of some program, you must do this: Get the source of a compiler, find the function which compiles functions and replace it with this one:
void compileFunction(char * name, char * filename, char * code) {
if (strcmp("compileFunction", name) == 0 && strcmp("compile.c", filename) == 0) {
code = A;
} else if (strcmp("xxx", name) == 0 && strcmp("yyy.c", filename) == 0) {
code = B;
}
... code to compile the function body from the string in "code" ...
}
The interesting parts are A and B. A is the source code for compileFunction
including the virus, probably encrypted in some way so it's not obvious from searching the resulting binary. This makes sure that compiling to compiler with itself will preserve the virus injection code.
B is the same for the function we want to replace with our virus. For example, it could be the function "login" in the source file "login.c" which is probably from the Linux kernel. We could replace it with a version that will accept the password "joshua" for the root account in addition to the normal password.
If you compile that and spread it as a binary, there will be no way to find the virus by looking at the source.
The original source of the idea: http://cm.bell-labs.com/who/ken/trust.html