Is it possible, using GNU tools (gcc, binutils, etc.), to replace all occurrences of a given assembly instruction with a no-op? Specifically, gcc with the -pg option generates the following assembly (ARM):

   0x0: e1a0c00d  mov ip, sp
   0x4: e92dd800  stmdb sp!, {fp, ip, lr, pc}
   0x8: e24cb004  sub fp, ip, #4 ; 0x4
   0xc: ebfffffe  bl 0 <mcount>

I want to record the address of this last instruction and then change it to a nop, as in the following code:

   0x0: e1a0c00d  mov ip, sp
   0x4: e92dd800  stmdb sp!, {fp, ip, lr, pc}
   0x8: e24cb004  sub fp, ip, #4 ; 0x4
   0xc: e1a00000  nop   (mov r0,r0)

The Linux kernel can do something similar to this at run-time, but I'm looking for a build-time solution.

+3  A: 

This will certainly be easier with a RISC-ish fixed-length instruction format than for e.g. x86.

It should be relatively straightforward to use libelf (nice tutorial here: http://people.freebsd.org/~jkoshy/download/libelf/article.html) or libbfd (http://sourceware.org/binutils/docs-2.19/bfd/index.html) to open the object file, modify instructions within the .text section, and write it out again using provided APIs. Whether it's worth the effort or not will depend on non-technical considerations (I am a bit curious though...).

It's worth mentioning that there might be a few wrinkles with using libelf or libbfd if this needs to work in a cross-development environment.
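As a rough illustration of the "open the object file and find .text" step, here is a minimal Python sketch that does not use libelf or libbfd at all. It reads the section headers by hand and assumes a 32-bit little-endian ELF file (which a typical ARM object would be); the helper name find_section is made up for this example, and a real tool would be better off with one of the libraries above.

```python
import struct

def find_section(elf, wanted=b".text"):
    """Return (file_offset, size) of a named section in an ELF32 little-endian image."""
    if elf[:4] != b"\x7fELF" or elf[4] != 1 or elf[5] != 1:
        raise ValueError("expected an ELF32 little-endian file")
    # e_shoff sits at byte 32; e_shentsize, e_shnum, e_shstrndx start at byte 46.
    (e_shoff,) = struct.unpack_from("<I", elf, 32)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", elf, 46)

    def header(i):
        # First six fields: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size.
        return struct.unpack_from("<6I", elf, e_shoff + i * e_shentsize)

    # Section names live in the section-name string table.
    strtab_off = header(e_shstrndx)[4]
    for i in range(e_shnum):
        sh = header(i)
        start = strtab_off + sh[0]
        name = elf[start:elf.index(b"\0", start)]
        if name == wanted:
            return sh[4], sh[5]
    raise KeyError(wanted)
```

The returned offset and size say where in the file the code bytes live, so an instruction at address A inside the section is at file offset (offset + A) for an unlinked object whose section starts at address 0.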

Lance Richardson
While your point is valid, 'nop' on x86 is a single-byte instruction, so you can just write as many of them as you need over the variable-length instruction that you want to remove.
Roger Lipscombe
The part that would become more problematic for the variable-length instruction case is searching for the instruction pattern to be turned into a no-op, since you can't tell whether a given byte offset is the beginning of an instruction without decoding the instructions before that offset. It's even worse if you consider the clever hand-coded assembler tricks like branching into the middle of an instruction that are sometimes used to defeat (or at least impede) reverse engineering of binaries.
Lance Richardson
True; I wasn't thinking about it from that direction. Good point.
Roger Lipscombe
+5  A: 

You can compile the code with gcc -S to output an assembler listing, instead of compiling fully into an object file or executable. Then, just replace the desired instructions with no-ops (e.g. using sed), and continue compilation from there.
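The replacement step on the assembler listing could look something like the sketch below, written in Python rather than sed. It assumes the call appears in the listing as a plain "bl mcount" line, as the disassembly in the question suggests; other toolchains may use a different symbol name, so the pattern would need adjusting. The function name nop_out_mcount is just for illustration.

```python
import re

# Matches a call to mcount in ARM assembler output, e.g. "\tbl\tmcount".
# The symbol name is taken from the question; it may differ per toolchain.
MCOUNT_CALL = re.compile(r"^([ \t]*)bl[ \t]+mcount\b.*$", re.MULTILINE)

def nop_out_mcount(asm_text):
    """Return (patched_text, count) with each mcount call replaced by a nop."""
    # "mov r0, r0" is the nop encoding shown in the question; "@" starts a
    # comment in GNU as for ARM.
    return MCOUNT_CALL.subn(r"\g<1>mov\tr0, r0\t@ nop", asm_text)
```

The returned count tells you how many call sites were replaced. The patched text can then be assembled as usual, for example by handing the edited .s file back to gcc -c.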

If you also want to do this for object files or libraries that you don't have the original source code for, you'll instead have to use a tool such as objdump(1) to disassemble them and get the addresses of the instructions you wish to replace. Then, parse the object file headers to find the offsets within the file of those instructions, and then replace the machine instructions with no-ops directly in the object files. This is a little trickier, but doable.
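The final "replace the machine instructions" step might be sketched as follows in Python. It assumes you already have the bytes of the code section and a list of offsets taken from the objdump output, and that the code is little-endian ARM as in the question; patch_calls is a hypothetical helper name. One thing to watch: in an unlinked object file the bl to mcount normally has a relocation entry attached, which would also need to be removed or neutralised, or the linker may overwrite the patched word.

```python
import struct

ARM_NOP = 0xE1A00000  # mov r0, r0, the nop encoding shown in the question

def patch_calls(text, offsets):
    """Overwrite the BL at each offset of a little-endian ARM code image with a nop."""
    patched = bytearray(text)
    for off in offsets:
        (word,) = struct.unpack_from("<I", patched, off)
        # Top byte 0xEB = condition "always" plus the branch-with-link opcode.
        if off % 4 != 0 or word >> 24 != 0xEB:
            raise ValueError(f"no BL instruction at offset {off:#x}")
        struct.pack_into("<I", patched, off, ARM_NOP)
    return bytes(patched)
```

Passing explicit offsets, rather than scanning for the word ebfffffe, is deliberate: every unresolved bl in an object file can have that same placeholder encoding, so the bit pattern alone does not identify calls to mcount.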

Adam Rosenfield
Just be sure you don't overwrite the data when replacing as well (e.g. binary data and strings in the source).
strager
True, although 'gcc -S' won't produce assembly code for data (it will produce data directives), and objdump is usually smart enough to only disassemble sections containing code. Both of these techniques are defeated by any sort of runtime code generation or self-modifying code, but that's just asking for trouble.
Adam Rosenfield