views:

199

answers:

1

I have a sort of simulator for x86 assembly instructions, but the thing is it doesn't accept the full instruction set. For instance if given an INT command it will terminate. It is possible to run all the binary representation (8bit, 16bit and 32bit) of commands on the simulator and see which one's are valid and which are not.

It is for using in genetic programming and need to mutate the commands binary representation, but trying to do this without creating an invalid one.

The easiest solution seems to just count them, but how will the transform function between the original and smaller instruction sets work?

+2  A: 

If the code segment is modifiable, it will be very difficult to create a translator, because you will need to account for the possibility of self-modifying code. Any such translator would need to include a copy of itself in the generated code; at that point it would be easiest to just 'finish' the simulator.

If the code segment is not modifiable, it's still very difficult, because with x86 it's possible to jump into the middle of an instruction, and have it interpreted as a different instruction. So while in principle you could build a static translation for all possible start addresses, and build a big jump table to figure out which static translation you need, it's still not worth it.

I would suggest that rather than converting general x86 code to this subset, instead constrain the code generated by the GA to make it fit into the subset. You can try using techniques such as those described in the google native client paper to further restrict the code to avoid the jumping-into-the-middle-of-instructions problem.

Alternately, there's always the option of using a complete x86 emulator instead of a limited one. You'll still have the problem of the GA generating illegal opcodes however. You could also consider using an ISA custom-designed to be easy to emulate instead of x86 and emulating that. Then compile to x86 (you did design it to be easy to compile, right?) when you have something you want to keep.

This reference seems similar to what you're doing as well, you might want to take a look:

  1. Nordin, Peter. "A compiling genetic programming system that directly manipulates the machine code." Advances in genetic programming. Cambridge, Mass: MIT, 1994. 311-32. Print.
bdonlan