ansaurus

Question

Questions Regarding the Implementation of a Simple CPU Emulator

Answer 1

+6 A:

Parse the original code into an array of integers. This array is your computer's memory.

Use bitwise operations to extract the various fields. For instance, this:

unsigned int x = 0xfeed;
unsigned int opcode = (x >> 12) & 0xf;

will extract the topmost four bits (0xf, here) from a 16-bit value stored in an unsigned int. You can then use e.g. switch() to inspect the opcode and take the proper action:

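#include <stdio.h>

enum { OP_ADD = 0 };

/* Execute the instruction at memory[pc] and return the next pc. */
unsigned int execute(unsigned int *memory, unsigned int pc)
{
  /* The top four bits of the 16-bit instruction word hold the opcode. */
  const unsigned int opcode = (memory[pc++] >> 12) & 0xf;

  switch(opcode)
  {
  case OP_ADD:
    /* Do whatever the ADD instruction's definition mandates. */
    return pc;
  default:
    fprintf(stderr, "** Non-implemented opcode %x found in location %x\n", opcode, pc - 1);
  }
  return pc;
}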

Modifying memory is just a case of writing into your array of integers, perhaps also using some bitwise math if needed.
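
As a rough sketch of that "bitwise math", here is one way to overwrite a single field of a memory word. The layout is an assumption for illustration only: the opcode sits in bits 15..12 as in the example above, and the operand is assumed to occupy bits 11..0. The names set_operand and kOperandMask are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout (only the opcode position comes from the answer above):
// bits 15..12 = opcode, bits 11..0 = operand.
constexpr std::uint32_t kOperandMask = 0x0fffu;

// Replace only the operand field of the word at `address`,
// leaving the opcode bits untouched.
void set_operand(std::vector<std::uint32_t>& memory, std::size_t address,
                 std::uint32_t operand)
{
    std::uint32_t& word = memory.at(address);  // at() throws on a bad address
    word = (word & ~kOperandMask) | (operand & kOperandMask);
}
```

Clearing the field with the inverted mask before OR-ing in the new value keeps stale operand bits from leaking into the result.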

unwind 2010-03-02 14:24:28

I was hoping that someone would mention a concept like this. I've never used it before though, so I'll have to do more research. Thank you!

Brandon 2010-03-02 14:46:28

Ahh, memories of university projects!

sdg 2010-03-02 18:09:22

+1. This is the basic approach you should take. The key point here, Brandon, relative to your question, is that in order to approach this normally, you need to get to "machine code", which will be the array of bytes that represents your virtual computer's address space. Then if the instructions edit memory (code), you don't do anything special, you just follow the instructions and they should do the right thing inside your big virtual memory array. IOW, you need both the assembler (the tool that translates text strings into instruction bytes) and the emulator, the thing that executes

quixoto 2010-03-02 18:15:44

Brandon 2010-03-02 23:49:10

Brandon 2010-03-03 01:58:10

@Brandon: The masking (bitwise and) is not strictly necessary, but it adds a certain amount of clarity. It also protects you from things that could happen if, for instance, there were bits set above the 16 lowest in the memory integer. You could squeeze these things out by e.g. using unsigned shorts and so on, but it's easier to be safe.
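
A small sketch of the difference, assuming the 16-bit instruction is held in a wider 32-bit integer (the function names are made up for illustration):

```cpp
#include <cstdint>

// Shift only: correct as long as nothing is set above bit 15.
constexpr std::uint32_t opcode_unmasked(std::uint32_t word)
{
    return word >> 12;
}

// Shift and mask: always yields a value in the range 0..15.
constexpr std::uint32_t opcode_masked(std::uint32_t word)
{
    return (word >> 12) & 0xfu;
}

// With a clean 16-bit value both agree.
static_assert(opcode_unmasked(0xfeedu) == 0xfu, "clean word, no mask");
static_assert(opcode_masked(0xfeedu) == 0xfu, "clean word, masked");

// With a stray bit 16 set, only the masked version stays in range.
static_assert(opcode_unmasked(0x1feedu) == 0x1fu, "stray bit leaks through");
static_assert(opcode_masked(0x1feedu) == 0xfu, "mask discards the stray bit");
```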

unwind 2010-03-03 07:40:40

Answer 2

+2 A:

I think the best approach is to read the instructions, convert them to unsigned integers, and store them into memory, then execute them from memory.

1. Once you've parsed the instructions and stored them to memory, self-modification is much easier than storing a list of changes for each instruction. You can just change the memory at that location (assuming you don't ever need to know what the old instruction was).
2. Since you're converting the instructions to integers, this problem is moot.
3. To parse the opcode and operand sections, you'll need to use bit shifting and masking. For example, to get the opcode of a 16-bit instruction, you isolate the upper 4 bits by shifting down by 12 bits (instruction >> 12). You can use a mask to get the operand too.
4. You mean your machine has instructions that shift bits? That shouldn't affect how you store the operands. When you get to executing one of those instructions, you can just use the C++ bit-shifting operators << and >>.
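
To make the points above concrete, here is a minimal fetch-decode-execute sketch over a memory of unsigned integers. Everything specific in it is an assumption for illustration, not taken from the question: a 16-bit word with a 4-bit opcode and a 12-bit operand, one accumulator, and the hypothetical opcodes OP_HALT, OP_LOAD_IMM and OP_STORE. Because code and data share one array, a STORE that targets an instruction address is all it takes for the program to modify itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical opcodes, chosen only for this sketch.
enum : std::uint32_t { OP_HALT = 0, OP_LOAD_IMM = 1, OP_STORE = 2 };

// Runs the program in `memory` until HALT and returns the accumulator.
std::uint32_t run(std::vector<std::uint32_t>& memory)
{
    std::uint32_t acc = 0;
    std::size_t pc = 0;
    while (pc < memory.size()) {
        const std::uint32_t word = memory[pc++];
        const std::uint32_t opcode = (word >> 12) & 0xfu;  // upper 4 bits
        const std::uint32_t operand = word & 0x0fffu;      // lower 12 bits
        switch (opcode) {
        case OP_HALT:
            return acc;
        case OP_LOAD_IMM:
            acc = operand;
            break;
        case OP_STORE:
            memory.at(operand) = acc;  // may overwrite an instruction
            break;
        default:
            throw std::runtime_error("unimplemented opcode");
        }
    }
    return acc;
}
```

For example, with the memory contents 0x1000, 0x2003, 0x102a, 0x1063, 0x0000, the second instruction stores 0 (the encoding of HALT) over the word at address 3, so the machine stops there instead of executing the original instruction at that address.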

Nick Meyer 2010-03-02 14:30:32

Regarding 1: In that case, I'd still have to recompile the set of instructions before executing them as opposed to executing them as they're read, correct? That's not a problem, I was just wondering whether there might be a good way to accomplish the non-recompilation method. // Regarding 4: Right, that's what I meant. I thought it'd affect how I'd store the instruction pieces, as I would have to consider each piece separately. But this bit math seems like it could achieve it. // What you've mentioned makes sense conceptually. I'll try it when I get home today. Thanks!

Brandon 2010-03-02 14:44:15

I don't know what you mean by "recompile" here. A (simple) CPU doesn't "recompile" anything: it's got the bits, and it does what they say. Can you clarify what you mean by that?

Ken 2010-03-02 14:58:21

My understanding is that emulators often function by recompiling the instructions to another format that can be more easily processed by the emulator's machine. I'm using the term loosely here, but I think it's the same idea. I was just wondering whether there is an efficient way to execute the instructions without having to store the instructions somewhere else in memory in another form (as unsigned ints, for example).

Brandon 2010-03-02 15:18:00

Brandon 2010-03-03 02:04:30

@Brandon, you're right, the 0xF000 isn't necessary. What it does is mask out the lowest 12 bits, setting them all to 0 -- but then the shift will toss them out anyway. My bad.

Nick Meyer 2010-03-03 02:29:31

Answer 3

+1 A:

Just in case it helps, here's the last CPU emulator I wrote in C++. Actually, it's the only emulator I've written in C++.

The spec's language is slightly idiosyncratic but it's a perfectly respectable, simple VM description, possibly quite similar to your prof's VM:

http://www.boundvariable.org/um-spec.txt

Here's my (somewhat over-engineered) code, which should give you some ideas. For instance it shows how to implement mathematical operators, in the Giant Switch Statement in um.cpp:

http://www.eschatonic.org/misc/um.zip

You can maybe find other implementations for comparison with a web search, since plenty of people entered the contest (I wasn't one of them: I did it much later). Although not many in C++ I'd guess.

If I were you, I'd only store the instructions as strings to start with, if that's the way that your virtual machine specification defines operations on them. Then convert them to integers as needed, every time you want to execute them. It'll be slow, but so what? Yours isn't a real VM that you're going to be using to run time-critical programs, and a dog-slow interpreter still illustrates the important points you need to know at this stage.

It's possible though that the VM actually defines everything in terms of integers, and the strings are just there to describe the program when it's loaded into the machine. In that case, convert the program to integers at the start. If the VM stores programs and data together, with the same operations acting on both, then this is the way to go.

The way to choose between them is to look at the opcode which is used to modify the program. Is the new instruction supplied to it as an integer, or as a string? Whichever it is, the simplest thing to start with is probably to store the program in that format. You can always change later once it's working.

In the case of the UM described above, the machine is defined in terms of "platters" with space for 32 bits. Clearly these can be represented in C++ as 32-bit integers, so that's what my implementation does.
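
As an illustration only (this is not the code from the zip file), a platter and its fields might be modelled like this. The field positions are my reading of the linked spec for the standard operators, so check them against the spec before relying on them; Platter, Decoded and decode are names invented for this sketch.

```cpp
#include <cstdint>

using Platter = std::uint32_t;  // one 32-bit "platter"

// Fields of a standard operator (assumed layout, per my reading of the spec):
// operator number in the top 4 bits, registers A, B, C in the low 9 bits.
struct Decoded {
    std::uint32_t op;
    std::uint32_t a;
    std::uint32_t b;
    std::uint32_t c;
};

constexpr Decoded decode(Platter p)
{
    return Decoded{ (p >> 28) & 0xfu,   // bits 31..28
                    (p >> 6) & 0x7u,    // bits 8..6
                    (p >> 3) & 0x7u,    // bits 5..3
                    p & 0x7u };         // bits 2..0
}
```

Using a fixed-width type such as std::uint32_t rather than plain unsigned int keeps the platter size independent of the host platform.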

Steve Jessop 2010-03-02 15:44:18

Before I spoke with my professor, I searched for something simple to emulate and found that contest. It was still a bit too complicated for me to start with, but that might be my next step. I will look over your code for ideas. Thank you!

Brandon 2010-03-03 02:06:29

Answer 4

A:

I created an emulator for a custom cryptographic processor. I exploited the polymorphism of C++ by creating a tree of base classes:

#include <cstddef>
#include <string>

struct Instruction  // Contains common methods & data to all instructions.
{
    virtual ~Instruction() {}  // Needed to delete derived objects through a base pointer.
    virtual void execute(void) = 0;
    virtual size_t get_instruction_size(void) const = 0;
    virtual unsigned int get_opcode(void) const = 0;
    virtual const std::string& get_instruction_name(void) = 0;
};

class Math_Instruction
:  public Instruction
{
  // Operations common to all math instructions;
};

class Branch_Instruction
:  public Instruction
{
  // Operations common to all branch instructions;
};

class Add_Instruction
:  public Math_Instruction
{
};

I also had a couple of factories. At least two would be useful:

Factory to create instruction from text.
Factory to create instruction from opcode

The instruction classes should have methods to load their data from an input source (e.g. std::istream) or text (std::string). The corollary methods of output should also be supported (such as instruction name and opcode).

I had the application create objects from an input file and place them into a vector of Instruction. The executor method would run the 'execute()' method of each instruction in the vector. This action trickled down to the instruction leaf object, which performed the detailed execution.
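
A stripped-down sketch of the factory-plus-executor idea follows. It is not the original emulator's code: the Machine struct, the two instruction classes and their opcode numbers are hypothetical, and execute() takes the machine state as a parameter here instead of using global objects. The vector holds std::unique_ptr because an abstract base class cannot be stored by value.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

struct Machine { std::int32_t accumulator = 0; };  // hypothetical shared state

struct Instruction {
    virtual ~Instruction() = default;
    virtual void execute(Machine& machine) = 0;
};

struct Add_Instruction : Instruction {
    explicit Add_Instruction(std::int32_t v) : value(v) {}
    void execute(Machine& machine) override { machine.accumulator += value; }
    std::int32_t value;
};

struct Negate_Instruction : Instruction {
    void execute(Machine& machine) override
    {
        machine.accumulator = -machine.accumulator;
    }
};

// Factory: create an instruction object from an opcode (numbers are made up).
std::unique_ptr<Instruction> make_instruction(unsigned opcode, std::int32_t operand)
{
    switch (opcode) {
    case 0: return std::unique_ptr<Instruction>(new Add_Instruction(operand));
    case 1: return std::unique_ptr<Instruction>(new Negate_Instruction());
    default: throw std::invalid_argument("unknown opcode");
    }
}

// Executor: run every instruction in order via virtual dispatch.
void run(const std::vector<std::unique_ptr<Instruction>>& program, Machine& machine)
{
    for (const auto& instruction : program) {
        instruction->execute(machine);
    }
}
```

A program is then built by pushing the results of make_instruction into the vector and passing it to run along with a Machine. A text-based factory could follow the same shape, with a mnemonic string in place of the opcode.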

There are other global objects that may need emulation as well. In my case some included the data bus, registers, ALU and memory locations.

Please spend more time designing and thinking about the project before you code it. I found it quite a challenge, especially implementing a single-step capable debugger and GUI.

Good Luck!

Thomas Matthews 2010-03-02 18:07:35

BTW, at my alma mater, one of the professors had his students write functions to emulate hardware components for the *Microprocessor Design* class. We had some good discussions about processor emulation. :-)

Thomas Matthews 2010-03-02 18:11:17
