I'm interested in programming a virtual machine. Nothing as fancy as VirtualBox or VMware, but something that can emulate a simple architecture, be it CISC or RISC: say the Zilog, SPARC, MIPS, or 80686 architecture models.

I suppose that by doing this it would be relatively simple to make an emulator of the same kind. I'm interested in this for the experience more than anything else (it will be my first C project, and I'd rather do this in C than in anything else).

Thanks for your tips!

+1  A: 

This is not a product endorsement, but an observation...

I'd pick up a Deitel and Deitel book to start with. (probably this one if you're looking to do it in C) They always seem to have one chapter on making a virtual machine, along with some instructions for writing assembler code for your virtual machine, regardless of the language they're teaching.

Edit - Added

(although I'd check it out at a library before buying it in case I'm misunderstanding what it is you want to write)

David Stratton
Thanks, very nice tip; shows grand levels of experience!
whoozat
Just to be a smart aleck: Those names are spelled with E before I, i.e. "Deitel". Knowing that will make it easier to look them up :)
Carl Smotricz
Thanks @Carl Smotricz, for the catch in my spelling, and thanks @Grahamn Lee for editing and correcting it!
David Stratton
+3  A: 

Check what others have done in this area!

A good way to pick up information about a particular type of application (and, in your case, a good way to pick up C idioms as well) is to look at the structure and the details of an open source project of the same type. One may decide to merely peek, briefly review and then "forget", in order to start one's very own project from scratch, but in all cases this type of visit is beneficial.

Since you mentioned "simple architecture" and Zilog, I figured the Z80 processor could be a good match. For various reasons there are many current and past projects in the Z80 emulator genre. BTW, one of the reasons is that many older slot-type video consoles were running on Z80, which prompted nostalgic gamers to write emulators to run their old favorites ;-)

An example of such a project is YAZE-AG, which includes both a full Z80 emulator and CP/M. The whole thing is written in C. It is also relatively mature (version 2.x) and active. I'm guessing this is the work of a very small team (possibly of one ;-) ).

Good luck!

mjv
+1. GREAT answer!
David Stratton
+1  A: 

A few thoughts:

  • The older instruction sets will be simpler, so they might be a good place to start.
  • Choose a RISC architecture: decoding the instruction stream will be much easier.
  • Ignore things like interrupts, NMIs, etc.
  • There's always a lot of fiddly detail in accurately emulating booting. Instead, pick something very simple like starting execution at address zero, with all registers set to zero.
  • Real programs will need things like real hardware emulation as well, so don't try to run real programs at first.
  • You might want to extend the instruction set with a few special I/O instructions to read a number, write a character (or even a string), etc., so that you can write simple test programs that actually do very simple I/O.
  • Parsing an object file format like ELF can be a lot of work all by itself. With tools like objdump, you can probably extract just the text section (i.e., the instructions) as binary, or at least as ASCII hex.
  • Start by writing a disassembler for whatever instruction set you want to emulate, or at least for the initial subset you'll shoot for. You'll need it for debugging anyway.
  • Find out how to get gas (the GNU assembler) working for your chosen instruction set so you can produce known good object files and test programs.
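To make a few of these points concrete, here is a minimal sketch in C of a fetch-decode-execute loop. The instruction set is entirely made up for illustration (two bytes per instruction, one accumulator, names like `ToyVM`, `OP_OUTC` and `vm_run` are my own and not from any real architecture). It starts at address zero with all registers zero, ignores interrupts, has one special output instruction, and includes a tiny disassembler for debugging.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy ISA: every instruction is two bytes, opcode then operand. */
enum { OP_HALT = 0, OP_LOADI = 1, OP_ADDI = 2, OP_OUTC = 3, OP_JNZ = 4 };

#define MEM_SIZE 256

typedef struct {
    uint8_t mem[MEM_SIZE];  /* flat memory, program loaded at address 0 */
    uint8_t pc;             /* program counter */
    uint8_t acc;            /* single accumulator register */
} ToyVM;

/* Reset: all registers zero, program copied to address zero. */
static void vm_load(ToyVM *vm, const uint8_t *prog, size_t len)
{
    memset(vm, 0, sizeof *vm);
    if (len > MEM_SIZE) len = MEM_SIZE;
    memcpy(vm->mem, prog, len);
}

/* Disassemble one instruction into buf, e.g. "LOADI 72". */
static void vm_disasm(uint8_t op, uint8_t arg, char *buf, size_t cap)
{
    static const char *names[] = { "HALT", "LOADI", "ADDI", "OUTC", "JNZ" };
    const char *name = (op <= OP_JNZ) ? names[op] : "???";
    snprintf(buf, cap, "%s %u", name, (unsigned)arg);
}

/* Run until HALT, an unknown opcode, or max_steps instructions.
   Characters written by OP_OUTC are collected in out (NUL-terminated).
   Returns the number of instructions executed, HALT included. */
static size_t vm_run(ToyVM *vm, char *out, size_t out_cap, size_t max_steps)
{
    size_t steps = 0, n = 0;
    if (out_cap > 0) out[0] = '\0';
    while (steps < max_steps) {
        /* fetch (addresses wrap around at MEM_SIZE) */
        uint8_t op  = vm->mem[vm->pc];
        uint8_t arg = vm->mem[(uint8_t)(vm->pc + 1)];
        vm->pc = (uint8_t)(vm->pc + 2);
        steps++;
        /* decode and execute */
        switch (op) {
        case OP_LOADI: vm->acc = arg; break;
        case OP_ADDI:  vm->acc = (uint8_t)(vm->acc + arg); break;
        case OP_OUTC:  /* special I/O instruction: emit acc as a character */
            if (n + 1 < out_cap) { out[n++] = (char)vm->acc; out[n] = '\0'; }
            break;
        case OP_JNZ:   if (vm->acc != 0) vm->pc = arg; break;
        case OP_HALT:
        default:       return steps;
        }
    }
    return steps;
}
```

Loading the bytes for `LOADI 'H'`, `OUTC`, `LOADI 'i'`, `OUTC`, `HALT` with `vm_load` and then calling `vm_run` should leave "Hi" in the output buffer. Collecting output in a buffer rather than printing directly is just a choice that makes test programs easy to check; a real emulator would decode the chosen architecture's actual encoding instead of this invented one.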

Unless you have knowledge of other programming languages, and/or a reasonable understanding of assembler, this is a pretty challenging first C project. Nevertheless, good luck!

Dale Hagglund
MIPS and SPARC both have extensive treatment as "teaching" architectures in both literature and on the Internet, so you might go with those over something complicated like x86.
Steven Schlansker
+1  A: 

A common exercise is to build a straightforward calculator. It has only a limited number of operations (typically 4, * / + -), one datatype (number) and you probably have a very good understanding of how it should work. That makes debugging a lot easier.

Despite the simplicity, you already have to deal with some fundamental VM problems. You need to parse a sequence of commands, store multiple objects you're working on, and deal with output.
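As a rough illustration of those three problems, here is a small calculator sketch in C. It uses postfix (RPN) input and an explicit value stack, which is my own choice to keep the parser short and not something this answer prescribes; the name `rpn_eval` and the stack limit are likewise assumptions.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define STACK_MAX 32

/* Evaluate a space-separated postfix expression such as "3 4 + 2 *".
   Returns 0 and stores the value in *result on success, -1 on any error
   (malformed token, stack underflow/overflow, division by zero). */
static int rpn_eval(const char *expr, double *result)
{
    double stack[STACK_MAX];
    int top = 0;                 /* number of values currently on the stack */
    const char *p = expr;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) { p++; continue; }

        /* An operator is one symbol followed by whitespace or end of input;
           this keeps "-4" parseable as a negative number. */
        if (strchr("+-*/", *p) != NULL &&
            (p[1] == '\0' || isspace((unsigned char)p[1]))) {
            if (top < 2) return -1;
            double b = stack[--top];
            double a = stack[--top];
            switch (*p) {
            case '+': a += b; break;
            case '-': a -= b; break;
            case '*': a *= b; break;
            case '/': if (b == 0.0) return -1; a /= b; break;
            }
            stack[top++] = a;
            p++;
            continue;
        }

        /* Otherwise the token must be a number. */
        char *end;
        double v = strtod(p, &end);
        if (end == p || top == STACK_MAX) return -1;
        stack[top++] = v;
        p = end;
    }

    if (top != 1) return -1;     /* leftover or missing operands */
    *result = stack[0];
    return 0;
}
```

For example, `rpn_eval("3 4 + 2 *", &r)` should succeed and leave 14 in `r`, while `"1 +"` fails because the operator finds only one operand on the stack. The value stack here plays the same role that registers and memory play in a CPU emulator.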

Coincidentally, calculator ICs are the forerunners of CPUs, so this approach also makes sense from a historical perspective.

MSalters
+1  A: 

Something from the Zilog era would be good because you can probably find some software that ran on real Z-80 machines and use that as a final test.

The first real program that I wrote (other than one-page class assignments) was an emulator for the HP2100A minicomputer that I had used in high school. I wrote that in B, the predecessor of C, and I don't think that this is too hard for a first C program. If anything, it might be too simple. Of course something like the 80686 is far more challenging than a Z-80, but it's already been done by QEMU, VirtualBox, and others.

The hardest part of this will be the whole interrupt system that connects the machine to the external world.

You might want to read up about LLVM and decide whether you really want to write a VM or an emulator.

Michael Dillon
A: 

If you are designing a CPU and emulating it,

get the core ready. Meaning, write classes for registers. Write one for flags. Write a memory controller.

Think about the type of opcodes. Also, what is the length of words? Is it a 16-bit CPU? 8-bit?

What type of memory access do you want to use? DMA? HDMA?

What type of interrupts do you want to support? Will the CPU be a learning platform? Will it just be a CPU and some memory, or will it actually have devices connected to it? (sound, video, etc).

Here is some code from an emulator that I am working on (public domain). I've been working on it for a few days and it is about 3200 lines of code so far (most of it is in microcode.cs, which isn't posted here because it is 2600 lines long).

[code]using System;

namespace SYSTEM.cpu { // NOTE: Only level-trigger interrupts are planned right now

// To implement:
// - microcode
// - execution unit
// - etc

// This is the "core"; think of the CPU core like a building. You have several departments; flags, memory and registers
// Microcode is external

class core
{
    public cpu_flags flags;
    public cpu_registers registers;

    public cpu_memory memory;

    public core(byte[] ROM, byte[] PRG)
    {
        flags = new cpu_flags();
        registers = new cpu_registers();

        memory = new cpu_memory(ROM, PRG);

        return;
    }
}

} [/code]

[code]

using System;

namespace SYSTEM.cpu { class cpu_flags {
    // SYSTEM is not a 6502 emulator. The flags here, however, are exactly named as in 6502's SR
    // They do NOT, however, WORK the same as in 6502. They are intended to similar uses, but the only identity is the naming.

    // I just like the 6502's naming and whatnot.

    // This would otherwise be a register in SYSTEM.cpu_core.cpu_registers. SR, with the bits used correctly.
    // This would be less readable, code-wise, so I've opted to dedicate an entire CLASS to the status register

    // Though, I should implement here a function for putting the flags in a byte, so "SR" can be pushed when servicing interrupts

    public bool negative, // set if the high bit of the result of the last operation was 1

        // bit 7, then so on
            overflow, // says whether the last arithmetic operation resulted in overflow (NOTE: No subtraction opcodes available in SYSTEM)

            // NO FLAG

            brk, // break flag, set when a BREAK instruction is executed

            // NO FLAG (would be decimal flag, but I don't see why anyone would want BCD. If you want it, go implement it in my emulator; in software)
                // i.e. don't implement it in SYSTEM; write it in SYSTEM ASM and run it in SYSTEM's DEBUGGER

            irq,    // whether or not an interrupt should begin at the next interrupt period (if false, no interrupt)

            zero, // says whether the last arithmetic operation resulted in zero

            carry; // set when alpha rolls from 0xFFFF to 0x0000, or when a 1 is rotated/shifted during arithmetic

    public cpu_flags()
    {
        negative = true; // all arithmetic registers are FFFF by default, so of course they are negative

        overflow = false; // obviously, because no arithmetic operation has been performed yet

        brk = false;

        irq = true; // interrupts are enabled by default of course

        zero = false; // obviously, since all arith regs are not zero by default

        carry = false;  // obviously, since no carry operation was performed

        return;
    }

    // Explain:

    // These flags are public. No point putting much management on them here, since they are boolean

    // The opcodes that SYSTEM supports, will act on these flags. This is just here for code clarity/organisation

}

} [/code]

[code]

using System;

// This implements the memory controller

// NOTE: NO BANK SWITCHING IMPLEMENTED, AND NOT PLANNED AT THE MOMENT, SO MAKE DO WITH THE 64K

// SYSTEM has a 16-bit address bus (and the maximum memory supported; 64K)
// SYSTEM also has a 16-bit data bus; 8-bit operations are also performed here, they just use the low bits

// 0x0000-0x00FF is stack
// 0xF000-0xFFFF is mapped to BIOS ROM, and read-only; this is where BIOS is loaded on startup.
// (meaning BIOS ROM can be up to 4096B, or 4K. Normally this will be used for loading a BIOS)
// Mapping other PROGRAM ROM should start from 0x0100, but execution should start from 0xF000, where ROM/BIOS is mapped

// NOTE: PROGRAM ROM IS 32K, and mapped from 0x0100 to 0x80FF

// ;-)

namespace SYSTEM.cpu { class cpu_memory {
    // to implement:
    // device interaction (certain addresses in ROM should be writeable by external device, connected to the controller)
    // anything else that comes to mind.

    // Oh, and bank switching, if feasible

    private byte[] RAM; // As in the bull? ...

    public cpu_memory(byte[] ROM, byte[] PRG)
    {
        // Some code here can be condensed, but for the interest of readability, it is optimized for readability. Not space.

        // Checking whether environment is sane... SYSTEM is grinning and holding a spatula. Guess not.
        if(ROM.Length > 4096) throw new Exception("****SYSINIT PANIC****: BIOS ROM size INCORRECT. MUST be  within 4096 BYTES. STOP");

        if (PRG.Length > 32768) throw new Exception("****SYSINIT PANIC**** PROGRAM ROM size INCORRECT. MUST be within 32768 BYTES. STOP");

        if(ROM.Length != 4096) // Pads ROM to be 4096 bytes, if size is not exact
        {                       // This would not be done on a physical implementation of SYSTEM, but I feel like being kind to the lazy
            this.RAM = ROM;
            ROM = new byte[4096];
            for(int i = 0x000; i < RAM.Length; i++) ROM[i] = this.RAM[i];
        }

        if(PRG.Length != 32768) // Pads PRG to be 32768 bytes, if size is not exact
        {                   // again, being nice to lazy people..
            this.RAM = PRG;
            PRG = new byte[32768];
            for(int i = 0x000; i < RAM.Length; i++) PRG[i] = RAM[i];
        }

        this.RAM = new byte[0x10000]; // 64K of memory, the max supported

        // Initialize all bytes in the stack, to 0xFF
        for (int i = 0; i < 0x100; i++) this.RAM[i] = 0xFF; // This is redundant, but desired, for my own undisclosed reasons.

    // LOAD PROGRAM ROM AND BIOS ROM INTO MEMORY

        for (int i = 0xf000; i < 0x10000; i++)  // LOAD BIOS ROM INTO MEMORY
        {
            this.RAM[i] = ROM[i - 0xf000]; // yeah, pretty easy actually
        }

        // Remember, 0x0100-0x80FF is for PROGRAM ROM

        for (int i = 0x0100; i < 0x8100; i++) // LOAD PROGRAM ROM INTO MEMORY
        {
            this.RAM[i] = PRG[i - 0x100]; // not that you knew it would be much different
        }

        // The rest, 0x8100-0xEFFF, is reserved for now (the programmer can use it freely, as well as where PRG is loaded).
        // still read/writeable though

        return;
    }

// READ/WRITE:

    // NOTE: SYSTEM's cpu is LITTLE ENDIAN
    // WHEN DOUBLE-READING, THE BYTE-ORDER IS CONVERTED TO BIG ENDIAN
    // WHEN DOUBLE-WRITING, THE BYTE TO WRITE IS BIG ENDIAN, AND CONVERTED TO LITTLE ENDIAN

    // CPU HAS MAR/MBR, but the MEMORY CONTROLLER has ITS OWN REGISTERS for this?

// SINGLE OPERATIONS

    public byte read_single(ref cpu_registers registers, ushort address) // READ A SINGLE BYTE
    {                               // reading from any memory location is allowed, so this is simple
        registers.memoryAddress = address;
        return registers.memoryBuffer8 = this.RAM[registers.memoryAddress];

    }

    public ushort read_double(ref cpu_registers registers, ushort address) // READ TWO BYTES (converted to BIG ENDIAN byte order)
    {
        ushort ret = this.RAM[++address];
        ret <<= 8;
        ret |= this.RAM[--address];

        registers.memoryAddress = address;
        registers.memoryBuffer16 = ret;

        return registers.memoryBuffer16;
    }

    public void write_single(ref cpu_registers registers, ushort address, byte mbr_single) // WRITE A SINGLE BYTE
    {
        if (address < 0x0100) return; // block write to the stack (0x0000-0x00FF)
        if (address > 0xEFFF) return; // block writes to ROM area (0xF000-0xFFFF)

        registers.memoryAddress = address;
        registers.memoryBuffer8 = mbr_single;

        this.RAM[registers.memoryAddress] = registers.memoryBuffer8;

        return;
    }

    public void write_double(ref cpu_registers registers, ushort address, ushort mbr_double) // WRITE TWO BYTES (converted to LITTLE ENDIAN ORDER)
    {
        // writes to stack are blocked (0x0000-0x00FF)
        // writes to ROM are blocked   (0xF000-0xFFFF)

        write_single(ref registers, ++address, (byte)(mbr_double >> 8));
        write_single(ref registers, --address, (byte)(mbr_double & 0xff));

        registers.memoryBuffer16 = mbr_double;
        return;
    }

    public byte pop_single(ref cpu_registers registers) // POP ONE BYTE OFF STACK
    {
        return read_single(ref registers, registers.stackPointer++);
    }

    public ushort pop_double(ref cpu_registers registers) // POP TWO BYTES OFF STACK
    {
        ushort tmp = registers.stackPointer++; // address of the low byte
        ++registers.stackPointer;              // skip past the high byte as well
        return read_double(ref registers, tmp);
    }

// PUSH isn't as easy, since we can't use write_single() or write_double()
// because those are for external writes and they block writes to the stack
// external writes to the stack are possible of course, but
    // these are done here through push_single() and push_double()

    public void push_single(ref cpu_registers registers, byte VALUE) // PUSH ONE BYTE
    {
        registers.memoryAddress = --registers.stackPointer;
        registers.memoryBuffer8 = VALUE;

        this.RAM[registers.memoryAddress] = registers.memoryBuffer8;
        return;
    }

    public void push_double(ref cpu_registers registers, ushort VALUE) // PUSH TWO BYTES
    {
        this.RAM[--registers.stackPointer] = (byte)(VALUE >> 8);
        this.RAM[--registers.stackPointer] = (byte)(VALUE & 0xff);

        registers.memoryAddress = registers.stackPointer;
        registers.memoryBuffer16 = VALUE;

        return;
    }
}

} [/code]

[code]using System;

namespace SYSTEM.cpu { // Contains the class for handling registers. Quite simple really.

class cpu_registers
{
    private byte sp, cop; // stack pointer, current opcode
    //

    private ushort pp, ip, // program pointer, interrupt pointer
        mar, mbr_hybrid; // memory address and memory buffer registers,
                    // store address being operated on, store data being read/written
                    // mbr is essentially the data bus; as said, it supports both 16 and 8 bit operation.

                    // There are properties in this class for handling mbr in 16-bit or 8-bit capacity, accordingly
                    // NOTE: Paged memory can be used, but this is handled by opcodes, otherwise the memory addressing
                    //       is absolute

                    // NOTE: sp is also an address bus, but used on the stack (0x0000-0x00ff) only
                    // when pushing to the stack, or pulling, mbr gets updated in 8-bit capacity



                    // For pulling 16-bit word from stack, shifting register 8 left is needed, otherwise the next 
                    // POP operation will override the result of the last

    // Alpha is accumulator, the rest are general purpose
    public ushort alphaX, bravoX, charlieX, deltaX;

    public cpu_registers()
    {
        sp = 0xFF;  // stack; push left, pop right
        // stack is from 0x0000-0x00ff in memory
        pp = 0xf000; // execution starts from 0xf000; ROM is loaded
        // from 0xf000-0xffff, so 4KB of ROM. 
        // 0xf000-0xffff cannot be written to in software; this effectively
        // disables self-modifying code in that region.

        ip = pp; // interrupt pointer starts from the same place as pp

        alphaX = bravoX = charlieX = deltaX = 0xffff;

        cop = 0x00; // whatever opcode 0x00 is, cop is that on init

        mar = mbr_hybrid = 0x0000;

        return;
    }

    // Registers:

    public ushort memoryAddress // no restrictions on read/write, but obviously it needs to be handled with care for this register
    {                       // This should ONLY be handled by the execution unit, when actually loading instructions from memory
        set { mar = value; }
        get { return mar; }
    }

// NOTE: 8-bit and 16-bit data bus are shared, but the bus must have all bits written.
// when writing 8-bit value, byte-signal gets split. Like how an audio/video splitter works.

    public byte memoryBuffer8 // treats data bus as 8-bit, load one byte
    {
        set {   // byte is loaded into both low and high byte in mbr (i.e. it is split to create duplicates, for a 16-bit signal)
            mbr_hybrid &= 0x0000;   
            mbr_hybrid |= (ushort)value;
            mbr_hybrid <<= 0x08;
            mbr_hybrid |= (ushort)value;
        } get {
            return (byte)mbr_hybrid;
        }
    }

    public ushort memoryBuffer16 // treats data bus as 16-bit, load two bytes
    {
        set {
            mbr_hybrid &= 0x0000;
            mbr_hybrid |= value;
        } get {
            return mbr_hybrid;
        }
    }

    public byte stackPointer // sp is writable, but only push/pull opcodes
    {                        // should be able to write to it. There SHOULD
        set { sp = value; }  // be opcodes for reading from it
        get { return sp; }
    }

    public byte currentOpcode
    {
        set { cop = value; }
        get { return cop; }
    }

    public ushort programPointer // says where an instruction is being executed from
    {
        set { pp = value; }
        get { return pp; }
    }

    public ushort interruptPointer // says where the next requested interrupt should begin 
    {                   // (copied into PP, after pushing relevant registers)
        set { ip = value; }
        get { return ip; }
    }

    public byte status(cpu_flags flags) // status word, containing all flags
    {
        byte ret = 0;
        if (flags.negative) ret |= 0x80;
        if (flags.overflow) ret |= 0x40;
        if (flags.brk) ret |= 0x10;
        if (flags.irq) ret |= 0x04;
        if (flags.zero) ret |= 0x02;
        if (flags.carry) ret |= 0x01;

        return ret;
    }

}

} [/code]

[code]using System;

using System.Collections.Generic;

namespace SYSTEM.cpu { class cpu_execution {
    public core processor; // the "core", detailing the CPU status, including memory, memory controller, etc
    public cpu_microcode microcode; // the microcode unit (note, microcode is plug and play, you could use something else here)

    public cpu_execution(byte[] ROM, byte[] PRG) // initialize execution unit and everything under it
    {
        processor = new core(ROM, PRG);
        microcode = new cpu_microcode();

        return;
    }

    public void fetch() // fetch current instruction
    {
        processor.registers.currentOpcode = processor.memory.read_single(ref processor.registers, processor.registers.programPointer);
        return;
    }

    public void execute() // execute current instruction
    {
        processor = microcode.use(processor);
        return;
    }



}

} [/code]

microcode.cs, which emulates the opcodes, is not included here because it's 2600 lines of code.

All of this is C#.
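Since the question asks for C, the memory controller idea might translate roughly as follows. This is only a sketch under my own assumptions: the names `Memory`, `mem_read16`, `mem_write8` and so on are hypothetical, only the ROM write protection from the listing above is kept (the stack write block is left out for brevity), and 16-bit values are stored low byte first.

```c
#include <stdint.h>

#define ROM_START 0xF000u   /* assumption: read-only region as in the C# listing */

typedef struct {
    uint8_t ram[0x10000];   /* full 64K address space */
} Memory;

/* Read one byte; any address may be read. */
static uint8_t mem_read8(const Memory *m, uint16_t addr)
{
    return m->ram[addr];
}

/* Read a 16-bit value stored little-endian (low byte at addr). */
static uint16_t mem_read16(const Memory *m, uint16_t addr)
{
    uint8_t lo = m->ram[addr];
    uint8_t hi = m->ram[(uint16_t)(addr + 1)];
    return (uint16_t)(lo | (hi << 8));
}

/* Write one byte. Returns 1 if written, 0 if the address is in ROM. */
static int mem_write8(Memory *m, uint16_t addr, uint8_t value)
{
    if (addr >= ROM_START) return 0;
    m->ram[addr] = value;
    return 1;
}

/* Write a 16-bit value little-endian; each byte is checked separately. */
static void mem_write16(Memory *m, uint16_t addr, uint16_t value)
{
    mem_write8(m, addr, (uint8_t)(value & 0xFF));
    mem_write8(m, (uint16_t)(addr + 1), (uint8_t)(value >> 8));
}
```

Writing 0xBEEF to address 0x0200 with `mem_write16` stores 0xEF at 0x0200 and 0xBE at 0x0201, and `mem_read16` at the same address gives 0xBEEF back. A plain struct plus functions taking a pointer to it is the usual C stand-in for the classes used in the C# code.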

Frank