Greetings! Last semester in college, my teacher in the Computer Languages class taught us the esoteric language named Whitespace. In the interest of learning the language better with a very busy schedule (midterms), I wrote an interpreter and assembler in Python. An assembly language was designed to facilitate writing programs easily, and a sample program was written with the given assembly mnemonics.

Now that it is summer, a new project has begun with the objective of rewriting the interpreter and assembler for Whitespace 0.3, with further developments coming afterwards. Since there is far more time than before to work on the design, you are presented here with an outline that provides a revised set of mnemonics for the assembly language. This post is marked as a wiki so that the mnemonics can be discussed.

Have you ever had any experience with assembly languages in the past? Were there some instructions that you thought should have been renamed to something different? Did you find yourself thinking outside the box, in a different paradigm from the one in which the mnemonics were named? If you can answer yes to any of those questions, you are most welcome here. Subjective answers are appreciated!


Stack Manipulation (IMP: [Space])

Stack manipulation is one of the more common operations, hence the shortness of the IMP [Space]. There are four stack mnemonics; copy and drop also accept an optional argument, which gives six instructions in total.

hold N       Push the number onto the stack
copy         Duplicate the top item on the stack
copy N       Copy the nth item on the stack (given by the argument) onto the top of the stack
swap         Swap the top two items on the stack
drop         Discard the top item on the stack
drop N       Slide n items off the stack, keeping the top item
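
As a rough illustration of the table above, here is a minimal Python sketch of the six stack instructions operating on a plain list. The function names simply reuse the mnemonics. Two details are assumptions rather than something stated in the post: `copy N` counts from 0 at the top of the stack, and `drop N` removes the n items directly beneath the top.

```python
def hold(stack, n):
    """hold N: push the number onto the stack."""
    stack.append(n)

def copy(stack, n=0):
    """copy / copy N: push a copy of the nth item (0 = top, assumed)."""
    stack.append(stack[-1 - n])

def swap(stack):
    """swap: exchange the top two items."""
    stack[-1], stack[-2] = stack[-2], stack[-1]

def drop(stack, n=None):
    """drop: discard the top item. drop N: slide n items off, keeping the top."""
    if n is None:
        stack.pop()
        return
    top = stack.pop()
    if n > 0:              # guard: stack[-0:] would select the whole list
        del stack[-n:]
    stack.append(top)

stack = []
for value in (1, 2, 3):
    hold(stack, value)     # [1, 2, 3]
copy(stack)                # [1, 2, 3, 3]
copy(stack, 2)             # [1, 2, 3, 3, 2]
swap(stack)                # [1, 2, 3, 2, 3]
drop(stack)                # [1, 2, 3, 2]
drop(stack, 2)             # [1, 2]
```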

Arithmetic (IMP: [Tab][Space])

Arithmetic commands operate on the top two items on the stack, and replace them with the result of the operation. The first item pushed is considered to be left of the operator.

add          Addition
sub          Subtraction
mul          Multiplication
div          Integer Division
mod          Modulo
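
The operand order matters for sub, div and mod, so a small Python sketch may help. The dispatch table and the `arithmetic` helper are hypothetical names. Using Python's floor division and modulo (which round toward negative infinity) is an assumption, since the post only says "Integer Division".

```python
import operator

# Mnemonic -> binary operation. floordiv/mod semantics are assumed.
ARITHMETIC = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.floordiv,
    "mod": operator.mod,
}

def arithmetic(stack, mnemonic):
    """Replace the top two items with the result of the operation."""
    right = stack.pop()    # pushed last, so right of the operator
    left = stack.pop()     # pushed first, so left of the operator
    stack.append(ARITHMETIC[mnemonic](left, right))

stack = [7, 2]             # 7 was pushed first
arithmetic(stack, "sub")   # computes 7 - 2, leaving [5]
```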

Heap Access (IMP: [Tab][Tab])

Heap access commands look at the stack to find the address of items to be stored or retrieved. To store an item, push the address then the value and run the store command. To retrieve an item, push the address and run the retrieve command, which will place the value stored in the location at the top of the stack.

save         Store
load         Retrieve
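
The push order described above (address first, then value) can be sketched in Python as follows. Modelling the heap as a dict is an assumption, as is letting a load from an address that was never written raise `KeyError`. The post does not say what should happen in that case.

```python
def save(stack, heap):
    """save: pop the value, then the address, and store value at address."""
    value = stack.pop()
    address = stack.pop()
    heap[address] = value

def load(stack, heap):
    """load: pop the address and push the value stored there."""
    address = stack.pop()
    stack.append(heap[address])

heap = {}
stack = [10, 99]   # address 10 pushed first, then the value 99
save(stack, heap)  # heap == {10: 99}, stack == []
stack.append(10)
load(stack, heap)  # stack == [99]
```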

Flow Control (IMP: [LF])

Flow control operations are also common. Subroutines are marked by labels, as well as the targets of conditional and unconditional jumps, by which loops can be implemented. Programs must be ended by means of [LF][LF][LF] so that the interpreter can exit cleanly.

L:           Mark a location in the program
call L       Call a subroutine
goto L       Jump unconditionally to a label
if=0 L       Jump to a label if the top of the stack is zero
if<0 L       Jump to a label if the top of the stack is negative
return       End a subroutine and transfer control back to the caller
halt         End the program
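
To make the flow-control mnemonics concrete, here is a small Python sketch of an interpreter loop. The program representation (a list of `(mnemonic, argument)` tuples with `"label"` standing in for `L:`) and the `run` function are hypothetical. It also assumes that `if=0` and `if<0` pop the value they test, and it includes only enough other instructions (hold, copy, add) to drive a loop.

```python
def run(program):
    """Run a list of (mnemonic, argument) tuples and return the final stack."""
    # Resolve every label to its position before execution starts.
    labels = {arg: i for i, (op, arg) in enumerate(program) if op == "label"}
    stack, calls, pc = [], [], 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "label":
            pass                      # labels do nothing at run time
        elif op == "hold":
            stack.append(arg)
        elif op == "copy":
            stack.append(stack[-1])
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "call":
            calls.append(pc)          # remember where to come back to
            pc = labels[arg]
        elif op == "goto":
            pc = labels[arg]
        elif op == "if=0":
            if stack.pop() == 0:      # assumed: the tested value is popped
                pc = labels[arg]
        elif op == "if<0":
            if stack.pop() < 0:
                pc = labels[arg]
        elif op == "return":
            pc = calls.pop()
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown mnemonic: {op}")
    return stack

# Count down from 3 to 0 using a label, a conditional jump and a goto.
countdown = [
    ("hold", 3),
    ("label", "loop"),
    ("copy", None),
    ("if=0", "done"),
    ("hold", -1),
    ("add", None),
    ("goto", "loop"),
    ("label", "done"),
    ("halt", None),
]
final_stack = run(countdown)   # [0]
```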

I/O (IMP: [Tab][LF])

Finally, we need to be able to interact with the user. There are IO instructions for reading and writing numbers and individual characters. With these, string manipulation routines can be written. The read instructions take the heap address in which to store the result from the top of the stack.

print chr    Output the character at the top of the stack
print int    Output the number at the top of the stack
input chr    Read a character and place it in the location given by the top of the stack
input int    Read a number and place it in the location given by the top of the stack
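
The asymmetry between the two groups (print takes its value from the stack, input writes its result to the heap) can be sketched in Python like this. Passing the streams in as arguments, having print pop its operand, and having `input_int` consume a whole line are all assumptions. End-of-input handling is left out.

```python
import io

def print_chr(stack, out):
    """print chr: output the character whose code is on top of the stack."""
    out.write(chr(stack.pop()))

def print_int(stack, out):
    """print int: output the number on top of the stack."""
    out.write(str(stack.pop()))

def input_chr(stack, heap, inp):
    """input chr: read one character into the heap address on top of the stack."""
    heap[stack.pop()] = ord(inp.read(1))

def input_int(stack, heap, inp):
    """input int: read one line as a number into the heap address on top of the stack."""
    heap[stack.pop()] = int(inp.readline())

inp, out = io.StringIO("A42\n"), io.StringIO()
heap = {}
input_chr([0], heap, inp)   # heap[0] == 65
input_int([1], heap, inp)   # heap[1] == 42
print_chr([heap[0]], out)
print_int([heap[1]], out)
result = out.getvalue()     # "A42"
```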

Question: How would you redesign, rewrite, or rename the previous mnemonics and for what reasons?

+4  A: 

I think the first change I'd propose is changing hold and drop to push and pop respectively.

Then maybe I'd rename copy to dup (I think that's the most common name for this operation in stack oriented languages).

I'm a little puzzled why you often have short one-word explanations that are different from the mnemonic. For example, the mnemonic is Save but the explanation is Store; the mnemonic is Load but the explanation is Retrieve. Incidentally, those are the two mnemonics that aren't sufficiently explained to me. Save what where? Load what from where? (Edit: the question has subsequently been edited to make these meanings clear.)

Thanks for the interesting post.

Bill Forster
The explanations come from the tutorial found here: http://compsoc.dur.ac.uk/whitespace/tutorial.php
Noctis Skytower
+1  A: 

I'm not sure I completely understand your question, so if I'm off base, forgive me.

In addition to your stack, I would probably add a "status register" that contains a variety of different flags (like Carry, Overflow, and Zero) that are set by the arithmetic operators.

I would then add "if" forms that test those flags.

I would add bit shift and rotate (both left and right) instructions, as well as AND/OR/XOR/NOT operations that operate on bits.

You will most likely want to have some sort of memory access, unless you intend the I/O instructions to treat memory as a stream of values for that good ol' fashioned Turing Machine feel.

hdan
Since I did not write the language and currently intend only to rewrite the interpreter for version 0.3 of the language, options for changing it are limited. If all goes well, I may take the liberty of making small modifications to the instruction set and focus primarily on the assembly code for an extended version 0.4 of the language. Continuing on, version 0.5 would probably concentrate on the instruction set and involve a much larger overhaul of Whitespace assembly. In addition, moving the language from using three whitespace characters to all six whitespace characters is a small additional objective.
Noctis Skytower
+2  A: 
  • push #n, to make it clear that n is an immediate.
  • "swap" is sometimes "exc" or "exch" I think.
  • "save" is usually "st" (store)
  • "load" is usually "ld"
  • "call" could also be "jsr" or "bl".
  • "goto" is usually "jmp" or "bra"
  • "if=0" is usually "beq"
  • "if<0" is usually "blt"
  • "return" is usually "ret" or "blr"
  • "exit" is usually "halt"/"hlt" in the context of a CPU.
  • "print chr" and "print int" could be "print.c" and "print.i". There are many ways to specify instruction variants, but usually it's not in the operands.

EDIT:

If you don't mind conflating opcodes and addressing modes, using CISCy syntax,

  • "push (sp)" instead of "copy"
  • "push N(sp)" instead of "copy N" (modulo multiplying by the word size)
  • "push *(sp)" instead of "load" (except it does a pop before pushing the loaded values)
  • "pop *1(sp)" instead of "save" (except it actually pops twice)

On the other hand, stack-based code usually treats push and pop as implicit. In that case, "imm n" (immediate) instead of "push". Then all stack operations are purely stack operations, which is nice and consistent.

I'm not sure how I'd write "drop N" — the description makes it sound like "drop 1" isn't equivalent to "drop" which seems odd.
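
Taking the question's wording literally, a quick Python sketch shows the difference. The function names are hypothetical, and it assumes "slide" removes the items directly beneath the top.

```python
def drop(stack):
    """drop: discard the top item."""
    stack.pop()

def drop_n(stack, n):
    """drop N: slide n items off the stack, keeping the top item."""
    top = stack.pop()
    if n > 0:              # guard: stack[-0:] would select the whole list
        del stack[-n:]
    stack.append(top)

a = [1, 2, 3]
drop(a)        # a == [1, 2]: the top item is gone

b = [1, 2, 3]
drop_n(b, 1)   # b == [1, 3]: the item under the top is gone
```

Under that reading, "drop 1" and "drop" leave different stacks, so a separate mnemonic for the slide form might be less surprising.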

tc.
Thank you! The hold instruction was originally push, and the drop instruction was originally away. Pop was considered, but neither push nor pop describes the operation very well (even though they are standard). Your suggestion on changing exit to halt makes sense. "print chr" is an instruction: it has no operand. Maybe studying 4D is getting to me. They allow spaces in their instructions, weirdly. :)
Noctis Skytower
It depends on which paradigm you're using. The x87 has "fstp" which means "floating point store and pop", i.e. storing is orthogonal to popping (a lot of x87 instructions have "and pop" variants). Adding an edit...
tc.
From Wapedia: `In Unix halt is the command to shut down the computer. In x86 assembly language, HLT is an instruction that halts the CPU until the next external interrupt is fired.` That may come in handy later on when Whitespace is developed into further versions. Programming an interrupt system into the language would be a great learning experience.
Noctis Skytower
On some systems, there's a distinction between `halt` and `poweroff`.
tc.