views:

1454

answers:

12

I've just tried to create the smallest possible language interpreter. Would you like to join and try?

Rules of the game:

  • You should specify a programming language you're interpreting. If it's a language you invented, it should come with a list of commands in the comments.
  • Your code should start with example program and data assigned to your code and data variables.
  • Your code should end with output of your result. It's preferable that there are debug statements at every intermediate step.
  • Your code should be runnable as written.
  • You can assume that data are 0 and 1s (int, string or boolean, your choice) and output is a single bit.
  • The language should be Turing-complete in the sense that for any algorithm written on a standard model, such as Turing machine, Markov chains, or similar of your choice, it's reasonably obvious (or explained) how to write a program that after being executred by your interpreter performs the algorithm.
  • The length of the code is defined as the length of the code after removal of input part, output part, debug statements and non-necessary whitespaces. Please add the resulting code and its length to the post.
  • You can't use functions that make compiler execute code for you, such as eval(), exec() or similar.

This is a Community Wiki, meaning neither the question nor answers get the reputation points from votes. But vote anyway!

+2  A: 

It's not an answer but for those interested.

The Shortest Universal Turing Machine Implementation Contest

Ólafur Waage
Then you should submit :)
Ólafur Waage
Sorry, I was certainly too optimistic when I said I'm beating those. It's very language-dependent and I don't count input/output.
ilya n.
They are trying to implement the "Turing machine language". I am trying to implement "any language you like provided it's Turing-complete", which by definition leads to shorter code ;)
ilya n.
Actually, I wrote a Python version... I'll send it soon :)
ilya n.
Shouldn't this be a comment?
Wallacoloo
+8  A: 

Python program that interprets a language I just created:

from random import randint
z = [randint(0,1), None]  # example: input data is 0 or 1
x = '_b_ed_a_i____d_ai'   # this program will set p = data[1] = not data[0]

# input x[i], data z[k]
# jumper j
# p is +1 or -1 each step
# commands:
#    a   set data = 0 or 1
#    b   j += 0 or +-9 depending on data
#    d   move data left or right
#    e   perform jump left or right
#    j   set jumper to 0
#    i   end program
# output: p or some data[x], both possible

g = globals()
p = i = 1
a = b = d = j = e = k = 0
while i:
    h = x[i]
    g[h] = p = -p
    i += 1 + j*e
    if a: z[k] = i % 2
    if z[k]: j += 9*b
    k += d
    g[h] = 0        
#    print(a, b, d, e, i, j, k, h, z)

print('Input:', z, 'Output:', p > 0)

Optimized version:

g=globals()
p=i=1
a=b=d=j=e=k=0
while i:
 h=x[i]
 g[h]=p=-p
 i+=1+j*e
 if a:z[k]=i%2
 if z[k]:j+=9*b
 k+=d
 g[h]=0

114 bytes

Update: I want to add that the point of my program is not to create any practical language, and not even to somehow have the 'best' (=='shortest') interpreter, but rather to demonstrate interesting programming tricks. For example, I am relying on direct access to global variables through globals(), so that I never even test for j command, saving precious bytes :)

ilya n.
A: 

I would imagine that a brainfuck interpreter would be pretty easy to write. ;)

Charles Ma
Yes, but it would be longer than interpreter of my language :)) So my language is like brainfuck^2
ilya n.
I thought what was impressive with Brainfuck was the size of the *compiler*. 240 bytes. Which I think was way smaller than the FALSE compiler.
Nosredna
It's cool, but compiler is much more dependent into what architecture you compile...
ilya n.
+12  A: 

Python, 250 bytes.

This one is longer than ilya n.'s Python solution, but the language is a little easier to grasp. Each command in this language (I call it Kaputt, the German word for "broken") is one byte. The following 6 commands are defined:

  • 0: Push a zero onto the stack
  • 1: Push a one onto the stack
  • I: (if) Pop a value off the stack (which must be zero or one). Run the following block of code (until "i") if it's a one; skip the block if it's a zero.
  • i: (endif) Ends the if block and pushes a one if the block was not run (because the "I" popped a zero). See below for an explanation of the latter.
  • D: (def) Pops the name of the to-be-defined function off the stack (see below) and binds the following block (until "d") to that name.
  • d: (enddef) Ends the function definition.
  • Any other byte: Check if there is a function by this (one-byte; but see the edit below) name. If so, run this function; if not, push this byte onto the stack for immediate use by D.

Nested ifs are legal; nested function definitions are not. The fact that i (endif) pushes a one if and only if the corresponding if block was not run allows for the following idiom resembling an if/else/endif structure:

# [code that left a one or zero on the stack]
I    # abbreviated "(" below
# [code to be run if it was a one]
0iI  # abbreviated "/" below
# [code to be run if it was a zero]
1iIi # abbreviated ")" below
# [continuing...]

Note that comments, linebreaks, spaces etc. are not actually allowed.

Here are some examples of common functions. These make use of the abbreviations ( / ) mentioned above.

<D(/)d

defines the function < (pop) that pops a value off the stack without using it for anything.

&D((1/0)/<0)d

defines the function & (and) that pops two values of the stack and pushes a one if both values are ones, pushes a zero otherwise.

~D((11/10)/(01/00))d

defines the function ~ (swap) that swaps the top two values on the stack.

RD(R/<)d

defines the function R (remove) that recursively removes all ones from the top of the stack, and then removes two more values (the top one of which should be zero).

The following interpreter function expects the program p, which is a string (or any other iterable of bytes), and the input stack S, which is a (possibly empty) list of bytes. After the interpreter has run, this list contains the output stack.

def i(p,S,F=0):
 A=S.append
 F=F or{}
 C=0
 k=[]
 for c in p:
  h=0in k
  if c=="d":C=0
  elif C!=0:C+=[c]
  elif c=="I":k+=[int(h or S.pop())]
  elif c=="i":k.pop()or A("1")
  elif 1-h:
   if c=="D":C=F[S.pop()]=[]
   else:i(F.get(c)or A(c)or"",S,F)

Obviously, there was no room for error checking, so i() expects legal Kaputt code. Test cases:

script = "<D(/)d"                 # < = pop
script += "&D((1/0)/<0)d"         # & = and
script += "~D((11/10)/(01/00))d"  # ~ = swap
script += "RD(R/<)d"              # R = remove
script += "|D(<1/(1/0))d"         # | = or
script=script.replace("(","I")   
script=script.replace("/","0iI")
script=script.replace(")","1iIi") # ( and / and ) are no real commands of the language.

S=[]
i(script+"1111010111111111R",S)
assert S == ["1","1","1","1","0"]

S=[]
i(script+"00&",S)
assert S == ["0"]

S=[]
i(script+"01~",S)
assert S == ["1","0"]

S=[]
i(script+"00|",S)
assert S == ["0"]

S=[]
i(script+"01|",S)
assert S == ["1"]

Happy coding :-)

Edit: There is nothing inherent in the interpreter that forces tokens to be one byte only. Requiring this was more for consistency with the built-in commands (which are one-byte because that makes the interpreter shorter). If you pass a list of strings instead of a string, you can write more readable Kaputt code like this:

script = """
inc D
    (
        (
            0 0
        /
            1 0
        )
    /
        1
    )
d
""".replace("(","I").replace("/","0 i I").replace(")","1 i I i").split()

This defines a function inc that increments the two-bit number on top of the stack (LSB on top).

Test:

for n in xrange(4):
    S=[]
    i(script + [str((n & 2)/2), str(n & 1), "inc"], S)
    assert S == [str(((n+1) & 2)/2), str((n+1) & 1)]

Let's call multi-byte function names a language extension :-)

balpha
Nice! Especially an idea of using '(', '/' and ')'.
ilya n.
+1 for the good explanation
SealedSun
+6  A: 

Assemble the code below using A86 to get a 150 byte BF interpreter!

 add dh,10h
 push dx
 add dh,10h
 push dx
 mov bl,80h
 lea dx,[bx+2]
 add bl,byte ptr [bx]
 mov byte ptr [bx+1],al
 mov ah,3dh
 int 21h
 pop ds,es
 jc l14
 mov bx,ax
 mov ah,3fh 
 mov cx,di
 xor dx,dx
 int 21h
 jc l14
 mov bx,ax
 xor ax,ax
 rep stosw
 xor di,di
 xor si,si
 inc ch
l1:
 cmp si,bx
 jae l14
 lodsb
 mov bp,8
 push l1
l2: 
 dec bp
 js ret
 cmp al,[bp+l15]
 jne l2
l3:
 mov cl,[bp+8+l15]
 jmp cx
l4:
 inc di 
 ret
l5:
 dec di
 ret
l6:
 es inc byte ptr [di]
 ret
l7:
 es dec byte ptr [di]
 ret
l8:
 mov ah,2
 es mov dl,[di]
 int 21h
 ret
l9:
 mov ah,8
 int 21h
 es mov [di],al
 ret
l10:
 es cmp byte ptr [di],dh
 jne ret
l11: 
 cmp si,bx
 jae l14
 lodsb
 cmp al,']'
 jne l11
 ret
l12:
 es cmp byte ptr [di],dh
 je ret
l13: 
 dec si
 jz l14
 cmp byte ptr [si-1],'['
 jne l13
l14:
 ret
l15:
 db '>'
 db '<'
 db '+'
 db '-'
 db '.'
 db ','
 db '['
 db ']'
 db offset l4-100h
 db offset l5-100h
 db offset l6-100h
 db offset l7-100h
 db offset l8-100h
 db offset l9-100h
 db offset l10-100h
 db offset l12-100h

Specify a filename on the command line, no need for double quotes, as the BF source.

Skizz

Skizz
Is this something the built-in Windows assembler can assemble?
erjiang
If you mean the debug command, it can be done if you work out the addresses of the labels.
Skizz
In addition to the above, there is a link in the post to a site that has an excellant assembler. The 16bit version is shareware, you have to pay for the 32bit version. You only need the 16bit version for the above.
Skizz
+2  A: 

Building on my previous code-golf entry, here is a (slightly generalized for IO) OISC emulator in

Fortran77

Obsfucated and without the loading scaffold (655 characters):

  subroutine o(n,m)
  integer m(n)              
  l=1;                      
  do while (l.ne.0)
     i=m(l)
     j=m(l+1)
     k=m(l+2)
     mi=mg(n,m,i)
     mj=mg(n,m,j)
     if(j.eq.(n+2)) then
        write(6,*)mj-mi
     else
        m(l+1)=mj-mi
     endif
     if (m(l+1).lt.0) then
        l=mg(n,m,k)
     else
        l=l+3
     endif
  end do
  return
  end
  function mg(n,m,i)
  integer m(n)
  if (i.eq.n+2) then
     read(5,*)mg
  elseif (i.eq.n+1) then
     mg=0
  else
     mg=m(i)
  endif
  return
  end

with comments, loading scaffold and so on (2435 characters):

      program c
      parameter (n=1024)        ! The size to use for memeory
      integer m(n)              ! represent the memory
c     Load a program into memory
      i=1
 1    read(5,*,end=2)l
c     write (6,*) "Load ",l," into location ",i
      m(i)=l
      i=i+1
      goto 1
c     Run the computer
 2    call o(n,m)
      stop
      end

      subroutine o(n,m)
c     Simulate a simple computer that supports only a single
c     instruction. Memory is a fixed size array of integers.
c
c     The supported instruction is subtract-branch-negative which we
c     give the assembly syntax
c
c     sbn i j k
c
c     and acts by subtracting the value in memeory location i from that
c     in location j and storing the result in j. If the result is
c     negative, the PC is set to k. Because there is only one opcode, it
c     is not represented, so a program consists simply of a series of
c     triplet (i,j,k), and the PC is generally incremented by 3. The
c     program counter starts at 1
c
c     Povisions for IO and halting are by way of special memory
c     location:
c
c     * PC=0 means halt
c     * writes (opcode argument j) to memory location n+1 send
c       output, reads from n+1 evaluate as 0
c     * reads (opcode argument i, j, k) from memory location n+2 fetch
c       input, writes to n+2 are forbidden
c     * All others reads and writes outside of memeory are forbidden

c     n                         ! the size of the computers memory 
      integer m(n)              ! an array representing the computers memory
      l=1;                      ! program counter
      do while (l.ne.0)
         i=m(l)
         j=m(l+1)
         k=m(l+2)
c        write (6,*) "Executing from PC=",l,": (",i,", ",j,", ", k,")"
c        handle the write special case for output
         mi=mg(n,m,i)
         mj=mg(n,m,j)
         if(j.eq.(n+1)) then
            write(6,*)mj-mi
         else
c           write (6,*) "Setting location ",j," to ",mj-mi
            m(l+1)=mj-mi
         endif
         if (m(l+1).lt.0) then
            l=mg(n,m,k)
         else
            l=l+3
         endif
      end do
      return
      end

c     Handle the various read special cases
      function mg(n,m,i)
      integer m(n)
      if (i.eq.n+2) then
         read(5,*)mg
      elseif (i.eq.n+1) then
         mg=0
      else
         mg=m(i)
      endif
c     write (6,*) "Read ",mg," from location ",i
      return
      end

Sample program:

13
1025
0

14 
1025
0

14
15
0

0
0
0

-1
1
0

resulting in output:

$ cat trivial.oisc | ./a.out 
           1
          -1

which is as expected.

It really is exceedingly straightforward code. The point here is not really how tightly I can code it, but how simple a language you need for Turing completeness.

dmckee
+4  A: 

Not mine, but take a look at the 210 bit binary lambda calculus self-interpreter here.

Brian
A: 

Custom language: SLA
Keywords:
i - Interpret SLB q - Quit

Custom Language: SLB
Keywords:
cp("text") - Interpret text as C program

Example:
cp("#include \nint main() { puts(\"Hi!\n\");return 0}")

Interpreter (Written in SLA): i q

3 characters!

computergeek6
funny, though "You can't use functions that make compiler execute code for you, such as eval(), exec() or similar."
ilya n.
+6  A: 
#include "/dev/tty"
Joshua
+1 for a very funny answer, though "You can't use functions that make compiler execute code for you, such as eval(), exec() or similar."
ilya n.
This doesn't make the compiler execute code, the shell is executing the code. ;)
Joe D
+1  A: 

106-byte solution was posted to codegolf.com contest. It is written in perl and interprets Brainfuck. There is no way to review it at this point, afaik.

It is not my solution, but I believe it is shorter than current entries. Possible solution should be shorter, as there is no need to use Brainfuck as Turing-complete language.

samuil
+1  A: 

Written in Perl, 140 characters long, including the shell invocation and flags:

perl -ne'push@a,split;if(eof){$i=0;while($i>=0){($a,$b,$c)=@a[$i..$i+2];$b>-1?$a[$b]-=$a[$a]:print chr$a[$a];$i=$b>-1&&$a[$b]<=0?$c:$i+3;}}'

Readable version:

#!/usr/bin/perl -n
push @prog, split /\s+/, $_;
if(eof) {
  $i = 0;
  while($i >= 0) {
    ($a, $b, $c) = @prog[$i .. $i + 2];
    if($b > -1) {
      $prog[$b] -= $prog[$a];
    } else {
      print chr $prog[$a];
    }
    if($b > -1 and $prog[$b] <= 0) {
      $i = $c;
    } else {
      $i + 3;
    }
  }
}

The above is an interpreter for pseudo-machine code for a One Instruction Set Computer using the subleq instruction. Basically, the source code should just be a bunch of numbers separated by whitespace. A simple test program to verify my results (should print "Hi" and a Unix newline):

0 0 6 72 105 10 3 -1 9 4 -1 12 5 -1 15 0 0 -1

Readable version of the test input (works just as well):

0    0     6
72   105   10
3   -1     9
4   -1     12
5   -1     15
0    0    -1
Chris Lutz
+2  A: 

My own entry, implementation of OISC in Ruby. It's 85 bytes long, and I'm sure one could squeeze a few more with some neat tricks. Programs have to contain data in code (line of space-separated numbers). At the moment I'm unable to provide working program written in OISC, but I will do it later.

p,m=0,gets.chomp.split.map(:to_i)
while p>0
p=(m[m[b]]-=m[m[a]])>0?p+3:c
end
$><<m[0]

The code is quite straightforward. m is a "memory" containing program and data. First line initializes m with provided code, and p - memory pointer. Main loop is subleq operation, written with ternary operator. Last line is smart way to output number contained in memory.

samuil
The established convention (as established a convention as you can have for such things) is to print the value contained in `a`, you would use the command `subleq a, -1`, or `a -1 x` in this syntax (where `x` is wherever you want to goto).
Chris Lutz
I always feel good when my fortran implementations are less than an order of magnitude larger than the really compact languages.
dmckee