views:

522

answers:

3

I'm currently trying to get to grips with BNF and attempting to assemble some Z80 ASM code. Since I'm new to both fields So my question is, Am I even on the right track? I am currently trying to write the format of Z80 ASM as EBNF so that I can then figure out where to go from there to create machine code from the source. At the moment I have the following:

Assignment = Identifier, ":" ;
Instruction = Opcode, [ Operand ], [ Operand ] ;
Operand = Identifier | Something* ;
Something* = "(" , Identifier, ")" ;
Identifier = Alpha, { Numeric | Alpha } ;
Opcode = Alpha, Alpha ;
Int = [ "-" ], Numeric, { Numeric } ;
Alpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z" ;
Numeric = "0" | "1" | "2" | "3"| "4" | "5" | "6" | "7" | "8" | "9" ;

Any directional feedback if I am going wrong would be excellent.

Many thanks!

+1  A: 

BNF is more generally used for structured, nested languages like Pascal, C++, or really anything derived from the Algol family (which includes modern languages like C#). If I were implementing an assembler, I might use some simple regular expressions to pattern-match the opcode and operands. It's been a while since I've used Z80 assembly language, but you might use something like:

/\s*(\w{2,3})\s+((\w+)(,\w+)?)?/

This would match any line which consists of a two- or three-letter opcode followed by one or two operands separated by a comma. After extracting an assembler line like this, you would look at the opcode and generate the correct bytes for the instruction, including the values of the operands if applicable.

The type of parser I've outlined above using regular expressions would be called an "ad hoc" parser, which essentially means you split and examine the input on some kind of block basis (in the case of assembly language, by text line).

Greg Hewgill
+1  A: 

I don't think you need overthink it. There's no point making a parser that takes apart “LD A,A” into a load operation, destination and source register, when you can just string match the whole thing (modulo case and whitespace) into one opcode directly.

There aren't that many opcodes, and they aren't arranged in such a way that you really get much benefit from parsing and understanding the assembler IMO. Obviously you'd need a parser for the byte/address/indexing arguments, but other than that I'd just have a one-to-one lookup.

bobince
Thanks for your feedback... I agree that going down a more simplistic path but also I'm interested in extending this out into a more complex language and figure I would like make use of the features from the beginning... Also there are differences in some other parts of ASM such as equ, .db, .ds, .dw, #include, () and then we start to get into simple case statements IF ELSE. Furthermore, this is also an excercise in learning the concepts of BNF and use it for this more simplistic implementation.
Gary Paluk
+5  A: 

Old-school assemblers were typically hand-coded in assembler and used adhoc parsing techniques to process assembly source lines to produce actual assembler code. When assembler syntax is simple (e.g. always OPCODE REG, OPERAND) this worked well enough.

Modern machines have messy, nasty instruction sets with lots of instruction variations and operands, which may be expressed with complex syntax allowing multiple index registers to participate in the operand expression. Allowing sophisticated assembly-time expressions with fixed and relocatable constants with various types of addition operators complicates this. Sophisticated assemblers allowing conditional compilation, macros, structured data declarations, etc. all add new demands on syntax. Processing all this syntax by ad hoc methods is very hard and is the reason that parser generators were invented.

Using a BNF and a parser generator is very reasonable way to build a modern assembler, even for a legacy processor such as the Z80. I have built such assemblers for Motorola 8 bit machines such as the 6800/6809, and am getting ready to do the same for a modern x86. I think you're headed down exactly the right path.

****** EDIT ************ The OP asked for example lexer and parser definitions. I've provided both here.

These are excerpts from real specifications for a 6809 asssembler. The complete definitions are 2-3x the size of the samples here.

To keep space down, I have edited out much of the dark-corner complexity which is the point of these definitions. One might be dismayed by the apparant complexity; the point is that with such definitions, you are trying to describe the shape of the language, not code it procedurally. You will pay a significantly higher complexity if you code all this in an ad hoc manner, and it will be far less maintainable.

It will also be of some help to know that these definitions are used with a high-end program analysis system that has lexing/parsing tools as subsystems, called the The DMS Software Reengineering Toolkit. DMS will automatically build ASTs from the
grammar rules in the parser specfication, which makes it a lot easier to buid parsing tools. Lastly, the parser specification contains so-called "prettyprinter" declarations, which allows DMS to regenreate source text from the ASTs. (The real purpose of the grammer was to allow us to build ASTs representing assembler instructions, and then spit them out to be fed to a real assembler!)

One thing of note: how lexemes and grammar rules are stated (the metasyntxax!) varies somewhat between different lexer/parser generator systems. The syntax of DMS-based specifications is no exception. DMS has relatively sophisticated grammar rules of its own, that really aren't practical to explain in the space available here. You'll have to live with idea that other systems use similar notations, for EBNF for rules and and regular expression variants for lexemes.

Given the OP's interests, he can implement similar lexer/parsers with any lexer/parser generator tool, e.g., FLEX/YACC, JAVACC, ANTLR, ...

****** LEXER **********

-- M6809.lex: Lexical Description for M6809
-- Copyright (C) 1989,1999-2002 Ira D. Baxter

%%
#mainmode Label

#macro digit "[0-9]"
#macro hexadecimaldigit "<digit>|[a-fA-F]"

#macro comment_body_character "[\u0009 \u0020-\u007E]" -- does not include NEWLINE

#macro blank "[\u0000 \ \u0009]"

#macro hblanks "<blank>+"

#macro newline "\u000d \u000a? \u000c? | \u000a \u000c?" -- form feed allowed only after newline

#macro bare_semicolon_comment "\; <comment_body_character>* "

#macro bare_asterisk_comment "\* <comment_body_character>* "

...[snip]

#macro hexadecimal_digit "<digit> | [a-fA-F]"

#macro binary_digit "[01]"

#macro squoted_character "\' [\u0021-\u007E]"

#macro string_character "[\u0009 \u0020-\u007E]"

%%Label -- (First mode) processes left hand side of line: labels, opcodes, etc.

#skip "(<blank>*<newline>)+"
#skip "(<blank>*<newline>)*<blank>+"
  << (GotoOpcodeField ?) >>

#precomment "<comment_line><newline>"

#preskip "(<blank>*<newline>)+"
#preskip "(<blank>*<newline>)*<blank>+"
  << (GotoOpcodeField ?) >>

-- Note that an apparant register name is accepted as a label in this mode
#token LABEL [STRING] "<identifier>"
  <<  (local (;; (= [TokenScan natural] 1) ; process all string characters
      (= [TokenLength natural] ?:TokenCharacterCount)=
      (= [TokenString (reference TokenBodyT)] (. ?:TokenCharacters))
      (= [Result (reference string)] (. ?:Lexeme:Literal:String:Value))
      [ThisCharacterCode natural]
      (define Ordinala #61)
      (define Ordinalf #66)
      (define OrdinalA #41)
      (define OrdinalF #46)
  );;
     (;; (= (@ Result) `') ; start with empty string
  (while (<= TokenScan TokenLength)
   (;;   (= ThisCharacterCode (coerce natural TokenString:TokenScan))  
  (+= TokenScan) ; bump past character
  (ifthen (>= ThisCharacterCode Ordinala)
     (-= ThisCharacterCode #20) ; fold to upper case
  )ifthen
  (= (@ Result) (append (@ Result) (coerce character ThisCharacterCode)))=

     );;
  )while
     );;
  )local
  (= ?:Lexeme:Literal:String:Format (LiteralFormat:MakeCompactStringLiteralFormat 0))  ; nothing interesting in string
  (GotoLabelList ?)
  >>

%%OpcodeField

#skip "<hblanks>"
  << (GotoEOLComment ?) >>
#ifnotoken
  << (GotoEOLComment ?) >>

-- Opcode field tokens
#token 'ABA'       "[aA][bB][aA]"
   << (GotoEOLComment ?) >>
#token 'ABX'       "[aA][bB][xX]"
   << (GotoEOLComment ?) >>
#token 'ADC'       "[aA][dD][cC]"
   << (GotoABregister ?) >>
#token 'ADCA'      "[aA][dD][cC][aA]"
   << (GotoOperand ?) >>
#token 'ADCB'      "[aA][dD][cC][bB]"
   << (GotoOperand ?) >>
#token 'ADCD'      "[aA][dD][cC][dD]"
   << (GotoOperand ?) >>
#token 'ADD'       "[aA][dD][dD]"
   << (GotoABregister ?) >>
#token 'ADDA'      "[aA][dD][dD][aA]"
   << (GotoOperand ?) >>
#token 'ADDB'      "[aA][dD][dD][bB]"
   << (GotoOperand ?) >>
#token 'ADDD'      "[aA][dD][dD][dD]"
   << (GotoOperand ?) >>
#token 'AND'       "[aA][nN][dD]"
   << (GotoABregister ?) >>
#token 'ANDA'      "[aA][nN][dD][aA]"
   << (GotoOperand ?) >>
#token 'ANDB'      "[aA][nN][dD][bB]"
   << (GotoOperand ?) >>
#token 'ANDCC'     "[aA][nN][dD][cC][cC]"
   << (GotoRegister ?) >>
...[long list of opcodes snipped]

#token IDENTIFIER [STRING] "<identifier>"
  <<  (local (;; (= [TokenScan natural] 1) ; process all string characters
      (= [TokenLength natural] ?:TokenCharacterCount)=
      (= [TokenString (reference TokenBodyT)] (. ?:TokenCharacters))
      (= [Result (reference string)] (. ?:Lexeme:Literal:String:Value))
      [ThisCharacterCode natural]
      (define Ordinala #61)
      (define Ordinalf #66)
      (define OrdinalA #41)
      (define OrdinalF #46)
  );;
     (;; (= (@ Result) `') ; start with empty string
  (while (<= TokenScan TokenLength)
   (;;   (= ThisCharacterCode (coerce natural TokenString:TokenScan))  
  (+= TokenScan) ; bump past character
  (ifthen (>= ThisCharacterCode Ordinala)
     (-= ThisCharacterCode #20) ; fold to upper case
  )ifthen
  (= (@ Result) (append (@ Result) (coerce character ThisCharacterCode)))=

     );;
  )while
     );;
  )local
  (= ?:Lexeme:Literal:String:Format (LiteralFormat:MakeCompactStringLiteralFormat 0))  ; nothing interesting in string
  (GotoOperandField ?)
  >>

#token '#'   "\#" -- special constant introduction (FDB)
   << (GotoDataField ?) >>

#token NUMBER [NATURAL] "<decimal_number>"
  << (local [format LiteralFormat:NaturalLiteralFormat]
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertDecimalTokenStringToNatural (. format) ? 0 0))
 (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
    );;
 )local
 (GotoOperandField ?)
  >>

#token NUMBER [NATURAL] "\$ <hexadecimal_digit>+"
  << (local [format LiteralFormat:NaturalLiteralFormat]
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertHexadecimalTokenStringToNatural (. format) ? 1 0))
 (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
    );;
 )local
 (GotoOperandField ?)
  >>

#token NUMBER [NATURAL] "\% <binary_digit>+"
  << (local [format LiteralFormat:NaturalLiteralFormat]
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertBinaryTokenStringToNatural (. format) ? 1 0))
 (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
    );;
 )local
 (GotoOperandField ?)
  >>

#token CHARACTER [CHARACTER] "<squoted_character>"
  <<  (= ?:Lexeme:Literal:Character:Value (TokenStringCharacter ? 2))
  (= ?:Lexeme:Literal:Character:Format (LiteralFormat:MakeCompactCharacterLiteralFormat 0 0)) ; nothing special about character
  (GotoOperandField ?)
  >>


%%OperandField

#skip "<hblanks>"
  << (GotoEOLComment ?) >>
#ifnotoken
  << (GotoEOLComment ?) >>

-- Tokens signalling switch to index register modes
#token ','   "\,"
   <<(GotoRegisterField ?)>>
#token '['   "\["
   <<(GotoRegisterField ?)>>

-- Operators for arithmetic syntax
#token '!!'  "\!\!"
#token '!'   "\!"
#token '##'  "\#\#"
#token '#'   "\#"
#token '&'   "\&"
#token '('   "\("
#token ')'   "\)"
#token '*'   "\*"
#token '+'   "\+"
#token '-'   "\-"
#token '/'   "\/"
#token '//'   "\/\/"
#token '<'   "\<"
#token '<'   "\<" 
#token '<<'  "\<\<"
#token '<='  "\<\="
#token '</'  "\<\/"
#token '='   "\="
#token '>'   "\>"
#token '>'   "\>"
#token '>='  "\>\="
#token '>>'  "\>\>"
#token '>/'  "\>\/"
#token '\\'  "\\"
#token '|'   "\|"
#token '||'  "\|\|"

#token NUMBER [NATURAL] "<decimal_number>"
  << (local [format LiteralFormat:NaturalLiteralFormat]
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertDecimalTokenStringToNatural (. format) ? 0 0))
 (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
    );;
 )local
  >>

#token NUMBER [NATURAL] "\$ <hexadecimal_digit>+"
  << (local [format LiteralFormat:NaturalLiteralFormat]
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertHexadecimalTokenStringToNatural (. format) ? 1 0))
 (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
    );;
 )local
  >>

#token NUMBER [NATURAL] "\% <binary_digit>+"
  << (local [format LiteralFormat:NaturalLiteralFormat]
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertBinaryTokenStringToNatural (. format) ? 1 0))
 (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
    );;
 )local
  >>

-- Notice that an apparent register is accepted as a label in this mode
#token IDENTIFIER [STRING] "<identifier>"
  <<  (local (;; (= [TokenScan natural] 1) ; process all string characters
      (= [TokenLength natural] ?:TokenCharacterCount)=
      (= [TokenString (reference TokenBodyT)] (. ?:TokenCharacters))
      (= [Result (reference string)] (. ?:Lexeme:Literal:String:Value))
      [ThisCharacterCode natural]
      (define Ordinala #61)
      (define Ordinalf #66)
      (define OrdinalA #41)
      (define OrdinalF #46)
  );;
     (;; (= (@ Result) `') ; start with empty string
  (while (<= TokenScan TokenLength)
   (;;   (= ThisCharacterCode (coerce natural TokenString:TokenScan))  
  (+= TokenScan) ; bump past character
  (ifthen (>= ThisCharacterCode Ordinala)
     (-= ThisCharacterCode #20) ; fold to upper case
  )ifthen
  (= (@ Result) (append (@ Result) (coerce character ThisCharacterCode)))=

     );;
  )while
     );;
  )local
  (= ?:Lexeme:Literal:String:Format (LiteralFormat:MakeCompactStringLiteralFormat 0))  ; nothing interesting in string
  >>

%%Register -- operand field for TFR, ANDCC, ORCC, EXG opcodes

#skip "<hblanks>"
#ifnotoken << (GotoRegisterField ?) >>

%%RegisterField -- handles registers and indexing mode syntax
-- In this mode, names that look like registers are recognized as registers

#skip "<hblanks>"
  << (GotoEOLComment ?) >>
#ifnotoken
  << (GotoEOLComment ?) >>

#token '['   "\["
#token ']'   "\]"
#token '--'  "\-\-"
#token '++'  "\+\+"

#token 'A'      "[aA]"
#token 'B'      "[bB]"
#token 'CC'     "[cC][cC]"
#token 'DP'     "[dD][pP] | [dD][pP][rR]" -- DPR shouldnt be needed, but found one instance
#token 'D'      "[dD]"
#token 'Z'      "[zZ]"

-- Index register designations
#token 'X'      "[xX]"
#token 'Y'      "[yY]"
#token 'U'      "[uU]"
#token 'S'      "[sS]"
#token 'PCR'    "[pP][cC][rR]"
#token 'PC'     "[pP][cC]"

#token ','    "\,"

-- Operators for arithmetic syntax
#token '!!'  "\!\!"
#token '!'   "\!"
#token '##'  "\#\#"
#token '#'   "\#"
#token '&'   "\&"
#token '('   "\("
#token ')'   "\)"
#token '*'   "\*"
#token '+'   "\+"
#token '-'   "\-"
#token '/'   "\/"
#token '<'   "\<"
#token '<'   "\<" 
#token '<<'  "\<\<"
#token '<='  "\<\="
#token '<|'  "\<\|"
#token '='   "\="
#token '>'   "\>"
#token '>'   "\>"
#token '>='  "\>\="
#token '>>'  "\>\>"
#token '>|'  "\>\|"
#token '\\'  "\\"
#token '|'   "\|"
#token '||'  "\|\|"

#token NUMBER [NATURAL] "<decimal_number>"
  << (local [format LiteralFormat:NaturalLiteralFormat]
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertDecimalTokenStringToNatural (. format) ? 0 0))
 (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
    );;
 )local
  >>

... [snip]

%% -- end M6809.lex

************ PARSER **********

-- M6809.ATG: Motorola 6809 assembly code parser
-- (C) Copyright 1989;1999-2002 Ira D. Baxter; All Rights Reserved

m6809 = sourcelines ;

sourcelines = ;
sourcelines = sourcelines sourceline EOL ;
  <<PrettyPrinter>>: { V(CV(sourcelines[1]),H(sourceline,A<eol>(EOL))); }

-- leading opcode field symbol should be treated as keyword.

sourceline = ;
sourceline = labels ;
sourceline = optional_labels 'EQU' expression ;
  <<PrettyPrinter>>: { H(optional_labels,A<opcode>('EQU'),A<operand>(expression)); }
sourceline = LABEL 'SET' expression ;
  <<PrettyPrinter>>: { H(A<firstlabel>(LABEL),A<opcode>('SET'),A<operand>(expression)); }
sourceline = optional_label instruction ;
  <<PrettyPrinter>>: { H(optional_label,instruction); }
sourceline = optional_label optlabelleddirective ;
  <<PrettyPrinter>>: { H(optional_label,optlabelleddirective); }
sourceline = optional_label implicitdatadirective ;
  <<PrettyPrinter>>: { H(optional_label,implicitdatadirective); }
sourceline = unlabelleddirective ;
sourceline = '?ERROR' ;
  <<PrettyPrinter>>: { A<opcode>('?ERROR'); }

optional_label = labels ;
optional_label = LABEL ':' ;
  <<PrettyPrinter>>: { H(A<firstlabel>(LABEL),':'); }
optional_label = ;

optional_labels = ;
optional_labels = labels ;
labels = LABEL ;
  <<PrettyPrinter>>: { A<firstlabel>(LABEL); }
labels = labels ',' LABEL ;
  <<PrettyPrinter>>: { H(labels[1],',',A<otherlabels>(LABEL)); }

unlabelleddirective = 'END' ;
  <<PrettyPrinter>>: { A<opcode>('END'); }
unlabelleddirective = 'END' expression ;
  <<PrettyPrinter>>: { H(A<opcode>('END'),A<operand>(expression)); }
unlabelleddirective = 'IF' expression EOL conditional ;
  <<PrettyPrinter>>: { V(H(A<opcode>('IF'),H(A<operand>(expression),A<eol>(EOL))),CV(conditional)); }
unlabelleddirective = 'IFDEF' IDENTIFIER EOL conditional ;
  <<PrettyPrinter>>: { V(H(A<opcode>('IFDEF'),H(A<operand>(IDENTIFIER),A<eol>(EOL))),CV(conditional)); }
unlabelleddirective = 'IFUND' IDENTIFIER EOL conditional ;
  <<PrettyPrinter>>: { V(H(A<opcode>('IFUND'),H(A<operand>(IDENTIFIER),A<eol>(EOL))),CV(conditional)); }
unlabelleddirective = 'INCLUDE' FILENAME ;
  <<PrettyPrinter>>: { H(A<opcode>('INCLUDE'),A<operand>(FILENAME)); }
unlabelleddirective = 'LIST' expression ;
  <<PrettyPrinter>>: { H(A<opcode>('LIST'),A<operand>(expression)); }
unlabelleddirective = 'NAME' IDENTIFIER ;
  <<PrettyPrinter>>: { H(A<opcode>('NAME'),A<operand>(IDENTIFIER)); }
unlabelleddirective = 'ORG' expression ;
  <<PrettyPrinter>>: { H(A<opcode>('ORG'),A<operand>(expression)); }
unlabelleddirective = 'PAGE' ;
  <<PrettyPrinter>>: { A<opcode>('PAGE'); }
unlabelleddirective = 'PAGE' HEADING ;
  <<PrettyPrinter>>: { H(A<opcode>('PAGE'),A<operand>(HEADING)); }
unlabelleddirective = 'PCA' expression ;
  <<PrettyPrinter>>: { H(A<opcode>('PCA'),A<operand>(expression)); }
unlabelleddirective = 'PCC' expression ;
  <<PrettyPrinter>>: { H(A<opcode>('PCC'),A<operand>(expression)); }
unlabelleddirective = 'PSR' expression ;
  <<PrettyPrinter>>: { H(A<opcode>('PSR'),A<operand>(expression)); }
unlabelleddirective = 'TABS' numberlist ;
  <<PrettyPrinter>>: { H(A<opcode>('TABS'),A<operand>(numberlist)); }
unlabelleddirective = 'TITLE' HEADING ;
  <<PrettyPrinter>>: { H(A<opcode>('TITLE'),A<operand>(HEADING)); }
unlabelleddirective = 'WITH' settings ;
  <<PrettyPrinter>>: { H(A<opcode>('WITH'),A<operand>(settings)); }

settings = setting ;
settings = settings ',' setting ;
  <<PrettyPrinter>>: { H*; }
setting = 'WI' '=' NUMBER ;
  <<PrettyPrinter>>: { H*; }
setting = 'DE' '=' NUMBER ;
  <<PrettyPrinter>>: { H*; }
setting = 'M6800' ;
setting = 'M6801' ;
setting = 'M6809' ;
setting = 'M6811' ;

-- collects lines of conditional code into blocks
conditional = 'ELSEIF' expression EOL conditional ;
  <<PrettyPrinter>>: { V(H(A<opcode>('ELSEIF'),H(A<operand>(expression),A<eol>(EOL))),CV(conditional[1])); }
conditional = 'ELSE' EOL else ;
  <<PrettyPrinter>>: { V(H(A<opcode>('ELSE'),A<eol>(EOL)),CV(else)); }
conditional = 'FIN' ;
  <<PrettyPrinter>>: { A<opcode>('FIN'); }
conditional = sourceline EOL conditional ;
  <<PrettyPrinter>>: { V(H(sourceline,A<eol>(EOL)),CV(conditional[1])); }

else = 'FIN' ;
  <<PrettyPrinter>>: { A<opcode>('FIN'); }
else = sourceline EOL else ;
  <<PrettyPrinter>>: { V(H(sourceline,A<eol>(EOL)),CV(else[1])); }

-- keyword-less directive, generates data tables

implicitdatadirective = implicitdatadirective ',' implicitdataitem ;
  <<PrettyPrinter>>: { H*; }
implicitdatadirective = implicitdataitem ;

implicitdataitem = '#' expression ;
  <<PrettyPrinter>>: { A<operand>(H('#',expression)); }
implicitdataitem = '+' expression ;
  <<PrettyPrinter>>: { A<operand>(H('+',expression)); }
implicitdataitem = '-' expression ;
  <<PrettyPrinter>>: { A<operand>(H('-',expression)); }
implicitdataitem = expression ;
  <<PrettyPrinter>>: { A<operand>(expression); }
implicitdataitem = STRING ;
  <<PrettyPrinter>>: { A<operand>(STRING); }

-- instructions valid for m680C (see Software Dynamics ASM manual)
instruction = 'ABA' ;
  <<PrettyPrinter>>: { A<opcode>('ABA'); }
instruction = 'ABX' ;
  <<PrettyPrinter>>: { A<opcode>('ABX'); }

instruction = 'ADC' 'A' operandfetch ;
  <<PrettyPrinter>>: { H(A<opcode>(H('ADC','A')),A<operand>(operandfetch)); }
instruction = 'ADC' 'B' operandfetch ;
  <<PrettyPrinter>>: { H(A<opcode>(H('ADC','B')),A<operand>(operandfetch)); }
instruction = 'ADCA' operandfetch ;
  <<PrettyPrinter>>: { H(A<opcode>('ADCA'),A<operand>(operandfetch)); }
instruction = 'ADCB' operandfetch ;
  <<PrettyPrinter>>: { H(A<opcode>('ADCB'),A<operand>(operandfetch)); }
instruction = 'ADCD' operandfetch ;
  <<PrettyPrinter>>: { H(A<opcode>('ADCD'),A<operand>(operandfetch)); }

instruction = 'ADD' 'A' operandfetch ;
  <<PrettyPrinter>>: { H(A<opcode>(H('ADD','A')),A<operand>(operandfetch)); }
instruction = 'ADD' 'B' operandfetch ;
  <<PrettyPrinter>>: { H(A<opcode>(H('ADD','B')),A<operand>(operandfetch)); }
instruction = 'ADDA' operandfetch ;
  <<PrettyPrinter>>: { H(A<opcode>('ADDA'),A<operand>(operandfetch)); }

[..snip...]

-- condition code mask for ANDCC and ORCC
conditionmask = '#' expression ;
  <<PrettyPrinter>>: { H*; }
conditionmask = expression ;

target = expression ;

operandfetch = '#' expression ; --immediate
  <<PrettyPrinter>>: { H*; }

operandfetch = memoryreference ;

operandstore = memoryreference ;

memoryreference = '[' indexedreference ']' ;
  <<PrettyPrinter>>: { H*; }
memoryreference = indexedreference ;

indexedreference = offset ;
indexedreference = offset ',' indexregister ;
  <<PrettyPrinter>>: { H*; }
indexedreference = ',' indexregister ;
  <<PrettyPrinter>>: { H*; }
indexedreference = ',' '--' indexregister ;
  <<PrettyPrinter>>: { H*; }
indexedreference = ',' '-' indexregister ;
  <<PrettyPrinter>>: { H*; }
indexedreference = ',' indexregister '++' ;
  <<PrettyPrinter>>: { H*; }
indexedreference = ',' indexregister '+' ;
  <<PrettyPrinter>>: { H*; }

offset = '>' expression ; -- page zero ref
  <<PrettyPrinter>>: { H*; }
offset = '<' expression ; -- long reference
  <<PrettyPrinter>>: { H*; }
offset = expression ;
offset = 'A' ;
offset = 'B' ;
offset = 'D' ;

registerlist = registername ;
registerlist = registerlist ',' registername ;
  <<PrettyPrinter>>: { H*; }

registername = 'A' ;
registername = 'B' ;
registername = 'CC' ;
registername = 'DP' ;
registername = 'D' ;
registername = 'Z' ;
registername = indexregister ;

indexregister = 'X' ;
indexregister = 'Y' ;
indexregister = 'U' ;  -- not legal on M6811
indexregister = 'S' ;
indexregister = 'PCR' ;
indexregister = 'PC' ;

expression = sum '=' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '<<' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '</' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '<=' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '<' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '>>' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '>/' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '>=' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '>' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum '#' sum ;
  <<PrettyPrinter>>: { H*; }
expression = sum ;

sum = product ;
sum = sum '+' product ;
  <<PrettyPrinter>>: { H*; }
sum = sum '-' product ;
  <<PrettyPrinter>>: { H*; }
sum = sum '!' product ;
  <<PrettyPrinter>>: { H*; }
sum = sum '!!' product ;
  <<PrettyPrinter>>: { H*; }

product = term '*' product ;
  <<PrettyPrinter>>: { H*; }
product = term '||' product ; -- wrong?
  <<PrettyPrinter>>: { H*; }
product = term '/' product ;
  <<PrettyPrinter>>: { H*; }
product = term '//' product ;
  <<PrettyPrinter>>: { H*; }
product = term '&' product ;
  <<PrettyPrinter>>: { H*; }
product = term '##' product ;
  <<PrettyPrinter>>: { H*; }
product = term ;

term = '+' term ;
  <<PrettyPrinter>>: { H*; }
term = '-' term ; 
  <<PrettyPrinter>>: { H*; }
term = '\\' term ; -- complement
  <<PrettyPrinter>>: { H*; }
term = '&' term ; -- not

term = IDENTIFIER ;
term = NUMBER ;
term = CHARACTER ;
term = '*' ;
term = '(' expression ')' ;
  <<PrettyPrinter>>: { H*; }

numberlist = NUMBER ;
numberlist = numberlist ',' NUMBER ;
  <<PrettyPrinter>>: { H*; }
Ira Baxter
Ira, thank you for good feedback(I tried to rank this up but I'm not experienced enough lol). I wondered if you had a lexer and/or parser example which uses this approach so that I might be able to take a look or if not, any chance of some pseudo code to get me moving forward with this :). Best Regards
Gary Paluk
@Gary: Beware what you ask for :} See edits to my small answer turning it into a giant one.
Ira Baxter
@Gary: PS, to "rank this up", simply click the upward facing triangle above the score next to the start of the answer. :-}
Ira Baxter
This is frickin awesome...gonna go and read it thoroughly! Thanks Sadly, I don't have a high enough personal ranking to allow me to rank this up!! Could someone please do it for me! :D
Gary Paluk
OK...Starting with the lexer, I know your code is doing what I want to acheive, I have no idea how to represent it in my target language ECMA Script.Staring at the very begining I come stuck nearly instantly with how to represent the macros, I wrote some methods to do checks such as: // Hexidecimal Digit public function hexidecimalDigit( str: String ): Boolean { return decimalDigit( str ) || "A" || "B" || "C" || "D" || "E" || "F"; }or checks but once I get to multiple character inputs it goes terribly wrong lol...public function opCode( str: String ): Boolean { //???? }GP
Gary Paluk
You need to read about lexer and parser generators, and how they accept lexical definitions and grammer defintions, and how they generate code to implement those definitions. Read any book on "Flex" and "YACC" to get this background. Since you want to work in Javacript, I have the perfect introduction for you:<a href="http://www.bayfronttechnologies.com/mc_tutorial.html">A tutorial on building compilers (in JavaScript)</a>. This tutorial will distract you for a month of evenings, and worth every moment. It is based on a *1963* paper, and its how I learned to build compilers originally.
Ira Baxter
SOmehow I bungled the link in previous note, and I don't know how to edit comments, so here' the link again:<a href="http://www.bayfronttechnologies.com/mc_tutorial.html>Tutorial on building compilers (using Javascript)</a>. It uses different notation than I used, but the ideas are very similar.
Ira Baxter
Why can't I get this hyperlink right in a note? Seem StackOverflow is doing something to the link text. OK, here's the link as rawtext: http://www.bayfronttechnologies.com/mc_tutorial.htmlSorry for the fumbles.
Ira Baxter