ansaurus

Question

Writing an z80 assembler - Lexing ASM and building a parse tree using composition?

Answer 1

+2 A:

It is not necessary to build a parse tree. Z80 op codes are very simple. They consist of the op code amd 0, 1 or 2 operands, separated by commas. You Just need to split the opcode up into the (maximum of 3) components with a very simple parser - no tree is needed.

anon 2009-08-20 09:54:31

What about relative memory addresses> surely some kind of symbolic tree is needed for this? perhaps even a table

Darknight 2009-08-20 10:21:44

The questioner was asking about parsing the OP codes. You certainly need a symbol table to assist in creating the assembler's actual machine code output, but that is a completely different issue.

anon 2009-08-20 10:23:50

@Darknight: Would there still not be a valid point to creating a parse tree. For example, other occurrences could play a part such as here are some ASM code strings that have variations in the syntax:Labels, Comments, Brackets, Global Variables, .DB, .DS, .DWWould these not also become more simplified if a parse tree was used?@Neil:Could you elaborate?Regards,GP

Gary Paluk 2009-08-20 12:59:12

@Gary You could do that if you want to complicate your life. I've written a couple of assemblers (though not one for the Z80) and they are normally easily implemented via a simple tokenising scheme (as I suggested in my answer) and an expression evaluator to evaluate address and other expressions. The expression evlauator may use a tree (though a stack based one is enough in my experience) but there is no need to make the parse tree your basic data structure.

anon 2009-08-20 13:14:14

@Neil Thanks for your feedback... You don't happen to have a code example do you? Would probably clarify alot! :)

Gary Paluk 2009-08-20 13:35:27

@Gary Sorry, the last assembler I wrote was in REXX on IBM's VM/CMS mainframe OS. Even if I had the code, I don't think it would be useful to you.

anon 2009-08-20 15:19:18

Answer 2

+4 A:

Well, the structure of the tree you really want for an instruction that operates on a register and an memory addressing mode involing an offset displacement and an index register would look like this:

    INSTRUCTION-----+
    |      |        |
  OPCODE  REG     OPERAND
                  |     |
                OFFSET  INDEXREG

And yes, you want want to pass values up and down the tree. A method for formally specifying such value passing is called "attribute grammars", and you decorate the grammar for your langauge (in your case, your assembler syntax) with the value-passing and the computations over those values. For more background, see Wikipedia on attribute grammars.

In a related question you asked, I discussed a tool, DMS, which handles expression grammars and building trees. As language manipulation tool, DMS faces exactly these same up-and-down the tree information flows issues. It shouldn't surprise you, that as a high-end language manipulation tool, it can handle attribute grammar computations directly.

Ira Baxter 2009-08-23 20:27:27

This was intended to be illustrative of the trees he needs to build, rather than matching a specific Z80 instrution.I coded a LOT of Z80 assembler back in the 1980s; I may know more about it than you think.

Ira Baxter 2009-08-23 20:36:32

Then demonstrate your knowledge. Have you written a Z80 assembler? I Haven't, but I have written 8080 and 6809 assemblers, and your advice seems to me to be dead wrong.

anon 2009-08-23 20:48:57

Your experience is probably different than mine. I've coded assemblers for the Collins 8311 (circa 1968), a 12 and 16 bit machines that didn't make it commercially so you havent heard of them, microcode assemblers, and the 6809 assembler you saw in the other message. I've also built a variety of compiler front ends(check out my bio information and website). My early assemblers were built by ad hoc means and worked. Stuff since the 80s I've built using lexers and parsers because it is far easier to specify, to maintain, to extend, you name it.

Ira Baxter 2009-08-23 21:07:42

My experience may be qualitatively different than yours along another axis. I've always wanted tools to help me specify/analyze/manipulate languages. I've spent the last 25 years figuring how to go far beyond lex and yacc, and now have tools that let me (and others if they wish) do so. I use them daily, and they are effective. Without experiencing such tools, you might not be convinced. OK, let many flowers bloom.

Ira Baxter 2009-08-23 21:21:14

Hmm. I'm totally in favour of tools, but some tools are fairly crappy. Along with Bjarne Stroustrup, I'm a big fan of recursive descent when it comes to parsing as opposed to table driven tools like yacc/bison. And when it comes to parsing assembler mnemonics, I'm an even bigger fan of ad hoc parsing. Assembler mnemonics are really not that regular. But you obviously do know what you are talking about, so I've deleted the initial comment (which was a bit rude) and removed the downvote.

anon 2009-08-23 21:50:04

When recursive descent works, it works great. But it only works well on modest sized grammars that don't change fast (when you build DSLs sometimes the grammar does change fast) and which don't require any lookahead or any type information to resolve the parse (your beloved C++ has a bad lookahead problem here). When you aren't in that window, more powerful (read "full context free") parsing engines are better, and explicit specifications of the full grammar make it a lot easier to revise and maintain. And once you have such machinery, its easy to kill gnats, too, and you should.

Ira Baxter 2009-08-23 22:34:47

anon 2009-08-23 22:51:04

DMS has a full C++ parser (even handles GNU and MS dialects) and uses a table driven scheme (called GLR, a generalization of LR) based off exactly the same grammar style as you see in the related question to this one. That C++ parser has been used to carry out massive automated transformations on large embedded software systems. The parser is as beautiful as the C++ reference manual will allow it to be :-} I speak as a front end builder.

Ira Baxter 2009-08-23 23:29:54

Answer 3

A:

Actually, the opcodes have not a byte base but an octal base. The best description i know is here: http://www.z80.info/decoding.htm

Steeve 2009-10-26 00:49:31

ansaurus

tags:

views:

answers:

Writing an z80 assembler - Lexing ASM and building a parse tree using composition?

related questions