I am going to create a compiler for Direct3D's Shader Model language. The target platform and the development environment are both Windows/VC++. For those who are not familiar with the Shader Model language, here are examples of its instructions (some of them are a bit outdated, but the syntax is basically the same as in the version I will be using).

Here

And here

I am considering flex/yacc as the framework for developing the compiler. Would these be suitable for the job? Is there any better framework for developing in native C++?

+2  A: 

In my opinion, a normal lexer and/or parser generator usually won't help much in writing an assembler. They're mostly helpful in dealing with relatively complex grammars, but in the case of an assembler, the "grammar" is usually so trivial that such a generator is more hindrance than help.

A typical assembler is mostly table driven -- you start by creating a table of defined op-codes and the characteristics of the instruction each one will generate (e.g. the number and types of registers that must be specified for it). You typically have a second table (smaller; in the case of shaders, probably much smaller) defining how to encode addressing modes and such.

Most of the assembler works by consulting that table -- i.e. it reads something from input, and attempts to look it up in the table. If it's not present, it gives an error message saying it's an unknown opcode. If it's found, it gets information from the table about the number of operands associated with that op-code. It attempts to read that many operands. If it can't, it gives an error saying something's wrong with the instruction. If it can, it encodes the instruction, and starts over.
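To make that loop concrete, here is a minimal sketch of the lookup-and-check step in C++. Everything in it is an assumption for illustration: the `OpInfo` and `Instruction` structs, the `opcodeTable` and `parseLine` names, and the handful of mnemonics with their operand counts are hypothetical, not a complete or authoritative instruction list. It only validates the shape of a line; encoding is left out.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical table entry: how many operands a mnemonic expects.
struct OpInfo {
    int operandCount;
};

// Illustrative opcode table (a real one would also carry encoding details).
const std::unordered_map<std::string, OpInfo>& opcodeTable() {
    static const std::unordered_map<std::string, OpInfo> table = {
        {"mov", {2}},
        {"add", {3}},
        {"mul", {3}},
        {"mad", {4}},
    };
    return table;
}

struct Instruction {
    std::string opcode;
    std::vector<std::string> operands;
};

// Parses one line such as "add r0, r1, r2".
// Returns false and fills `error` if the opcode is unknown or the
// operand count does not match the table.
bool parseLine(const std::string& line, Instruction& inst, std::string& error) {
    inst = Instruction();
    std::string text = line;
    for (char& c : text) {
        if (c == ',') c = ' ';  // treat commas as plain separators
    }
    std::istringstream in(text);
    if (!(in >> inst.opcode)) {
        error = "empty line";
        return false;
    }
    auto it = opcodeTable().find(inst.opcode);
    if (it == opcodeTable().end()) {
        error = "unknown opcode: " + inst.opcode;
        return false;
    }
    std::string operand;
    while (in >> operand) inst.operands.push_back(operand);
    if (static_cast<int>(inst.operands.size()) != it->second.operandCount) {
        error = "wrong number of operands for " + inst.opcode;
        return false;
    }
    return true;
}
```

A caller would feed each input line to `parseLine`, report `error` when it returns false, and otherwise hand the `Instruction` to whatever encodes it.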

There are a few places it has to handle a bit more than that, of course. Where/when you define something like a label, it has to record the name and position of that label in a symbol table. When it encounters something like a branch to that address, it has to look up the target and encode its address appropriately.
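One common way to handle labels is two passes: first record where each label sits, then resolve references against that table. The sketch below assumes a made-up convention (a label is a line ending in `:`, and every other non-empty line occupies one instruction slot); the function names `collectLabels` and `resolveTarget` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Pass 1: map each label name to the index of the instruction that follows it.
// Assumption: a label is a whole line ending in ':'; labels take no slot.
std::map<std::string, std::size_t> collectLabels(const std::vector<std::string>& lines) {
    std::map<std::string, std::size_t> symbols;
    std::size_t address = 0;
    for (const std::string& line : lines) {
        if (line.empty()) continue;              // blank lines take no slot
        if (line.back() == ':') {
            symbols[line.substr(0, line.size() - 1)] = address;
        } else {
            ++address;                           // one slot per instruction
        }
    }
    return symbols;
}

// Pass 2 helper: look up a branch target, failing loudly if it was never defined.
std::size_t resolveTarget(const std::map<std::string, std::size_t>& symbols,
                          const std::string& name) {
    auto it = symbols.find(name);
    if (it == symbols.end()) {
        throw std::runtime_error("undefined label: " + name);
    }
    return it->second;
}
```

Because all labels are collected before any branch is resolved, forward references need no special treatment.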

Only when/if you decide to support macros do you depart much from that basic model. Depending on how elaborate you get with them, it might be worthwhile to use a parser generator and such for a macro expansion facility. Then again, given that shaders are mostly pretty small, macros aren't likely to be a very high priority for such an assembler.

Edit: rereading it, I should probably clarify/correct one point. The use for a parser generator isn't so much when the grammar itself becomes complex, as when the grammar allows for statements that are complex. Consider a really trivial grammar:

expression := expression '+' value
            | expression '-' value
            | value

Even though this allows only addition and subtraction, it still defines statements that are arbitrarily complex (or at least arbitrarily long strings of values being added or subtracted). Of course, for even a fairly trivial real language, we'll normally have multiplication, division, function calls, etc.
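For illustration, a hand-written evaluator for exactly that grammar might look like the sketch below. It assumes values are integers and implements the left recursion as a loop; the name `evaluate` is hypothetical. The loop is what lets a single statement run on for as many terms as the input supplies.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Evaluates: expression := expression '+' value
//                         | expression '-' value
//                         | value
// Assumption: a value is an integer literal.
long evaluate(const std::string& text) {
    std::istringstream in(text);
    long result = 0;
    if (!(in >> result)) throw std::runtime_error("expected a value");
    char op = 0;
    while (in >> op) {                       // each pass consumes "op value"
        if (op != '+' && op != '-') {
            throw std::runtime_error("expected '+' or '-'");
        }
        long value = 0;
        if (!(in >> value)) {
            throw std::runtime_error("expected a value after operator");
        }
        result = (op == '+') ? result + value : result - value;
    }
    return result;
}
```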

This is considerably different from a typical assembly language, where each instruction has a fixed format. For example, an addition or subtraction operation has exactly two source operands and one destination operand.

Jerry Coffin
@Jerry Coffin: Thanks for the detailed answer. I just want to clarify one thing. The input language is not a textual description of the target machine's language, so an assembler will not do here. For example, the 'add' instruction in Shader Model does not necessarily map to an instruction in the target machine's instruction set. Before generating code, the tool will be doing some optimizations. The tool is a compiler.
zr
@zr: Okay, no problem with that. From the parser's viewpoint, however, the question is whether the input *looks* essentially like an assembly language -- i.e. essentially fixed-format input. What you do after that is really no concern of the parser at all. Granted, my answer assumed a fixed translation, but from the viewpoint of parser generation, that's completely irrelevant.
Jerry Coffin