If, as you say in comments, your proposed language (I'll call it Ext-C, for Extended C) is a DSL intended for a narrow audience (yourself, your students), then you will need to:
- Write code to parse Ext-C, recognizing which parts are pure C and which parts are Ext-C.
- Write the C code generator that represents the translation of Ext-C into C.
- Assemble it into a pre-processor that reads Ext-C source files and writes C source files.
- Write a compiler script (or program) that handles argument parsing and runs the Ext-C pre-processor on the Ext-C files before running the C compiler to complete the translation.
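As a rough illustration of the last step, here is a minimal sketch of a compiler driver in C. It assumes some purely hypothetical conventions that don't come from any real tool: Ext-C sources end in '.ec', the pre-processor is a program called 'extc-pp' that takes '-o output.c input.ec', and the C compiler is invoked as 'cc'. The driver derives the intermediate file name, runs the pre-processor, then the C compiler, and removes the intermediate file on success unless asked to keep it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical conventions (not from any real tool): Ext-C sources end in
 * ".ec", the pre-processor is run as "extc-pp -o out.c in.ec", and the
 * C compiler is "cc". */

/* Derive the intermediate C file name: "prog.ec" -> "prog.c".
 * Returns 0 on success, -1 if src lacks the ".ec" suffix or out is too small. */
int intermediate_name(const char *src, char *out, size_t outlen)
{
    size_t len = strlen(src);

    if (len < 4 || strcmp(src + len - 3, ".ec") != 0)
        return -1;
    /* Result needs (len - 3) base chars + ".c" + NUL = len bytes. */
    if (outlen < len)
        return -1;
    memcpy(out, src, len - 3);
    strcpy(out + len - 3, ".c");
    return 0;
}

/* Build the two command lines the driver runs, in order: the Ext-C
 * pre-processor first, then the C compiler on the generated file. */
int build_commands(const char *src, char *pp_cmd, char *cc_cmd, size_t cmdlen)
{
    char cfile[FILENAME_MAX];
    int n1, n2;

    if (intermediate_name(src, cfile, sizeof cfile) != 0)
        return -1;
    n1 = snprintf(pp_cmd, cmdlen, "extc-pp -o %s %s", cfile, src);
    n2 = snprintf(cc_cmd, cmdlen, "cc -c %s", cfile);
    if (n1 < 0 || (size_t)n1 >= cmdlen || n2 < 0 || (size_t)n2 >= cmdlen)
        return -1;   /* truncated: refuse rather than run a wrong command */
    return 0;
}

/* Run both steps. On success, remove the intermediate C file unless
 * keep is non-zero. Returns 0 on success, -1 on any failure. */
int compile_extc(const char *src, int keep)
{
    char cfile[FILENAME_MAX], pp_cmd[1024], cc_cmd[1024];

    if (intermediate_name(src, cfile, sizeof cfile) != 0 ||
        build_commands(src, pp_cmd, cc_cmd, sizeof pp_cmd) != 0)
        return -1;
    if (system(pp_cmd) != 0)
        return -1;            /* Ext-C error: the pre-processor reports it */
    if (system(cc_cmd) != 0)
        return -1;            /* C error: the .c file is left for inspection */
    if (!keep)
        remove(cfile);
    return 0;
}
```

For example, build_commands("dir/prog.ec", ...) yields "extc-pp -o dir/prog.c dir/prog.ec" followed by "cc -c dir/prog.c". Because system() hands the string to the shell, file names containing spaces or shell metacharacters would need quoting; a more robust driver on POSIX systems would probably use fork() and execvp() instead.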
Take a look at Cfront as one possible source of ideas.
Take a look at IBM Informix ESQL/C (available free from IBM as part of IBM Informix ClientSDK or CSDK); there is a script 'esql' which controls the compilation and a pre-processor 'esqlc' that actually parses the ESQL/C source and generates the corresponding C code. Programmers run the 'esql' script to compile ESQL/C programs; they don't run the 'esqlc' program manually unless they have an unusually perverse and masochistic streak.
Note that one of the trickier parts of any DSL is the integration with a debugger. You can arrange for your pre-processor to generate '#line' directives, which sometimes helps and sometimes hinders. (In my arsenal of scripts, I have one that comments out '#line' directives; I use it when I need to debug the intermediate C code while still being able to refer back to the original source code.) You can see how lex and yacc (and their variants) handle this in their output, too.

It is also a good idea to have the pre-processor clean up the intermediate file by default (on a successful compilation), but to provide an option that keeps the intermediate file available for inspection. Note that if your pre-processor does not guarantee that all of its output is valid C (because it just copies parts of the input to the output without validating them as a C compiler would), then you need to ensure that users (programmers) can tell where the error is in the original source file, even when the error is detected by the C compiler rather than the Ext-C compiler.
The IBM Informix 4GL programming language is a complete language that is wholly parsed by its primary compiler, which then (over-simplifying somewhat) generates C code. There is a script 'c4gl' to control the compilation and linking, and a preprocessor/compiler 'i4glc1' (and, because I was over-simplifying, 'i4glc2', 'i4glc3' and 'i4glc4' too). If the generated C code fails to compile, that indicates a bug in the I4GL compiler; it is not the user's fault.