Suppose I wanted to create a new programming language by adding just one new primitive datatype to C, say boolean. What would be needed to do this?

EDIT: I'm not making myself very clear here. I would like to design a new language whose syntax is exactly like C's, but with a bunch of new primitive datatypes. The translator for this language should output C code, which I would then compile with GCC into executables/object files.

A: 

If you want it to be widely supported, you need to convince ISO to include it in their next standard. As far as I can tell, there's really not much momentum for revising C at this point. All the interesting stuff is happening in C++, like C++0x.

Chris
Thanks, but that will be a DSL for researching certain algorithms... used by me, and maybe my students...
Dervin Thunk
The C1X standard is in progress. Before submitting the proposal to the C standard body, though, you need to demonstrate that it works by implementing it, and you'd be well advised to build a body of demand for the feature by having the implementation widely used. When standards bodies invent, they often make mistakes (trigraphs? `export` in C++?).
Jonathan Leffler
+2  A: 

I guess you could write a shell script or a preprocessor that sits between your new language and GCC and converts the small bits you add into normal C syntax. Think of it as a layer, just like GCC's preprocessor.

You could write the parser in any language, even C itself - anything that will take a text file in, change it, and write it out, either to another file or to stdout for GCC to read in and compile.
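A minimal sketch of such a layer, written in C and assuming the only Ext-C addition is a `boolean` keyword that lowers to C99's `_Bool`. The names `translate_line` and `translate_stream` are hypothetical, and the replacement is deliberately naive: it matches whole words only (so `booleans` or `boolean_t` are left alone), but it does not skip string literals or comments.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mapping: the Ext-C keyword and the C type it lowers to. */
static const char EXT_KEYWORD[] = "boolean";
static const char C_TYPE[] = "_Bool";

/* Can c appear in a C identifier? */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Copy src to dst, replacing each whole-word EXT_KEYWORD with C_TYPE.
 * Naive: string literals and comments are not skipped.
 * Returns 0 on success, -1 if dst (of size dstsz) is too small. */
int translate_line(const char *src, char *dst, size_t dstsz)
{
    const size_t klen = sizeof EXT_KEYWORD - 1;
    const size_t tlen = sizeof C_TYPE - 1;
    size_t n = 0;

    assert(src != NULL && dst != NULL && dstsz > 0);
    for (const char *p = src; *p != '\0';) {
        int word_start = (p == src) || !is_ident_char(p[-1]);

        if (word_start && strncmp(p, EXT_KEYWORD, klen) == 0
            && !is_ident_char(p[klen])) {
            if (n + tlen >= dstsz)
                return -1;
            memcpy(dst + n, C_TYPE, tlen);
            n += tlen;
            p += klen;
        } else {
            if (n + 1 >= dstsz)
                return -1;
            dst[n++] = *p++;
        }
    }
    dst[n] = '\0';
    return 0;
}

/* Filter: read Ext-C from in, write C to out (e.g. stdin to stdout).
 * Lines longer than the buffer are handled in chunks, so a keyword split
 * across a chunk boundary would be missed. */
int translate_stream(FILE *in, FILE *out)
{
    char line[4096], buf[8192];

    while (fgets(line, sizeof line, in) != NULL) {
        if (translate_line(line, buf, sizeof buf) != 0)
            return -1;
        fputs(buf, out);
    }
    return ferror(in) ? -1 : 0;
}
```

Wrapped in a `main` that calls `translate_stream(stdin, stdout)`, this could be used as a filter in front of GCC, for example `./extc < prog.ec > prog.c && gcc -c prog.c` (the `.ec` extension is also just an assumption).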

Hope this helps

James

JamWaffles
Yes, yes! This is a very good idea, actually. Simpler than what I had thought of.
Dervin Thunk
+1  A: 

Regarding your datatypes example: you can't just have any arbitrary datatype translated to C. Boolean you can, because it is simpler than existing types and can easily be represented by any integer type (as it is commonly #defined to anyway). But say you wanted something like a 256-bit integer; let's call it superlong. This superlong type can never be directly translated to C code, as there is no equivalent datatype in C.

But if you just want to translate simple things into C, it sounds like it would be a lot easier just to use the C Preprocessor.
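For a case as simple as `boolean`, the preprocessor route can be a single header. A sketch, assuming C99 or later; the header name `extc.h` is hypothetical, and mapping to `bool` is one choice among several (a plain `int` typedef also works, but loses the conversion of any nonzero value to 1).

```c
/* extc.h (hypothetical name): Ext-C's boolean as a plain macro. */
#ifndef EXTC_H
#define EXTC_H

#include <stdbool.h>    /* C99: bool, true, false */

#define boolean bool    /* every "boolean" becomes "bool" before compilation */

#endif /* EXTC_H */

/* Ext-C code can then use the new spelling directly. */
boolean is_even(int n)
{
    return n % 2 == 0;
}
```

Rather than adding an `#include` to every source file, GCC's `-include` option can force the header in: `gcc -include extc.h prog.c`.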

jay.lee
Downvoter, care to explain? I'm just trying to give some constructive criticism.
jay.lee
You can indeed convert `superlong` into C; it will need to be treated as a structure or opaque type and the support will be in the form of library function calls.
Jonathan Leffler
True, and I had considered this, which is why I chose to use the words "directly translated" (there are probably more accurate words to use). Though I do suppose it is worth a mention.
jay.lee
In fact, that's much like what a C compiler has to do to implement floating point on architectures that don't natively support it.
caf
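As Jonathan Leffler's comment points out, a type with no C equivalent can still be lowered to a structure plus library calls. A minimal sketch, assuming `superlong` means an unsigned 256-bit integer stored as four 64-bit limbs; the function names are hypothetical, and only addition and equality are shown. A translator would rewrite an Ext-C expression such as `a = b + c` into `superlong_add(&a, &b, &c)`.

```c
#include <stdint.h>

/* Hypothetical lowering of Ext-C "superlong" (unsigned 256-bit):
 * four 64-bit limbs, least significant first. */
typedef struct {
    uint64_t limb[4];
} superlong;

/* Build a superlong from an ordinary unsigned value. */
superlong superlong_from_u64(uint64_t v)
{
    superlong r = {{v, 0, 0, 0}};
    return r;
}

/* r = a + b (mod 2^256). r may alias a or b: limb i is read
 * before it is written, and later limbs are not touched yet. */
void superlong_add(superlong *r, const superlong *a, const superlong *b)
{
    uint64_t carry = 0;

    for (int i = 0; i < 4; i++) {
        uint64_t s = a->limb[i] + carry;
        uint64_t c1 = (s < carry);       /* overflow from adding carry */
        uint64_t t = s + b->limb[i];
        uint64_t c2 = (t < s);           /* overflow from adding b */

        r->limb[i] = t;
        carry = c1 | c2;
    }
}

/* Ext-C "a == b": returns 1 if equal, 0 otherwise. */
int superlong_eq(const superlong *a, const superlong *b)
{
    for (int i = 0; i < 4; i++)
        if (a->limb[i] != b->limb[i])
            return 0;
    return 1;
}
```

Wrapping modulo 2^256 mirrors how C's own unsigned types behave; a signed variant would need a two's-complement convention layered on top.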
+5  A: 

If, as you say in comments, your proposed language (I'll call it Ext-C, for Extended C) is a DSL intended for a narrow audience (yourself, your students), then you will need to:

  • Write code to parse Ext-C, recognizing which parts are pure C and which parts are Ext-C.
  • Write the C code generator that represents the translation of Ext-C into C.
  • Assemble it into a pre-processor that reads Ext-C source files and writes C source files.
  • Write a compiler script (or program) that handles argument parsing and running the Ext-C pre-processor on the Ext-C files before running the C compiler for the rest of the translation.
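The last step, the compiler driver, can be sketched as a function that works out which commands to run. Every name here is an assumption: Ext-C files end in `.ec`, the pre-processor is a program called `extc-pp` that takes `-o`, and all other arguments pass through to `gcc` unchanged. It only builds the command text; a real driver would execute each command (for example with `fork`/`exec`), stop at the first failure, and clean up the intermediate `.c` files afterwards.

```c
#include <stdio.h>
#include <string.h>

#define EXTC_SUFFIX ".ec"   /* hypothetical Ext-C file extension */

/* Does this argument name an Ext-C source file (not an option)? */
int is_extc_source(const char *arg)
{
    size_t n = strlen(arg), k = strlen(EXTC_SUFFIX);

    return arg[0] != '-' && n > k && strcmp(arg + n - k, EXTC_SUFFIX) == 0;
}

/* "foo.ec" -> "foo.c". Caller guarantees src ends in EXTC_SUFFIX.
 * Returns 0 on success, -1 if out is too small. */
int intermediate_name(const char *src, char *out, size_t outsz)
{
    size_t stem = strlen(src) - strlen(EXTC_SUFFIX);

    if (stem + 3 > outsz)            /* stem + ".c" + NUL */
        return -1;
    memcpy(out, src, stem);
    strcpy(out + stem, ".c");
    return 0;
}

/* Write into buf one "extc-pp" command per Ext-C file, then a single
 * gcc command with each .ec file replaced by its .c file.
 * Arguments are not shell-quoted. Returns 0, or -1 if a buffer is too small. */
int plan_commands(int argc, char *argv[], char *buf, size_t bufsz)
{
    char cfile[1024], cc[4096] = "gcc";
    size_t used = 0, ccused = 3;
    int w;

    if (bufsz == 0)
        return -1;
    buf[0] = '\0';
    for (int i = 1; i < argc; i++) {
        const char *arg = argv[i];

        if (is_extc_source(arg)) {
            if (intermediate_name(arg, cfile, sizeof cfile) != 0)
                return -1;
            w = snprintf(buf + used, bufsz - used, "extc-pp %s -o %s\n", arg, cfile);
            if (w < 0 || (size_t)w >= bufsz - used)
                return -1;
            used += (size_t)w;
            arg = cfile;             /* gcc sees the generated .c file */
        }
        w = snprintf(cc + ccused, sizeof cc - ccused, " %s", arg);
        if (w < 0 || (size_t)w >= sizeof cc - ccused)
            return -1;
        ccused += (size_t)w;
    }
    w = snprintf(buf + used, bufsz - used, "%s\n", cc);
    return (w < 0 || (size_t)w >= bufsz - used) ? -1 : 0;
}
```

For example, `extcc -O2 main.ec util.c -o prog` would plan `extc-pp main.ec -o main.c` followed by `gcc -O2 main.c util.c -o prog`; the plain C file `util.c` goes straight to GCC.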

Take a look at Cfront as one possible source of ideas.

Take a look at IBM Informix ESQL/C (available free from IBM as part of IBM Informix ClientSDK or CSDK); there is a script 'esql' which controls the compilation and a pre-processor 'esqlc' that actually parses the ESQL/C source and generates the corresponding C code. Programmers run the 'esql' script to compile ESQL/C programs; they don't run the 'esqlc' program manually unless they have an unusually perverse and masochistic streak.

Note that one of the trickier parts of any DSL is the integration with a debugger. You can arrange for your pre-processor to generate '#line' directives, which sometimes helps and sometimes hinders. (In my arsenal of scripts, I have one that comments out #line directives; I use it when I need to debug the intermediate C code but still manage to refer back to the original source code.) You can see how lex and yacc (and variants) handle this in their output, too.

It is also a good idea to have the preprocessor clean up the intermediate file by default (on a successful compilation), but to provide an option so that the intermediate file is kept available for inspection.

Note that if your preprocessor does not guarantee that all the output is valid C - because it just copies parts of the input to the output without validating it as a C compiler would - then you need to ensure that users (programmers) can tell where the source error is in the original file even if the error is spotted by the C compiler rather than the Ext-C compiler.
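The '#line' technique can be sketched as follows: the pre-processor writes a directive before each line it emits, so that GCC reports errors (and records debug line information) against the original Ext-C file, which also addresses the point about locating errors in the original source. The function name `add_line_info` is hypothetical, the input is copied verbatim rather than translated, and a real pre-processor would usually emit a directive only where the mapping between output and input lines changes. File names containing `"` or `\` would also need escaping.

```c
#include <stdio.h>
#include <string.h>

/* Copy Ext-C source text src to dst, writing a #line directive before
 * each line so that diagnostics and debug info refer to path and the
 * original line numbers. Returns 0 on success, -1 if dst is too small. */
int add_line_info(const char *src, const char *path, char *dst, size_t dstsz)
{
    size_t used = 0;
    long lineno = 1;

    if (dstsz == 0)
        return -1;
    dst[0] = '\0';
    while (*src != '\0') {
        const char *nl = strchr(src, '\n');
        size_t len = nl != NULL ? (size_t)(nl - src) + 1 : strlen(src);
        /* #line N "file" applies to the source line that follows it. */
        int w = snprintf(dst + used, dstsz - used, "#line %ld \"%s\"\n%.*s",
                         lineno, path, (int)len, src);

        if (w < 0 || (size_t)w >= dstsz - used)
            return -1;
        used += (size_t)w;
        src += len;
        lineno++;
    }
    return 0;
}
```

To debug the intermediate C code instead, the directives can simply be omitted (or commented out afterwards), which is the trade-off described above.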

The IBM Informix 4GL programming language is a complete language that is wholly parsed by its primary compiler, which then (over-simplifying somewhat) generates C code. There is a script c4gl to control the compilation and linking, and a preprocessor/compiler i4glc1 (and, because I was over-simplifying, i4glc2, i4glc3 and i4glc4 too). If the generated C code fails to compile, it indicates a bug in the I4GL compiler - it is not the user's fault.

Jonathan Leffler
Nice. This is also helpful.
Dervin Thunk