I've been thinking about designing my own language (practicality: it's a thought experiment). One of the ideas I came up with is in-language semantic variation. You'd write what are essentially semantic regular expressions, which get replaced with equivalent code. You can see this in a somewhat less direct form in D: it has string mixins that convert to D code. Except I was going to apply them implicitly, and in a more circular fashion.

I come from a C++ background. So if you consider:

string a, b, c, d;
// do stuff
a = b + c + d;

This code results in various temporaries. Even if you have rvalue references, you still create temporaries; they are simply re-used more efficiently. They still exist and still cost performance. I was thinking about how, in the simplest case, these could be eliminated. You could write a semantic regular expression that converts the code into its most optimized form.

string a, b, c, d;
// do stuff
a.reserve(b.size() + c.size() + d.size());
a = b; a += c; a += d;

If I implemented std::string, I might be able to write something even faster. The key is that these rewrites are implicit: when you use the std::string class, the axioms written by the std::string implementer can affect any std::string code. You could just drop it into an existing C++ codebase, recompile, and get the fastest string concatenation your std::string implementer can conceive of, for free.
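As a rough sketch of what such a rewrite might expand to, here is the hand-optimized form wrapped in a function. The name assign_concat is hypothetical (it is not part of std::string), and building into a local string is an assumption made here so that the call stays correct when the target is also one of the inputs.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the code a "semantic regex" might generate
// for the statement `a = b + c + d;`.
inline void assign_concat(std::string& a, const std::string& b,
                          const std::string& c, const std::string& d) {
    // Build into a local first so the call stays correct even when
    // `a` aliases one of the inputs (e.g. a = a + c + d).
    std::string result;
    result.reserve(b.size() + c.size() + d.size());  // reserve the full length up front
    result += b;
    result += c;
    result += d;
    assert(result.size() == b.size() + c.size() + d.size());
    a.swap(result);  // constant-time hand-over, no extra copy
}
```

Calling assign_concat(a, b, c, d) then stands in for a = b + c + d; the point of the proposal is that the programmer would keep writing the latter and never see the former.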

At the moment, the optimizations you can make are limited, because you only have as much context as the language allows. In this case, an overloaded operator in C++ takes only two arguments, this and arg. But a semantic regex could take virtually all the context you could ever need, since you dictate what it matches, and could even match language features that don't exist in the host language. For example, it would be trivial to exchange

string a;
a.size;

for

string a;
a.size();

if you wanted to steal C# properties. You could match class definitions and implement compile-time or run-time reflection, etc.

But, I mean, it could get confusing. If there was a bug, or what was really done behind the scenes didn't reflect the code that was written, it could be a total bitch to track down, and I've not considered how it would be implemented in depth. What do you guys think of my proposed language feature?

Oh man, choosing the right tags. Ummm....

Edit: I also wanted to broach the question of limits, with regard to one answer I received. The simple fact is that a semantic regex has no limits (minus implementation details that may have to be added). For example, you could turn the code

int i;
cin >> i;
int lols[i];

into

int i;
cin >> i;
std::variable_array<int> lols(alloca(sizeof(int) * i), i);

The semantics of alloca make it impossible to wrap in a template: the memory is released as soon as the function that called alloca returns, so you have to write a macro if you want the above. In C++03 or C++0x, you cannot encapsulate your own VLAs.

In addition, semantic regexes can match code that doesn't actually invoke any compile-time work. For example, you could match every member of a class definition and use it to create a reflection system. This is also impossible in C++ to date.

+1  A: 

If you Google for something like "C++ expression template", you'll find that in reality C++ already has pretty similar capabilities. Depending on what you came up with for syntax, your idea might make such code easier to understand (expression templates certainly aren't trivial) but at least to me it's not entirely clear that you're adding much (if anything) in the way of truly new capability.
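For readers who haven't met the technique, a minimal sketch of an expression template for the string case might look like the following. Str and Concat are hypothetical names made up for illustration, not part of any real library, and a production version would need far more care (more operand types, move support, and so on).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical node type: represents "lhs + rhs" without evaluating it.
// It holds references, so it must not outlive the full expression
// (do not store one with `auto`).
template <class L, class R>
struct Concat {
    const L& lhs;
    const R& rhs;
    std::size_t size() const { return lhs.size() + rhs.size(); }
    void append_to(std::string& out) const {
        lhs.append_to(out);
        rhs.append_to(out);
    }
};

// Hypothetical string wrapper standing in for std::string.
struct Str {
    std::string data;
    Str() = default;
    Str(const char* s) : data(s) {}
    std::size_t size() const { return data.size(); }
    void append_to(std::string& out) const { out += data; }

    // Assigning a whole expression tree: one reserve, then plain appends.
    template <class L, class R>
    Str& operator=(const Concat<L, R>& expr) {
        std::string result;
        result.reserve(expr.size());
        expr.append_to(result);
        assert(result.size() == expr.size());
        data.swap(result);  // safe even if *this appears in expr
        return *this;
    }
};

// operator+ only builds the tree; no characters are copied here.
inline Concat<Str, Str> operator+(const Str& l, const Str& r) {
    return {l, r};
}
template <class L, class R>
Concat<Concat<L, R>, Str> operator+(const Concat<L, R>& l, const Str& r) {
    return {l, r};
}
```

With this in place, a = b + c + d on Str objects builds a small tree of references and copies the characters once, inside operator=. The amount of machinery needed for one operator gives a sense of why such interfaces are considered non-trivial.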

Jerry Coffin
Are you referring to the way in which, for example, boost::lambda works? Also, I've seen things like that library that computes the areas of shapes drawn with operators at compile time. But it's not the same thing. Firstly, I can define my interface to be whatever I like (within reason), whereas expression template interfaces are most definitely non-trivial. Secondly, I'm not talking about doing run-time behaviour at compile time, or vice versa, but about replacing one compile-time or run-time behaviour with another of the same kind that is preferable in some other way.
DeadMG
no more edits :( Thirdly, there are far fewer limitations. For example, a semantic regex has the power to match constructs that have no compile-time work associated with them, like a class definition. How would you match a class definition with a template? Or encapsulate a VLA? You can't do either of those things.
DeadMG
A: 

(Warning: Mammoth answer ahead!)

I think it's called a macro ;) Well, at least outside the C/C++ world (where "macro" refers to the severely limited substitution the preprocessor provides). And it's not very novel. Still, I think a proper, powerful macro system can add more power to a language than any other feature (given we preserve enough primitives that the language is not merely Turing-complete but useful for real programming), in that a sufficiently smart programmer can add nearly any feature that might prove useful in the future or for a specific domain without adding (further) rules to the language.

The basic idea is to parse a program into a representation richer than a string of source code, say, an AST or a parse tree. Such trees provide more information about the program, and another program can walk this tree and modify it. For example, it would be possible to look for a VariableDeclaration node, check if it declares a plain old array of T, and replace it with a new VariableDeclaration node that instead declares a std::variable_array of T. This can be refined, for example, by providing pattern matching for the tree, making metaprogramming easier. A powerful procedure, if and only if the programmer can cope with this level of abstraction and knows how to put it to good use.
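To make the tree-walking idea concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the Node layout, the convention that a plain array of T is spelled "T[]" in the type field, and the helper names make_node and rewrite_arrays. A real compiler's AST is far richer than this.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified AST node.
struct Node {
    std::string kind;  // e.g. "Block", "VariableDeclaration"
    std::string type;  // declared type; "T[]" means a plain array of T
    std::string name;  // declared variable name
    std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical convenience constructor for building small trees.
inline std::unique_ptr<Node> make_node(std::string kind,
                                       std::string type = "",
                                       std::string name = "") {
    std::unique_ptr<Node> node(new Node());
    node->kind = std::move(kind);
    node->type = std::move(type);
    node->name = std::move(name);
    return node;
}

// Walks the tree and rewrites every plain-array declaration in place.
// Returns the number of declarations that were rewritten.
inline std::size_t rewrite_arrays(Node& node) {
    std::size_t count = 0;
    const std::string suffix = "[]";
    if (node.kind == "VariableDeclaration" &&
        node.type.size() > suffix.size() &&
        node.type.compare(node.type.size() - suffix.size(),
                          suffix.size(), suffix) == 0) {
        std::string element =
            node.type.substr(0, node.type.size() - suffix.size());
        assert(!element.empty());
        node.type = "std::variable_array<" + element + ">";
        ++count;
    }
    for (auto& child : node.children) {
        count += rewrite_arrays(*child);  // recurse into nested scopes
    }
    return count;
}
```

The if condition is the "pattern" and the assignment to node.type is the "replacement"; a macro system with real pattern matching would let you state both declaratively instead of writing the traversal by hand.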

Note that when I speak of "pattern matching", I mean pattern matching as in functional programming, not regular expressions. Regular expressions are insufficient to make sense of non-regular languages, which includes just about every useful language: merely allowing expressions of arbitrary size, including balanced parentheses, rules regular expressions out. See the accepted answer on What is 'Pattern Matching' in functional languages? for an excellent introduction to pattern matching, and maybe learn a functional language like Haskell or OCaml, if only to learn how to use it and how to process trees (and there's a ton of other cool features!).
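The balanced-parentheses point can be illustrated with a tiny checker. A classical regular expression corresponds to a machine with a fixed number of states, so it cannot track nesting of arbitrary depth; the sketch below gets around that with a counter that can grow without a fixed bound. The function name balanced is made up for this example.

```cpp
#include <string>

// Returns true if every '(' in s is closed by a later ')' and no ')'
// appears before its matching '('. All other characters are ignored.
inline bool balanced(const std::string& s) {
    long depth = 0;  // current nesting depth; this is the state a regex lacks
    for (char ch : s) {
        if (ch == '(') {
            ++depth;
        } else if (ch == ')') {
            if (--depth < 0) return false;  // closed more than were opened
        }
    }
    return depth == 0;  // everything opened was closed
}
```

A parser for a real language needs the same kind of unbounded bookkeeping, which is why the tree has to come from a proper parser rather than from textual matching.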

Now, on the language you propose: honestly, I doubt it would be useful. C++ itself is a perfect example of how not to design a language (unless you want to be successful): take an existing one, stay backward-compatible = keep all of it (including the bad stuff), and add a bunch of new features that are complex enough by themselves, then tweak them a thousand times and add a hundred special cases to make them work more or less with the syntax and semantics of the existing language. It makes success more likely (if the language you started with is already popular), but you end up with an arcane and inelegant beast. That being said, I'd really love to see a non-Lisp language that allows macros of such power.

The right (or at least, a better) way would be to rethink every single bit, from the most basic semantics to the exact syntax, integrate it with what you want to add, and tweak all parts of the newly formed language until the whole picture looks right. In your case, this would have an extremely convenient side effect: ease of parsing. Of course, the source must be parsed before macros can be applied, as they concern themselves with a tree, not with string fragments. But C++ is very hard to parse. Like, literally the hardest-to-parse language in common use.

Oh, while we're at it: macros themselves can make the life of our beloved tools (IDEs with autocomplete and call tips, static code analysis, etc.) miserable. Making sense of a piece of code is hard enough, but it gets even worse if this code will be transformed arbitrarily, and possibly very heavily, before it reaches the form that is executed. In general, code analysis tools can't cope with macros. The whole area is so hard that clever people make up new languages for research on it and write papers on it that neither of us can comprehend. So be aware that macros do have downsides.

delnan