ansaurus

Question

writing a portable domain specific language

Answer 1

+2 A:

Ability to escape to the implementation language in the event you need to do something that just isn't supported by your DSL, or for performance reasons (though I realize that isn't a priority).

I am researching DSLs for implementing rules in a rule engine in C#. Some of the rules are really complex and may change significantly in the future, so being able to escape out to C# is really useful. Of course this breaks cross-platform compatibility, but it is really just a way of hacking around edge cases without having to change your DSL.
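As a rough illustration of the escape hatch (sketched in Python rather than C# to keep it short), a toy rule evaluator might look like the following. The `host:` prefix, the `host_rule` decorator and the `loyal_customer` rule are all made up for this example and are not part of any real rule engine.

```python
from typing import Callable, Dict

# Hypothetical registry mapping escape-hatch names to host-language functions.
HOST_FUNCTIONS: Dict[str, Callable[[dict], bool]] = {}


def host_rule(name: str):
    """Register a host-language function the DSL can call as `host:<name>`."""
    def register(func):
        HOST_FUNCTIONS[name] = func
        return func
    return register


def evaluate(rule: str, facts: dict) -> bool:
    """Evaluate one rule of a toy DSL: `<field> > <number>` or `host:<name>`."""
    rule = rule.strip()
    if rule.startswith("host:"):
        # Escape hatch: hand the facts to code written in the host language.
        return HOST_FUNCTIONS[rule[len("host:"):]](facts)
    field, op, value = rule.split()
    if op != ">":
        raise ValueError(f"unsupported operator: {op}")
    return facts[field] > float(value)


@host_rule("loyal_customer")
def loyal_customer(facts: dict) -> bool:
    # An edge case the toy DSL has no syntax for.
    return facts["years"] >= 5 and facts["orders"] / facts["years"] > 10
```

Here `evaluate("total > 100", {"total": 150})` stays inside the DSL, while `evaluate("host:loyal_customer", {"years": 5, "orders": 60})` drops into host code. Only the second kind of rule has to be rewritten when the DSL is ported to another platform.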

Dale Halliwell 2009-10-04 08:53:07

@Dale. Yes! There will be edge cases and there is no reason why they shouldn't be accreted in this way and then possibly abstracted and generalised later. I would expect that the major library functions would anyway be exposed so we would not have a single gateway.

peter.murray.rust 2009-10-04 08:56:33

Answer 2

A:

You'd be best off writing the library in C (or some language like RPython which will generate C code) and then using SWIG or similar to generate the language-specific bindings for C#, Java, Python, etc.

Note that this approach won't help if you are using JavaScript in the browser - you'll have to write the JavaScript library separately. If you are using JavaScript through Rhino, then you'd be able to just use the Java bindings.

Ant 2009-10-04 08:54:58

Answer 3

+6 A:

There seems to be some ambiguity in the question between language and library. The terms "internal DSL" and "external DSL" are useful, and I think are due to Martin Fowler.

An "external" DSL might be a standalone command-line tool. It is passed a string of source, it parses it somehow, and does something with it. There are no real limits on how the syntax and semantics can work. It can also be made available as a library consisting mostly of an eval-like method. A common example would be building a SQL query as a string and calling an execute method in an RDBMS library: not a very pleasant or convenient usage pattern, and horrible if spread around a program on a large scale.
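To make the "eval-like method" shape concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `rules` table and the `high_priority` helper are invented for the example.

```python
import sqlite3

# In-memory database with a made-up table, just to have something to query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rules (name TEXT, priority INTEGER)")
conn.executemany(
    "INSERT INTO rules VALUES (?, ?)",
    [("a", 1), ("b", 5), ("c", 9)],
)


def high_priority(conn, threshold):
    """Run a query written in the external DSL (SQL) through execute()."""
    # The host language sees the query only as a string: no syntax checking,
    # no completion and no refactoring support until it actually runs.
    query = "SELECT name FROM rules WHERE priority > ? ORDER BY name"
    return [row[0] for row in conn.execute(query, (threshold,))]
```

With the three rows above, `high_priority(conn, 3)` returns `["b", "c"]`. The whole "library" surface is essentially `execute`, which is why this style ports easily between host languages and also why it is awkward to use at scale.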

An "internal" DSL is a library that is written in such a way as to take advantage of the quirks of a host (general purpose) language to create the impression that a new language can be embedded inside an existing one. In syntactically-rich languages (C++, C#) this means using operator overloading in ways that seriously stretch (or ignore) the usual meanings of the operator symbols. There are many examples in C++; a few in C# also - the Irony parser toolkit simulates BNF in a fairly restrained way which works well.
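A minimal sketch of the operator-overloading trick, in Python since it allows overloading too. This is not how Irony is implemented; the `Rule`, `Lit`, `Seq` and `Alt` classes are hypothetical, with `+` standing for sequence and `|` for (ordered, non-backtracking) alternative.

```python
class Rule:
    """A grammar fragment; `+` means sequence and `|` means alternative."""

    def __add__(self, other):
        return Seq(self, _lift(other))

    def __radd__(self, other):
        return Seq(_lift(other), self)

    def __or__(self, other):
        return Alt(self, _lift(other))

    def __ror__(self, other):
        return Alt(_lift(other), self)


class Lit(Rule):
    """Matches a literal string."""

    def __init__(self, text):
        self.text = text

    def match(self, s, pos):
        # Return the position after the match, or None on failure.
        return pos + len(self.text) if s.startswith(self.text, pos) else None


class Seq(Rule):
    """Matches `left` followed by `right`."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def match(self, s, pos):
        mid = self.left.match(s, pos)
        return None if mid is None else self.right.match(s, mid)


class Alt(Rule):
    """Tries `left` first, then `right` (ordered choice, no backtracking)."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def match(self, s, pos):
        end = self.left.match(s, pos)
        return end if end is not None else self.right.match(s, pos)


def _lift(x):
    """Allow plain strings wherever a Rule is expected."""
    return Lit(x) if isinstance(x, str) else x


def accepts(rule, s):
    """True if the rule consumes the whole string."""
    return rule.match(s, 0) == len(s)


# The "embedded language": reads a little like BNF, but is ordinary Python.
digit = Lit("0") | "1"
greeting = (Lit("hello") | "hi") + " " + digit
```

`accepts(greeting, "hi 1")` is true and `accepts(greeting, "hey 1")` is false. The grammar lines at the bottom depend entirely on `__add__` and `__or__`, which is exactly the kind of host-language quirk that does not carry over to a language without operator overloading.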

Finally, there is a plain old library: classes, methods, properties, with well-chosen names.

An external DSL would allow you to completely ignore cross-language integration problems, as the only library-like portion would be an eval method. But inventing your own tool chain is non-trivial. People always forget the huge importance of debugging, IntelliSense, syntax highlighting, etc.

An internal DSL is probably a pointless endeavour if you want to do it well on both C# and Java. The problem is that if you take advantage of the quirks of one host language, you won't necessarily be able to repeat the trick in another language. For example, Java has no operator overloading.

Which leaves a plain old library. If you want to span C# and Java (at least), then you are somewhat stuck in terms of a choice of implementation language. Do you really want to write the library twice? One possibility is to write the library in Java, and then use IKVM to cross-compile it to .NET assemblies. This would guarantee you an identical interface on both of those platforms.

On the downside, the API would be expressed in lowest-common-denominator features - which is to say, Java features :). No properties, just getX/setX methods. Steer clear of generics because the two systems are quite different in that respect. Also even the standard way of naming methods differs between the two (camelCase versus PascalCase), so one set of users would smell a rat.

Daniel Earwicker 2009-10-04 10:02:41

@Earwicker +1 very useful overview. The involvement of Java and C# is a given (unfortunate). I'll certainly try IKVM. I don't mind losing properties and the generics are simple. The naming convention can be automated, I hope. I am aware of the problems of inventing my own toolchain and this is a useful summary.

peter.murray.rust 2009-10-04 10:15:06

Yes, no doubt you could write a tool that would fix the naming convention. You might even be able to do something similar for properties - look for getX/setX method pairs (although perhaps an attribute marker would be necessary as not all getX methods are suitable for property syntax, e.g. sometimes they have visible side-effects).
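A rough sketch of what such a tool would have to do, working on a plain list of method names. The function names are made up, and the `not_properties` set is a stand-in for the attribute marker mentioned above; a real tool would read this information from the compiled classes.

```python
import re


def pascal_case(name):
    """Convert a Java-style camelCase name to .NET-style PascalCase."""
    return name[:1].upper() + name[1:]


def find_properties(method_names, not_properties=frozenset()):
    """Return property names X for which both getX and setX exist.

    `not_properties` lists getters that must stay methods, for example
    because they have visible side effects.
    """
    names = set(method_names)
    properties = []
    for name in sorted(names):
        match = re.fullmatch(r"get([A-Z]\w*)", name)
        if match is None or name in not_properties:
            continue
        if "set" + match.group(1) in names:
            properties.append(match.group(1))
    return properties
```

Given `["getName", "setName", "getNextToken", "setNextToken", "getSize", "computeTotal"]` with `getNextToken` marked as unsuitable, only `Name` is reported as a property: `getSize` has no setter and `getNextToken` is excluded by the marker. `pascal_case("computeTotal")` gives `ComputeTotal`.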

Daniel Earwicker 2009-10-04 10:26:09

@Earwicker. This is my current idea. Reduce the calls to a common subset of as many languages as possible (e.g. do not use properties, certainly not operator overloading).

peter.murray.rust 2009-10-04 10:32:16

Answer 4

A:

It is possible to interpret JavaScript directly from inside a Java program using the script engine, and apparently also from C#. Python can be run on the JVM (Jython) and on .NET (IronPython).

I would suggest that you investigate these options, and then write your library in a common subset of the execution paths available to the language you choose. I would not consider writing it in a language which requires post translation and conversion, since you introduce a step which can be very, very difficult to debug in case of problems.

Thorbjørn Ravn Andersen 2009-10-04 10:19:40

Answer 5

+1 A:

Although I do not want to promote my own project too much, I would like to mention PIL, a Platform Independent Language, an intermediate language I have been working on to enable the support of multiple software platforms (like Java, Python, ...), specifically for external DSLs. The general idea is that you generate code in PIL (a subset of Java), which the PIL compiler can then translate to one of many other languages, currently just Java or Python, but more will be added in the future.

I presented a paper about this at the Software and Language Engineering conference about 2 days ago. You can find a link to the publication on the PIL website (pil-lang.org), if you're interested.

Zef Hemel 2009-10-08 13:36:51

I've tried this sort of thing some years ago and it's useful to see people doing it rather better.

peter.murray.rust 2009-10-08 20:27:35

Answer 6

+3 A:

If you are willing to re-describe your language using ANTLR, you could generate your DSL interpreter in multiple languages, including all of the languages you mentioned plus more, without having to maintain each one manually.

ANTLR is a parser/lexer generator and has a large number of target languages. This allows you to describe your language once, without having to maintain multiple copies of it.

See the whole list of target languages here.

Darien Ford 2009-10-08 14:02:47

I like this idea personally. Of course it's a big leap and difficult to retreat from.

peter.murray.rust 2009-10-08 20:26:22

True, there is a learning curve with ANTLR and StringTemplate. But personally, I love it. I am finding that more and more issues can be solved with a simply described grammar. Ultimately, the time you lose in the conversion process you will more than make up for in maintenance savings.

Darien Ford 2009-10-09 18:32:32

Answer 7

A:

I would like to expand on Darien's answer. I think that ANTLR brings something to the table that few other lexer/parser tools provide (at least to my knowledge). If you would like to create a DSL which ultimately generates Java and C# code, ANTLR really shines.

ANTLR provides four fundamental components:

Lexer Grammar (break down input streams into tokens)
Parser Grammar (organize tokens into an abstract syntax tree)
Tree Grammar (walk the abstract syntax tree and pipe the metadata into a template engine)
StringTemplate (a template engine based on functional programming principles)

Your lexer, parser, and tree grammars can remain independent of your final generated language. In fact, the StringTemplate engine supports logical groups of template definitions. It even provides for interface inheritance of template groups. This means you can have third parties use your ANTLR parser to create, say, Python, assembly, C, or Ruby, when all you initially provided was Java and C# output. The output language of your DSL can easily be extended as requirements change over time.
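The shape of that pipeline can be shown without ANTLR at all. The hand-written Python sketch below does not use the ANTLR or StringTemplate APIs; it only imitates the four stages for a made-up one-line language (`name = operand + operand ...`), with the standard library's `string.Template` standing in for template groups.

```python
import re
from string import Template


def lex(source):
    """Lexer: break the input into tokens (unknown characters are skipped)."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[=+]", source)


def parse(tokens):
    """Parser: build a tree for `name = operand (+ operand)*`."""
    if len(tokens) < 3 or len(tokens) % 2 == 0 or tokens[1] != "=":
        raise SyntaxError("expected: name = operand + operand ...")
    if any(op != "+" for op in tokens[3::2]):
        raise SyntaxError("only '+' is supported between operands")
    operands = tokens[2::2]
    tree = operands[0]
    for operand in operands[1:]:
        tree = ("add", tree, operand)
    return ("assign", tokens[0], tree)


# One "template group" per output language; adding a target touches only this.
TEMPLATES = {
    "java": {
        "assign": Template("int $name = $value;"),
        "add": Template("($left + $right)"),
    },
    "python": {
        "assign": Template("$name = $value"),
        "add": Template("($left + $right)"),
    },
}


def emit(tree, templates):
    """Tree walker: render each node through the chosen template group."""
    if isinstance(tree, str):
        return tree  # leaf: a number or an identifier
    if tree[0] == "add":
        return templates["add"].substitute(
            left=emit(tree[1], templates), right=emit(tree[2], templates)
        )
    return templates["assign"].substitute(
        name=tree[1], value=emit(tree[2], templates)
    )


def translate(source, target):
    """Run the whole pipeline for one target language."""
    return emit(parse(lex(source)), TEMPLATES[target])
```

`translate("total = price + 2", "java")` produces `int total = (price + 2);` and the same call with `"python"` produces `total = (price + 2)`. The lexer, parser and tree walker never mention a target language, so supporting another one means adding another entry to `TEMPLATES`.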

To get the most out of ANTLR you will want to read the following:

The Definitive ANTLR Reference: Building Domain-Specific Languages

Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages

Todd Stout 2009-10-15 02:23:18
