I'd like to know good strategies for deploying a domain-specific-language which must run under at least 2 languages (Java, C#) and probably more (Python, and possibly Javascript).

Some background. We have developed and deployed a domain-specific language currently written in C#. It's deployed through a series of method calls whose arguments are either common language primitives (string, double, etc.), collections (IEnumerable, HashSet, ...) or objects in a domain-specific library (CMLMolecule, Point3, RealSquareMatrix). The library is well tested and the objects have to comply with a stable deployed XML schema, so change will be evolutionary and managed (at least that's the hope).

We hope the language will become used by a wide and partially computer-literate community, used to hacking their own solutions without central control. Ideally the DSL will create a degree of encapsulation and produce the essential functionality they need. The libraries will manage the detailed algorithms which are many and varied but fairly well known. There's a lot in common with the requirements of the DSL in http://stackoverflow.com/questions/1485006/domain-specific-languages-vs-library-of-functions.

I'd appreciate ideas on the best architecture (clearly once it's deployed we cannot easily backtrack). The choices include at least:

  • Creation of an IDL (e.g. through CORBA). The W3C did this for the XML DOM - I hated it - and it seems to be overkill.
  • Manual creation of similar signatures for each platform, with best endeavours to keep them in sync.
  • Creation of a parsable language (e.g. CSS).
  • Declarative programming in XML (cf. XSLT). This is my preferred solution as it can be searched, manipulated, etc.

Performance is not important. Clarity of purpose is.

EDIT There was discussion as to whether application calls constitute a DSL. I have discovered Martin Fowler's introduction to DSLs (http://martinfowler.com/dslwip/Intro.html) where he argues that simple method calls (or chained calls) can be called a DSL. So a series like:

point0 = line0.intersectWith(plane);
point1 = line1.intersectWith(plane);
midpoint = point0.midpoint(point1);

could be considered a DSL.
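To make the method-call style concrete, here is a minimal Java sketch of the three calls above. Only the name Point3 comes from the library mentioned in the question; Line3, Plane, their fields and the geometry conventions are hypothetical stand-ins, not the real API. It uses only constructors and plain methods (no properties, no operator overloading), so the same signatures could be mirrored in C#.

```java
// Hypothetical stand-ins for the domain library; only the name Point3 appears in the question.
final class Point3 {
    final double x, y, z;

    Point3(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }

    // Point halfway between this point and the other one.
    Point3 midpoint(Point3 other) {
        return new Point3((x + other.x) / 2, (y + other.y) / 2, (z + other.z) / 2);
    }
}

// Plane of all points p satisfying normal . p = offset (assumed convention).
final class Plane {
    final Point3 normal;
    final double offset;

    Plane(Point3 normal, double offset) { this.normal = normal; this.offset = offset; }
}

// Line of all points origin + t * direction (assumed convention).
final class Line3 {
    final Point3 origin, direction;

    Line3(Point3 origin, Point3 direction) { this.origin = origin; this.direction = direction; }

    Point3 intersectWith(Plane plane) {
        double denom = dot(plane.normal, direction);
        if (denom == 0.0) throw new ArithmeticException("line is parallel to plane");
        double t = (plane.offset - dot(plane.normal, origin)) / denom;
        return new Point3(origin.x + t * direction.x,
                          origin.y + t * direction.y,
                          origin.z + t * direction.z);
    }

    private static double dot(Point3 a, Point3 b) {
        return a.x * b.x + a.y * b.y + a.z * b.z;
    }
}

class MethodCallDslDemo {
    public static void main(String[] args) {
        Plane plane = new Plane(new Point3(0, 0, 1), 0.0);               // the plane z = 0
        Line3 line0 = new Line3(new Point3(0, 0, 1), new Point3(0, 0, -1));
        Line3 line1 = new Line3(new Point3(2, 0, 1), new Point3(0, 0, -1));

        // The three "DSL" statements from the question.
        Point3 point0 = line0.intersectWith(plane);
        Point3 point1 = line1.intersectWith(plane);
        Point3 midpoint = point0.midpoint(point1);

        System.out.println(midpoint.x + " " + midpoint.y + " " + midpoint.z);
    }
}
```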

+2  A: 

Ability to escape to the implementation language in the event you need to do something that just isn't supported by your DSL, or for performance reasons (though I realize that isn't a priority).

I am researching DSLs for implementing rules in a rule engine in C#. Some of the rules are really complex and may change significantly in the future, so being able to escape out to C# is really useful. Of course this breaks cross-platform compatibility, but it is really just a way of hacking around edge cases without having to change your DSL.

Dale Halliwell
@Dale. Yes! There will be edge cases and there is no reason why they shouldn't be accreted in this way and then possibly abstracted and generalised later. I would expect that the major library functions would anyway be exposed so we would not have a single gateway.
peter.murray.rust
A: 

You'd be best off writing the library in C (or some language like RPython which will generate C code) and then using SWIG or similar to generate the language-specific bindings for C#, Java, Python, etc.

Note that this approach won't help if you are using JavaScript in the browser - you'll have to write the JavaScript library separately. If you are using JavaScript through Rhino, then you'd be able to just use the Java bindings.

Ant
+6  A: 

There seems to be some ambiguity in the question between language and library. The terms "internal DSL" and "external DSL" are useful, and I think are due to Martin Fowler.

An "external" DSL might be a standalone command-line tool. It is passed a string of source, it parses it somehow, and does something with it. There are no real limits on how the syntax and semantics can work. It can also be made available as a library consisting mostly of an eval-like method; a common example would be building a SQL query as a string and calling an execute method in an RDBMS library; not a very pleasant or convenient usage pattern, and horrible if spread around a program on a large scale.
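As a rough illustration of the "library consisting mostly of an eval-like method" shape, here is a minimal Java sketch. The two-command language (set NAME NUMBER, add NAME NUMBER), the class name TinyDsl and the method name eval are all invented for this example; a real external DSL would need a proper grammar and much better error reporting.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TinyDsl {
    // Interprets a hypothetical line-oriented language and returns the final variable values.
    // Each non-blank line must be "set NAME NUMBER" or "add NAME NUMBER".
    static Map<String, Double> eval(String source) {
        Map<String, Double> vars = new LinkedHashMap<>();
        int lineNo = 0;
        for (String line : source.split("\n")) {
            lineNo++;
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;            // ignore blank lines
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 3) {
                throw new IllegalArgumentException("line " + lineNo + ": expected 3 tokens");
            }
            double value = Double.parseDouble(parts[2]);
            switch (parts[0]) {
                case "set":
                    vars.put(parts[1], value);
                    break;
                case "add":
                    // Adds to the existing value, or starts from the given value if absent.
                    vars.merge(parts[1], value, Double::sum);
                    break;
                default:
                    throw new IllegalArgumentException(
                        "line " + lineNo + ": unknown command " + parts[0]);
            }
        }
        return vars;
    }

    public static void main(String[] args) {
        // The whole program is passed in as a string, as with a SQL query.
        System.out.println(eval("set x 3\nadd x 4"));
    }
}
```

The host-language surface is a single method taking a string, which is why this style sidesteps cross-language API design; the cost is that every target platform needs its own copy of the interpreter, and errors only surface at run time.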

An "internal" DSL is a library that is written in such a way as to take advantage of the quirks of a host (general purpose) language to create the impression that a new language can be embedded inside an existing one. In syntactically-rich languages (C++, C#) this means using operator overloading in ways that seriously stretch (or ignore) the usual meanings of the operator symbols. There are many examples in C++; a few in C# also - the Irony parser toolkit simulates BNF in a fairly restrained way which works well.

Finally, there is a plain old library: classes, methods, properties, with well-chosen names.

An external DSL would allow you to completely ignore cross-language integration problems, as the only library-like portion would be an eval method. But inventing your own tool chain is non-trivial. People always forget the huge importance of debugging, intellisense, syntax highlighting etc.

An internal DSL is probably a pointless endeavour if you want to do it well on C# and Java. The problem is that if you take advantage of the quirks of one host language, you won't necessarily be able to repeat the trick on another language. e.g. Java has no operator overloading.

Which leaves a plain old library. If you want to span C# and Java (at least), then you are somewhat stuck in terms of a choice of implementation language. Do you really want to write the library twice? One possibility is to write the library in Java, and then use IKVM to cross-compile it to .NET assemblies. This would guarantee you an identical interface on both of those platforms.

On the downside, the API would be expressed in lowest-common-denominator features - which is to say, Java features :). No properties, just getX/setX methods. Steer clear of generics because the two systems are quite different in that respect. Also even the standard way of naming methods differs between the two (camelCase versus PascalCase), so one set of users would smell a rat.
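The naming and getX/setX differences could in principle be bridged mechanically. The following Java sketch, under the assumption that the API follows plain getX/setX naming, shows the two checks such a tool might start from: converting a camelCase method name to PascalCase, and listing names that have both a getter and a setter. ApiNaming and Atom are hypothetical names made up for the example, and the pairing rule is deliberately naive (it does not compare types or detect side effects).

```java
import java.lang.reflect.Method;
import java.util.HashSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

final class ApiNaming {
    // intersectWith -> IntersectWith (Java-style name to C#-style name).
    static String toPascalCase(String name) {
        if (name.isEmpty()) return name;
        return Character.toUpperCase(name.charAt(0)) + name.substring(1);
    }

    // Names X for which the class declares both a no-argument getX() and a one-argument setX(..).
    static SortedSet<String> propertyCandidates(Class<?> type) {
        Set<String> getters = new HashSet<>();
        Set<String> setters = new HashSet<>();
        for (Method m : type.getDeclaredMethods()) {
            String n = m.getName();
            if (n.length() <= 3) continue;
            if (n.startsWith("get") && m.getParameterCount() == 0
                    && m.getReturnType() != void.class) {
                getters.add(n.substring(3));
            } else if (n.startsWith("set") && m.getParameterCount() == 1) {
                setters.add(n.substring(3));
            }
        }
        getters.retainAll(setters);   // keep only names that have both halves
        return new TreeSet<>(getters);
    }

    public static void main(String[] args) {
        System.out.println(toPascalCase("intersectWith"));
        System.out.println(propertyCandidates(Atom.class));
    }
}

// Hypothetical API class used only to exercise the sketch.
final class Atom {
    private double charge;
    private int counter;

    double getCharge() { return charge; }
    void setCharge(double charge) { this.charge = charge; }

    // A getter with a visible side effect and no setter: not a property candidate.
    int getNextId() { return ++counter; }
}
```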

Daniel Earwicker
@Earwicker +1 very useful overview. The involvement of Java and C# is a given (unfortunate). I'll certainly try IKVM. I don't mind losing properties and the generics are simple. The naming convention can be automated, I hope. I am aware of the problems of inventing my own toolchain and this is a useful summary.
peter.murray.rust
Yes, no doubt you could write a tool that would fix the naming convention. You might even be able to do something similar for properties - look for getX/setX method pairs (although perhaps an attribute marker would be necessary as not all getX methods are suitable for property syntax, e.g. sometimes they have visible side-effects).
Daniel Earwicker
@Earwicker. This is my current idea. Reduce the calls to a common subset of as many languages as possible (e.g. do not use properties, certainly not operator overloading).
peter.murray.rust
A: 

It is possible to interpret JavaScript from inside a Java program directly using the script engine, and apparently also from C#. Python can be run on the JVM and on the .NET runtime.

I would suggest that you investigate these options, and then write your library in a common subset of the execution paths available to the language you choose. I would not consider writing it in a language which requires post translation and conversion, since you introduce a step which can be very, very difficult to debug in case of problems.

Thorbjørn Ravn Andersen
+1  A: 

Although I do not want to promote my own project too much, I would like to mention PIL, a Platform Independent Language, an intermediate language I have been working on to enable the support of multiple software platforms (like Java, Python, ...), specifically for external DSLs. The general idea is that you generate code in PIL (a subset of Java), which the PIL compiler can then translate to one of many other languages, currently just Java or Python, but more will be added in the future.

I presented a paper about this at the Software and Language Engineering conference about 2 days ago. If you're interested, you can find a link to the publication on the PIL website (pil-lang.org).

Zef Hemel
I've tried this sort of thing some years ago and it's useful to see people doing it rather better.
peter.murray.rust
+3  A: 

If you are willing to re-describe your language using ANTLR you could generate your DSL interpreter in multiple languages without having to manually maintain them including all of the languages you mentioned plus more.

Antlr is a parser/lexer generator and has a large number of target languages. This allows you to describe your language once, without having to maintain multiple copies of it.

See the whole list of target languages here.

Darien Ford
I like this idea personally. Of course it's a big leap and difficult to retreat from
peter.murray.rust
True, there is a learning curve with ANTLR and StringTemplate. But personally, I love it. I am finding that more and more issues can be solved with a simply described grammar. Ultimately, the time you lose in the conversion process you will more than make up for in maintenance savings.
Darien Ford
A: 

I would like to expand on Darien's answer. I think that ANTLR brings something to the table that few other lexer/parser tools provide (at least to my knowledge). If you would like to create a DSL which ultimately generates Java and C# code, ANTLR really shines.

ANTLR provides four fundamental components:

  • Lexer Grammar (break down input streams into tokens)
  • Parser Grammar (organize tokens into an abstract syntax tree)
  • Tree Grammar (walk the abstract syntax tree and pipe the metadata into a template engine)
  • StringTemplate (a template engine based on functional programming principles)

Your lexer, parser, and tree grammars can remain independent of your final generated language. In fact, the StringTemplate engine supports logical groups of template definitions. It even provides for interface inheritance of template groups. This means you can have third parties use your ANTLR parser to create, say, Python, assembly, C, or Ruby, when all you initially provided was Java and C# output. The output language of your DSL can easily be extended as requirements change over time.
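The separation between the grammar stages and the per-language templates can be sketched without any tooling. The Java example below is hand-written and does not use ANTLR or StringTemplate; it only mimics the shape of the pipeline (lex, parse into a small tree node, then fill a template chosen by target language). The statement form, the class names and the template strings are all assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Tiny tree node for one statement of the form: target = receiver.method(argument)
final class CallStatement {
    final String target, receiver, method, argument;

    CallStatement(String target, String receiver, String method, String argument) {
        this.target = target; this.receiver = receiver;
        this.method = method; this.argument = argument;
    }
}

final class MiniGenerator {
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|[=.()]");

    // One template per output language; supporting a new target means adding an entry here.
    private static final Map<String, String> TEMPLATES = Map.of(
        "java", "var %s = %s.%s(%s);",
        "csharp", "var %s = %s.%s(%s);",
        "python", "%s = %s.%s(%s)");

    // Lexer stage: split the input into identifier and punctuation tokens.
    // Simplification: characters that match neither are silently skipped.
    static List<String> lex(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    // Parser stage: check the token sequence and build the tree node.
    static CallStatement parse(List<String> t) {
        boolean ok = t.size() == 8
            && isName(t.get(0)) && t.get(1).equals("=")
            && isName(t.get(2)) && t.get(3).equals(".")
            && isName(t.get(4)) && t.get(5).equals("(")
            && isName(t.get(6)) && t.get(7).equals(")");
        if (!ok) throw new IllegalArgumentException("expected: name = name.name(name)");
        return new CallStatement(t.get(0), t.get(2), t.get(4), t.get(6));
    }

    // Output stage: the tree is language-neutral; only the template and naming rule vary.
    static String emit(CallStatement s, String language) {
        String template = TEMPLATES.get(language);
        if (template == null) throw new IllegalArgumentException("no templates for " + language);
        String method = language.equals("csharp") ? pascalCase(s.method) : s.method;
        return String.format(template, s.target, s.receiver, method, s.argument);
    }

    private static boolean isName(String token) {
        char c = token.charAt(0);
        return Character.isLetter(c) || c == '_';
    }

    private static String pascalCase(String name) {
        return Character.toUpperCase(name.charAt(0)) + name.substring(1);
    }

    public static void main(String[] args) {
        CallStatement stmt = parse(lex("midpoint = point0.midpoint(point1)"));
        for (String language : List.of("java", "csharp", "python")) {
            System.out.println(language + ": " + emit(stmt, language));
        }
    }
}
```

In this sketch lex and parse never mention a target language, which is the property the answer attributes to keeping the grammars separate from the template groups.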

To get the most out of ANTLR you will want to read the following:

The Definitive ANTLR Reference: Building Domain-Specific Languages

Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages

Todd Stout