Are there any languages that make it easy to extend the compiler with new literals, for example for JSON, XML, YAML, different string encodings, or extra number types? I understand that you can always fork the compiler or write a DSL, but that's not quite what I'm asking about. I also understand that some languages, such as Smalltalk and Lisp, are incredibly flexible, but I'm not looking for something that one team can do fairly simply. I'm looking for an attempt to bring this to the whole language and make it common practice. Pointers to any research on similar ideas are welcome too.

Alternatively, are there any languages that support literals via special methods on the object that take string arguments, à la the following? (In this case the """ denotes the start and end of the string to be handed to Xml.newFromTripleString(String a).)

Xml exampleXml=""" #{name} #{title} """

I understand that many languages support this type of thing by doing something like

Xml exampleXml= Xml.newFromTripleString(" " + "\n" " + "\n" + " #{name} + "\n" + " #{title} + "\n" + "")

but do any languages try to make this easier with something like implicit conversion? Is there any research on these sorts of techniques?

Any links to, and explanations of, other ideas on how to introduce more flexible literal or literal-like support in languages would also be welcome.

+2  A: 

It sounds like user-defined literals (§2.14.8) in C++0x (warning: large PDF) are pretty close to (or maybe exactly) what you're looking for.

Jerry Coffin
+1  A: 

The D programming language has the constructs needed to convert, at compile time, a string literal containing JSON into the corresponding objects/structs/arrays. It can even load the string at compile time from an external file. I don't know of any code that does this, but it wouldn't be particularly hard to write.

If you want the same thing at runtime, D has associative arrays and dynamic arrays as well as the standard set of OO features, so building a generic JSON DOM model shouldn't be hard.

I don't know any of the other encodings but I'd be surprised if they were any more of a problem than JSON.

Short answer: D doesn't support it natively but it wouldn't be at all hard to make it work.

BCS
+3  A: 

Ioke

Ioke allows overriding of literals, but not the definition of new ones. The way this works is that literals simply get translated into message sends and then you can override the corresponding methods just like any other.

For example, this is the "literal syntax" for a Dict with two entries, one mapping a Symbol to a Text, the other mapping a Symbol to a Number:

{ :name => "Jörg", :age => 31 }

This actually gets translated into a message send, for a message named {} (BTW: lists work the same way, their corresponding message is []). It is exactly equivalent (and can be written like this if you want) to:

{}(:name => "Jörg", :age => 31)

Now, => is actually just an operator which is defined for almost all objects and which simply returns a Pair with the key (the first element) being the receiver and the value being the argument. Now, operators are also just message sends, so this is equivalent to:

{}(:name =>("Jörg"), :age =>(31))

The : sigil which denotes a literal symbol gets translated into a message send, too:

{}(:("name") =>("Jörg"), :("age") =>(31))

The text literal gets translated into sending the internal:createText message:

{}(:("name") =>(internal:createText("Jörg")), :("age") =>(31))

[Note: obviously, the way it is written here will lead to an infinite recursion. The truth is that the argument to internal:createText is obviously not an Ioke Text but rather a platform string. I.e. for ikj, the JVM implementation of Ioke, it is actually a java.lang.String and for ikc, the CIL implementation, it is a System.String. I've expressed this here using triple quotes.]

{}(:("name") =>(internal:createText("""Jörg""")), :("age") =>(31))

This just leaves us with the number, which, you guessed it, is also a message send:

{}(:("name") =>(internal:createText("""Jörg""")),
   :("age") =>(internal:createNumber("""31""")))

Since everything is a message send, this allows you to customize the behavior of literals at will, just by implementing the corresponding methods. Here's a short transcript from iik, the interactive Ioke REPL:

iik> "Hello"
+> "Hello"

iik> internal:createText = method(raw, super(raw) upper)

iik> "Hello"
+> "HELLO"
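The same idea can be modelled outside Ioke. What follows is only a rough Python sketch, not how Ioke is implemented: the names Ground, Shouting, create_text, create_number and evaluate are all made up for illustration. A toy evaluator routes every literal through an ordinary method, so overriding that method changes what the literal means:

```python
import re


class Ground:
    """Hypothetical base object: every literal is built by a method call."""

    def create_text(self, raw):
        # Stand-in for internal:createText; raw is the platform string.
        return str(raw)

    def create_number(self, raw):
        # Stand-in for internal:createNumber.
        return int(raw)

    def evaluate(self, source):
        """Turn one literal token into the corresponding method call."""
        source = source.strip()
        if re.fullmatch(r'"[^"]*"', source):
            return self.create_text(source[1:-1])
        if re.fullmatch(r"-?\d+", source):
            return self.create_number(source)
        raise SyntaxError(f"not a literal: {source!r}")


class Shouting(Ground):
    # Mirrors the iik transcript: call the original hook, then upper-case.
    def create_text(self, raw):
        return super().create_text(raw).upper()


plain = Ground().evaluate('"Hello"')    # default text literal
loud = Shouting().evaluate('"Hello"')   # same literal, overridden hook
number = Shouting().evaluate("31")      # number hook is untouched
```

Only the text hook is overridden here, so number literals keep their default meaning.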

Converge

Converge allows for powerful compile time metaprogramming, including a feature called DSL Blocks. A DSL block is a block of code which does not use the Converge syntax. A DSL block looks like this:

$<<xml>>:
    <xml>
      <literal>here</literal>
    </xml>

The way this works is that the string in between the $<< and >> is the name of a function which gets called at compile time and gets passed the entire DSL block as a string (as well as some source code metadata such as line number, file name etc.) and returns a fragment of the Converge Abstract Syntax Tree. So, in this particular case, there would be a function like this:

func xml(dsl_block, src_infos):
    // implement an XML parser here ...
    return ast
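To make the dispatch mechanism concrete, here is a hedged Python sketch of the same shape. It is not Converge: it runs as an ordinary pre-pass rather than at compile time, it returns a parsed element tree instead of a Converge AST fragment, and the names DSL_HANDLERS, dsl, xml_block and expand are invented for this example.

```python
import textwrap
import xml.etree.ElementTree as ET

DSL_HANDLERS = {}  # hypothetical registry: block name -> handler function


def dsl(name):
    """Register a handler for blocks introduced by `$<<name>>:`."""
    def register(func):
        DSL_HANDLERS[name] = func
        return func
    return register


@dsl("xml")
def xml_block(dsl_block, src_info):
    # Stand-in for "return an AST fragment": hand back a parsed tree.
    return ET.fromstring(dsl_block)


def indent_of(line):
    return len(line) - len(line.lstrip())


def expand(source, filename="<example>"):
    """Find `$<<name>>:` headers and pass the indented body to the handler."""
    lines = source.splitlines()
    results, i = [], 0
    while i < len(lines):
        header = lines[i].strip()
        if header.startswith("$<<") and header.endswith(">>:"):
            name = header[3:-3]
            base = indent_of(lines[i])
            j = i + 1
            # The block body is every following blank or deeper-indented line.
            while j < len(lines) and (
                not lines[j].strip() or indent_of(lines[j]) > base
            ):
                j += 1
            body = textwrap.dedent("\n".join(lines[i + 1:j]))
            info = {"file": filename, "line": i + 2}  # 1-based body start
            results.append(DSL_HANDLERS[name](body, info))
            i = j
        else:
            i += 1
    return results


program = """\
$<<xml>>:
    <xml>
      <literal>here</literal>
    </xml>
"""
tree = expand(program)[0]
```

The handler sees the block purely as text plus source metadata, which is what lets the block use a syntax the host language knows nothing about.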

Factor

Factor allows the definition of Parsing Words which are words that affect the way other words in the same scope are parsed. Factor actually has an XML library implementation that uses parsing words to get syntax that looks very much like Scala's XML literals but is just normal Factor code:

: feed>xml ( feed -- xml )
    [ title>> ]
    [ url>> present ]
    [ entries>> [ entry>xml ] map ] tri
    <XML
        <feed xmlns="http://www.w3.org/2005/Atom">
            <title><-></title>
            <link href=<-> />
            <->
        </feed> 
    XML> ;

[Quick intro to Factor: : defines a new word, i.e. the first line defines a word named feed>xml which takes one argument and produces one result. The first three lines of the word extract the title, the URI and the entries out of the feed object and place them on the stack. The <XML is the parsing word which turns on XML mode and XML> turns it off again. Inside the XML code, <-> takes a value from the stack and inserts it into the XML.]

Common Lisp

Common Lisp Reader Macros allow you to hook into the reading stage, i.e. the stage that takes a string and produces nested lists and then hands them off to the compiler/evaluator. They require you to choose a unique one- or two-character prefix and they are global. The first one is not so much of a problem, since we can simply choose the < character as our prefix to make it look natural.
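As an illustration of the mechanism (a toy Python model, not Common Lisp itself), the reader below consults a global table of macro characters before falling back to its normal rules. READ_TABLE, set_macro_character, Stream, read and read_xml are names chosen for this sketch, and the XML handler only understands the simplest `<name>text</name>` form:

```python
READ_TABLE = {}  # global, like the readtable: macro character -> handler


def set_macro_character(char, handler):
    READ_TABLE[char] = handler


class Stream:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def next(self):
        ch = self.peek()
        self.pos += 1
        return ch


def skip_whitespace(stream):
    while stream.peek().isspace():
        stream.next()


def read(stream):
    """Read one form: a macro-character form, a list, or an atom."""
    skip_whitespace(stream)
    ch = stream.next()
    if ch == "":
        raise EOFError("unexpected end of input")
    if ch in READ_TABLE:           # hand control to the reader macro
        return READ_TABLE[ch](stream, ch)
    if ch == "(":
        items = []
        while True:
            skip_whitespace(stream)
            if stream.peek() == ")":
                stream.next()
                return items
            items.append(read(stream))
    token = ch                     # plain atom: read up to a delimiter
    while stream.peek() and not stream.peek().isspace() \
            and stream.peek() not in "()":
        token += stream.next()
    return token


def read_xml(stream, char):
    # Reads `<name>text</name>` (no nesting, no attributes) into a list.
    def read_until(stop):
        out = ""
        while stream.peek() != stop:
            if stream.peek() == "":
                raise EOFError("unterminated XML literal")
            out += stream.next()
        stream.next()              # consume the stop character
        return out

    name = read_until(">")
    body = read_until("<")
    if read_until(">") != "/" + name:
        raise SyntaxError("mismatched closing tag")
    return ["xml-element", name, body]


set_macro_character("<", read_xml)
form = read(Stream("(render <title>Hello</title>)"))
```

Because the table is global, registering `<` changes how every subsequently read form is parsed, which is the drawback mentioned above.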

Perl 6

Perl 6 should allow you to change the syntax while the program runs. Perl 6 has a dynamic mutable grammar which means that code gets parsed while it is executed and can change the grammar so that other code further down in the file gets parsed using the new grammar.

OMeta/COLA

Alessandro Warth's OMeta language running on top of Ian Piumarta's COLA system allows for what they call "mood-specific languages", i.e. languages whose specification and implementation are so lightweight that you can use them for just one line in the middle of your program and then switch to a different syntax again.

It is used in the Inventing Fundamental New Computing Technologies project at Alan Kay's Viewpoint Research Institute. One example usage is the implementation of an entire TCP/IP networking stack in just 200 lines of code, done by designing one language whose syntax is identical to the ASCII art diagrams used in IETF RFCs and another language for writing networking protocol state machines. The implementation of the networking stack then simply consists of copying and pasting the ASCII diagrams from the RFCs and transliterating the English state machine descriptions from the RFCs into the state machine language.

(Oh, in case you are wondering: the 200 lines is not just for the ASCII diagrams and the state machines. It also includes the parsers and compilers for the two languages.)

π

The π programming language is probably interesting as well.

Jörg W Mittag
+1 for lots of non-mainstream interesting looking languages
Roman A. Taycher