I'm evaluating languages for a computationally oriented app that needs an easy embedded scripting language for end users. I have been thinking of using Scala as the main underlying language and Jython for the scripting interface. An appeal of Scala is that I can define methods such as :* for elementwise multiplication of a matrix object and use them with infix syntax, as in a :* b. But :* is not a valid method name in Python. How does Jython deal with this?

I would consider using Scala as the scripting language, due to its flexibility. But even with type inference, all the val and var keywords and required type definitions are too much for lay users used to a dynamic language like MATLAB. By comparison, Boo has the -ducky option, which might work, but I'd like to stay on the JVM rather than .NET. I assume there is no -ducky for Scala.

More generally, consider the following DSL (from http://www.cs.utah.edu/~hal/HBC/) to model a Latent Dirichlet Allocation:

model {
      alpha     ~ Gam(0.1,1)
      eta       ~ Gam(0.1,1)
      beta_{k}  ~ DirSym(eta, V)           , k \in [1,K]
      theta_{d} ~ DirSym(alpha, K)         , d \in [1,D]
      z_{d,n}   ~ Mult(theta_{d})          , d \in [1,D] , n \in [1,N_{d}]
      w_{d,n}   ~ Mult(beta_{z_{d,n}})     , d \in [1,D] , n \in [1,N_{d}]
}

result = model.simulate(1000)

This syntax is terrific (compared to PyMCMC for instance) for users familiar with hierarchical Bayesian modeling. Is there any language on the JVM that would make it easy to define such syntax, while also giving access to a basic scripting language like Python?

Thoughts appreciated.

A: 

EDIT:

After reading all the discussion, probably the best way to go is to define the grammar of your DSL and then parse it with the built-in parsing utilities of Scala.

I'm not sure, though, what you are trying to achieve. Will your scripting language be more of a "what" or of a "how" type? The example you have given is a "what" type DSL: you describe what you are trying to achieve and do not care about the implementation. Such languages are best used to describe a problem, and given the domain you are building the app for, I think it's the best way to go. The user just describes the problem in a syntax very familiar to the problem domain; the application parses this description and uses it as input to run the simulation. For this, building a grammar and parsing it with the Scala parsing utilities will probably be the best way to go (you only want to expose a small subset of features to the users).
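To make the "parse the description yourself" idea concrete, here is a minimal sketch of reading one line of the HBC-style model from the question. It is only an illustration under stated assumptions: the names IndexRange, Stmt, HbcSketch, splitTopLevel and parseLine are hypothetical, and it uses plain standard-library regexes rather than Scala's parser combinators (which, since Scala 2.11, ship as a separate module), so it stands in for a real grammar rather than being one.

```scala
// Hypothetical AST for one line of an HBC-style model (all names are illustrative).
final case class IndexRange(index: String, lo: String, hi: String)
final case class Stmt(lhs: String, dist: String, args: List[String], ranges: List[IndexRange])

object HbcSketch {
  // Shape: lhs ~ Dist(args), optionally followed by ", i \in [lo,hi]" clauses.
  private val StmtRe  = """^\s*(\S+)\s*~\s*(\w+)\(([^)]*)\)(.*)$""".r
  private val RangeRe = """(\w+)\s*\\in\s*\[\s*([^,\]]+?)\s*,\s*([^\]]+?)\s*\]""".r

  // Split on commas that are not nested inside {...}, so "beta_{z_{d,n}}" stays whole.
  def splitTopLevel(s: String): List[String] = {
    val parts = List.newBuilder[String]
    val cur   = new StringBuilder
    var depth = 0
    s.foreach {
      case '{' => depth += 1; cur.append('{')
      case '}' => depth -= 1; cur.append('}')
      case ',' if depth == 0 => parts += cur.toString.trim; cur.clear()
      case c   => cur.append(c)
    }
    parts += cur.toString.trim
    parts.result().filter(_.nonEmpty)
  }

  // Returns None for lines that do not look like "lhs ~ Dist(args) ...".
  def parseLine(line: String): Option[Stmt] = line match {
    case StmtRe(lhs, dist, args, tail) =>
      val ranges = RangeRe.findAllMatchIn(tail)
        .map(m => IndexRange(m.group(1), m.group(2), m.group(3)))
        .toList
      Some(Stmt(lhs, dist, splitTopLevel(args), ranges))
    case _ => None
  }
}
```

The application would then walk the resulting Stmt values to build and run the simulation; the user never sees anything beyond the model syntax.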

If you need a "how" script, then using an already established scripting language is the way to go (unless you want to implement loops, basic data structures, etc yourself).

In designing a system, there will always be trade-offs to be made. Here it is between the number of features you want to expose to the user and the terseness of your script. Myself, I'd go with exposing as few features as possible to get the job done, and get it done in a "what" way: the user doesn't need to know how you are going to simulate their problem, as long as the simulation gives correct results and runs in reasonable time.

If you expose a full scripting language to the user, your DSL will just be a small API in that scripting language, and the user will have to learn a full language to be able to use its full power. And you may not want a user to use its full power (it may wreak havoc on your app!). Why would you expose, for example, TCP socket support when your application doesn't need to connect to the internet? That could be a security hole.

-- The following section discusses possible scripting languages. My above answer advises against using them, but I have left the discussion for completeness.

I have no experience with it, but have a look at Groovy. It is a dynamically typed scripting language for the JVM (with JVM support probably going to get better in JDK 7 due to invokedynamic). It also has good support for operator overloading and writing DSLs. Unfortunately, it doesn't have support for user defined operators, at least not to my knowledge.

I would still go with Scala though (partially because I like static typing and I find its type inference good :). Its scripting support is quite good, and you can make almost anything look like native language support (for example, have a look at its actors library!). It also has very good support for functional programming, which can make scripts very short and concise. And as a benefit, you'll have all the power of the Java libraries at your disposal.

In order to use Scala as a scripting language, just put your script in a file ending with .scala and then run scala filename.scala. See "Scala as a scripting language" for a discussion comparing Scala with JRuby.

Flaviu Cipcigan
I am tempted to just use Scala, but all the `val`, `var` and `new` will not appeal to users used to MATLAB. It's a step down, not a step up. Likewise, nonprogrammer users do not want to think about types (`Int` vs. `Double`) when defining simple functions.
Tristan
+2  A: 

Personally, I think you overstate the overhead of Scala. For instance, this:

alpha     ~ Gam(10,10)
mu_{k}    ~ NorMV(vec(0.0,1,dim), 1, dim)     , k \in [1,K]
si2       ~ IG(10,10)
pi        ~ DirSym(alpha, K)
z_{n}     ~ Mult(pi)                          , n \in [1,N]
x_{n}     ~ NorMV(mu_{z_{n}}, si2, dim)       , n \in [1,N]

could be written as

def alpha =                   Gam(10, 10)
def mu    = 1 to 'K map (k => NorMV(Vec(0.0, 1, dim), 1, dim))
def si2   =                   IG(10, 10)
def pi    =                   DirSym(alpha, 'K)
def z     = 1 to 'N map (n => Mult(pi))
def x     = 1 to 'N map (n => NorMV(mu(z(n)), si2, dim))

In this particular case, almost nothing was done, except define Gam, Vec, NorMV, etc, and create an implicit definition from Symbol to Int or Double, reading from a table where you'll store such definitions later on (such as with a loadM equivalent). Such implicit definitions would go like this:

import scala.reflect.Manifest
// Table of unknowns, keyed by symbol; each value is stored with the Manifest of its type
val unknowns = scala.collection.mutable.HashMap[Symbol,(Manifest[_], Any)]()
implicit def getInt(s: Symbol)(implicit m: Manifest[Int]): Int = unknowns.get(s) match {
  case Some((`m`, x)) => x.asInstanceOf[Int]   // stored type matches Int
  case _ => sys.error("Undefined unknown "+s)  // sys.error replaces the old Predef.error
}
// similarly to getInt for any other desired type
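As a rough illustration of how such a table-backed implicit could be used, here is a simplified sketch. It is not the definition above: it drops the Manifest check in favour of a plain type pattern, writes Symbol("K") instead of the 'K literal (symbol literals are deprecated in newer Scala versions), and the names Unknowns, load, symbolToInt and Demo are hypothetical.

```scala
import scala.collection.mutable
import scala.language.implicitConversions

object Unknowns {
  // Hypothetical table filled in before the model is evaluated.
  private val table = mutable.HashMap[Symbol, Any]()

  def load(bindings: (Symbol, Any)*): Unit = table ++= bindings

  // Look a symbol up in the table wherever an Int is expected.
  implicit def symbolToInt(s: Symbol): Int = table.get(s) match {
    case Some(i: Int) => i
    case _            => sys.error("Undefined unknown " + s)
  }
}

object Demo {
  import Unknowns._

  load(Symbol("K") -> 3)

  // `to` expects an Int, so the compiler inserts symbolToInt(Symbol("K")).
  val topics: Range = 1 to Symbol("K")
}
```

With K bound to 3, Demo.topics covers 1, 2 and 3; an unbound symbol fails at the point where the conversion is applied.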

It could be written as such, too:

Model (
  'alpha    -> Gam(10, 10),
  'mu -> 'k -> NorMV(Vec(0.0, 1, dim), 1, dim)      With ('k in (1 to 'K)),
  'si2      -> IG(10, 10),
  'pi       -> DirSym('alpha, 'K),
  'z -> 'n  -> Mult('pi)                            With ('n in (1 to 'N)),
  'x -> 'n  -> NorMV('mu of ('z of 'n), 'si2, dim)  With ('n in (1 to 'N))
)

In which case Gam, Mult, etc. would need to be defined a bit differently, to handle the symbols being passed to them. The excess of "'" is definitely annoying, though.

It's not like HBC doesn't have its own idiosyncrasies, such as the occasional need for type declarations, underscores before indices, the occasional need to replace "~" with "\in", or even the backslash that needs to precede the latter. As long as there is a real benefit from using it instead of HBC, MATLAB, or whatever else the person is used to, they'll trouble themselves a bit.

Daniel
Um. The author of the question _likes_ the syntax he can get in Scala for this case (the one you are trying to replace with something else). The `def`s in your proposed replacement fall into the same category as "all the `val` and `var`" which he would prefer to avoid.
Alexey Romanov
You misread him. The syntax he shows, if you follow the link, is provided by something called HBC. He would *like* to have that, if possible. Now, I'm addressing precisely this objection of his you quote, making the claim it's not so bad as he thinks.
Daniel
That's right, sorry. Still, a lot of `def` probably isn't much better than `'` for him.
Alexey Romanov
This is impressively close, although I think Scala just isn't an option. Users want a dynamically typed language for basic stuff. Beyond a basic dynamic language (à la MATLAB), I would like to be able to define internal DSLs like HBC. It seems there is no way but to parse things myself.
Tristan
A: 

None of the obvious suspects among JVM scripting languages -- JavaScript Rhino, JRuby, Jython, and Groovy -- have support for user-defined operators (which you'll probably need). Neither does Fan.

You might try using JRuby with the superators gem.
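For what it's worth, a Scala operator method such as :* is still reachable from other JVM languages, just not with infix syntax: Scala encodes symbolic names into plain identifiers in the bytecode. The sketch below uses a hypothetical Mat class to show the encoded name. From Jython one would presumably call something like getattr(a, "$colon$times")(b), but that call is an untested assumption.

```scala
import scala.reflect.NameTransformer

// Hypothetical matrix-like type with an elementwise multiplication operator.
final case class Mat(data: Vector[Double]) {
  def :*(that: Mat): Mat = Mat(data.zip(that.data).map { case (x, y) => x * y })
}

object OperatorEncoding {
  // The name the JVM (and hence other JVM languages) sees for ":*".
  val encoded: String = NameTransformer.encode(":*")

  // Check via Java reflection that the compiled class exposes that name.
  val jvmMethodExists: Boolean = classOf[Mat].getMethods.exists(_.getName == encoded)

  // Within Scala the infix form works as usual.
  val product: Mat = Mat(Vector(1.0, 2.0)) :* Mat(Vector(3.0, 4.0))
}
```

So the operator is usable from a scripting layer only as an ordinary, rather ugly, method name, which is consistent with the point that these languages have no user-defined operators of their own.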

Alexey Romanov
When did JavaScript become a JVM language?
Daniel
In 1997, when the Rhino ECMAScript implementation was developed. Since Java 6, Rhino is actually a standard part of the JRE (or at least the JDK).
Jörg W Mittag
JVM languages are languages that get compiled to JVM bytecode. Rhino doesn't do that, does it?
skaffman
Yes. It does.
Jeff