C# has generator functions which have syntax like:

IEnumerable<int> GetNats(int max)
{
   for (int i=0; i < max; ++i)
      yield return i;
}

A feature I am interested in for my programming language (a simple object-oriented language similar to Java, Scala, ActionScript, and C#) is generator expressions. These are essentially syntactic sugar for generator functions.

My current favorite candidate is the following syntax:

IEnumerable<int> nats = 
  witheach (i in range(0, 42)) 
     yield i * 2;

The expression range(0, 42) is a built-in generator function.
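
For comparison, Python already has both forms, so the "sugar for a generator function" relationship can be checked directly. This is only a sketch in another language, not the proposed syntax; the names doubled_function and doubled_expression are made up for illustration.

```python
def doubled_function(limit):
    """Generator function: the long form, with an explicit yield."""
    for i in range(limit):
        yield i * 2


def doubled_expression(limit):
    """Generator expression: the sugared form of the same loop."""
    return (i * 2 for i in range(limit))


# Both forms are lazy and produce the same sequence of values.
assert list(doubled_function(42)) == list(doubled_expression(42))
```

Whatever syntax is chosen, the expression form should probably desugar to something equivalent to the function form, as it does here.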

So my question is what syntax would you prefer to see for generator expressions, in a C#/Java/Scala/ActionScript type language, and why?

Some factors that may influence responses: as in Scala and ActionScript, types in my language are declared after the name. For example:

var myVar : SomeType = initialValue;

Also anonymous functions look like this:

var myFunc = function(int x) { 
  return x + 1; 
}

Other than that the rest of my language syntax resembles Java or C#. My language has a foreach statement which is very similar to C#.

+1  A: 
IEnumerable nats = 0...42

or for generators 

IEnumerable nats = yield 0...42
Trickster
Good point. I changed the example to be less stupid. :-)
cdiggins
Oh there is a second point here: not just that the example was too trivial but that "a..b" is better syntax for ranges. Nice.
cdiggins
You're constructing an enumerable from a range literal, and I think that's all right. With "IEnumerable nats = 0...42" and "int[] nats = 0...42", how the literal is represented depends on the type of the variable.
Trickster
+2  A: 

There's also Python's approach--no special syntax for generators; the presence of a "yield" statement is all it takes. Any statement that takes a block could be used, though I'd expect to find only loops used in practice:

IEnumerable<int> nats = 
    for (i in range(0,42))
        yield i*2

String phrase = "I want to buy some cheese.";
IEnumerable<String> substrs =
    for (i in 0 .. phrase.length)
        for (j in i .. phrase.length+1)
            yield phrase[i..j]

Here I'm assuming the right endpoint isn't included in a range.
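
For reference, here is roughly how the substring example looks in Python itself, both as a generator function and as a nested generator expression. One assumption that differs from the pseudocode above: the inner range starts at i + 1, so empty slices are skipped. The function names are illustrative only.

```python
def substrings(phrase):
    """Lazily yield every non-empty substring of phrase."""
    for i in range(len(phrase)):
        # Assumption: start at i + 1 so that empty slices are not produced.
        for j in range(i + 1, len(phrase) + 1):
            yield phrase[i:j]


def substrings_expr(phrase):
    """The same sequence written as a nested generator expression."""
    return (phrase[i:j]
            for i in range(len(phrase))
            for j in range(i + 1, len(phrase) + 1))


assert list(substrings("abc")) == ["a", "ab", "abc", "b", "bc", "c"]
assert list(substrings_expr("abc")) == list(substrings("abc"))
```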

To be completely honest, when I saw this in Python, I had to wonder: "at what cost?"

outis
I like this one a lot! What costs do you expect? I think that any potential performance issues can be eliminated by a sufficiently clever compiler.
cdiggins
No cost to the program itself. I wondered what additional complexity it added to the compilation step, since Python couldn't tell a function was a generator without examining its parse tree, though it probably wouldn't be too bad, depending on your parser. You could mark the node for a yield statement as a "generator" node, then propagate that mark up to the root of the statement (not that propagation would be easy in all parsers). It was an entirely vague feeling with nothing to back it up.
outis
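
The "look for a yield in the parse tree" idea from the comment above can be sketched with Python's standard ast module. This is an illustration of the approach, not a description of how CPython actually does it. The helper name is_generator_def is hypothetical, and as a simplification it skips nested definitions entirely.

```python
import ast


def is_generator_def(func_node):
    """Return True if a yield belongs directly to func_node's own body."""
    stack = list(func_node.body)
    while stack:
        node = stack.pop()
        if isinstance(node, (ast.Yield, ast.YieldFrom)):
            return True
        # A yield inside a nested function, lambda, or class marks that
        # inner scope, not this one, so do not descend into it.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                             ast.Lambda, ast.ClassDef)):
            continue
        stack.extend(ast.iter_child_nodes(node))
    return False


source = """
def gen():
    for i in range(3):
        yield i

def plain():
    def inner():
        yield 1
    return inner
"""

tree = ast.parse(source)
flags = {node.name: is_generator_def(node)
         for node in tree.body if isinstance(node, ast.FunctionDef)}
assert flags == {"gen": True, "plain": False}
```

The walk is a single pass over each function body, which fits the guess that the added compilation cost is small.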
+2  A: 

Check out what F# does. You can do

seq { seq-expr }    // IEnumerable<T>, a.k.a. seq<T>
[ seq-expr ]        // list<T>
[| seq-expr |]      // array<T>

where seq-expr is a form that includes most language constructs along with 'yield'. So e.g. you can write

seq {
    for i in 0..9 do
        for j in someColl do
            if i <> j then
                yield i*j
}

The F# compiler translates this code into a state machine implementation of IEnumerable (like C# does for iterator blocks).
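
To give a feel for what such a translation involves, here is a hand-written iterator in Python that behaves like the nested loop above. It is a simplified sketch, not the code either compiler emits: real translations typically use explicit state numbers, while this version just keeps the loop positions as fields. The names ProductsStateMachine and products are made up for the example.

```python
class ProductsStateMachine:
    """Hand-written equivalent of the generator `products` below."""

    def __init__(self, some_coll):
        self._coll = list(some_coll)
        self._i = 0  # saved position of the outer loop (0..9)
        self._k = 0  # saved position of the inner loop (index into _coll)

    def __iter__(self):
        return self

    def __next__(self):
        # Resume from the saved positions and run until the next value.
        while self._i < 10:
            while self._k < len(self._coll):
                j = self._coll[self._k]
                self._k += 1
                if self._i != j:
                    return self._i * j
            self._i += 1
            self._k = 0
        raise StopIteration


def products(some_coll):
    """The same sequence written with yield."""
    for i in range(10):
        for j in some_coll:
            if i != j:
                yield i * j


assert list(ProductsStateMachine([0, 1, 2])) == list(products([0, 1, 2]))
```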

I like this syntax because it means e.g. you can write the exact same code you would write to "do imperative stuff now", e.g.

    for i in 0..9 do
        for j in someColl do
            if i <> j then
                printfn "%d" (i*j)

but wrap that code in seq{} and use 'yield' and the code becomes a lazy IEnumerable instead.

Brian
I really appreciate an F# perspective brought into the discussion. However, I don't understand what semantics "seq { ... }" adds, given that there is a yield statement already. Couldn't the language just infer that the "for ... block" is a generator expression because of the existence of a yield?
cdiggins
Imagine the line before the first 'for' says 'printfn "Starting enumeration"'. If it is contained inside the seq{}, then every time someone iterates over this enumerable, that side effect will happen before producing the first element. If the print were outside the seq{}, then it would be printed before creating the enumerable. So in this case the seq curlies delimit the expression-that-gets-lazified-into-the-enumerable. This will only be an observable difference when we speak of expressions with side-effects (like printf).
Brian
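
The difference in side-effect timing described above can be reproduced in Python, where the function body plays the role of the seq curlies. This is a sketch by analogy only. One caveat: a Python generator object is single-use, so "every time someone iterates" corresponds to each call to all_lazy(). The names log, make_eager_then_lazy, and all_lazy are illustrative.

```python
log = []


def make_eager_then_lazy():
    log.append("starting")        # outside the lazy part: runs at creation
    return (i for i in range(3))  # only this expression is deferred


def all_lazy():
    log.append("starting")        # inside the lazy part: runs on iteration
    for i in range(3):
        yield i


outside = make_eager_then_lazy()
assert log == ["starting"]        # logged before anything was consumed

log.clear()
inside = all_lazy()
assert log == []                  # creating the generator ran none of its body
assert list(inside) == [0, 1, 2]
assert log == ["starting"]        # the side effect happened during iteration
```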
Thanks for the explanation, Brian. If I were designing F#, I would have let any statement be an expression as long as it contained a yield. The seq { } could then be used optionally. Unless this is how it works and I misunderstood something.
cdiggins