views:

131

answers:

3

Take the code below, adapted from this question:

//Borrowed from another question because its a simpler example of what happened to me.
IEnumerable<char> query = "Not what you might expect";
foreach(char vowel in "aeiou")
{
     query = query.Where(c => c != vowel);
}
foreach (char Output in query)
{
    System.Out.WriteLine(Output);
}

This only removes the 'u' from the query char collection. The core issue has something to do with the fact that the 'c' variable in the Where clause isn't evaluated until the second foreach. My question is:

1) Why would the delegate generated by the first foreach not capture each value of c as it is built up? Is there some situation I'm unaware of where that is not the desired behavior?

2) If its not capturing the value of c, how is that value even still in scope in the second foreach when the query is actually run? It would seem to me that if its not storing the values of the variables being passed in, then trying to resolve the statement for the second foreach would fail because the the variable c is clearly out of scope.

I don't understand how it is that 'use the last value we saw on this variable back when it was in scope' was a good design decision for this circumstance, and was hoping someone could shed some light on the subject.

+5  A: 

It is capturing vowel; try:

foreach(char vowel in "aeiou") {
     char tmp = vowel;
     query = query.Where(c => c != tmp);
}

Jon has a list of related posts here.

As for why... foreach is defined (in ECMA 334v4 §15.8.4) as:

A foreach statement of the form `foreach (V v in x) embedded-statement` is then expanded to:

{
  E e = ((C)(x)).GetEnumerator();
  try {
    V v;
    while (e.MoveNext()) {
      v = (V)(T)e.Current;
      embedded-statement
    }
  }
  finally {
    … // Dispose e
  }
}

Note the V is outside the while - this means it is captured once only for the foreach. I gather that the C# team have, at times, doubted the wisdom of this change (it was inside the while in the C# 1.2 spec, which would have avoided the problem completely). Maybe it'll change back eventually, but for now it is being true to the specification.

Marc Gravell
I know it works with the temp variable; what I meant to ask is why doesn't it work the way that seems to be the intended way 99% of the time?
GWLlosa
I know this may seem really dumb but what does the little § mean? I have seen a few times and never been game to ask
Nathan W
§="section"; http://en.wikipedia.org/wiki/Section_sign
Marc Gravell
thanks. thought as much, but didn't know how to google search it.
Nathan W
Is there any documentation or guesstimates as to why they changed it after the C# 1.2 spec?
GWLlosa
Wild guess - if V is a large value type, each declaration would be a large allocation leading to a performance problem.
David B
@David B: How would that be any different? In fact, **because** of the glitch we now need two copies of it... in reality, value-types shouldn't be that large anyway.
Marc Gravell
+4  A: 

As Marc says, it's capturing vowel as a variable, rather than the value of vowel in each case. This is almost always undesirable, but it is the specified behaviour (i.e. the compiler is following the language spec). There's a single vowel variable, rather than a "new variable instance" per iteration.

Note that there's no expression tree involved here - you're using IEnumerable<T> so it's just converting the lambda expression into a delegate.

See section 7.14.4 of the C# 3 spec for more information.

Jon Skeet
What I mean to ask is if its almost always undesirable, why does it work that way? Wouldn't everyone prefer if it worked the way we obviously meant?
GWLlosa
If it is almost always undesirable, why is it the language spec?
Svish
Yes, I suspect they would... and I suspect the language designers would regard this decision as a mistake now. Whether they'd be willing to break any applications (if there *are* any) which rely on this behaviour is a different matter though.
Jon Skeet
The hard part is coming up with a realistic example where the change would *break* code that isn't already horribly, horribly broken. I seem to recall coming up with *one* example, but this example was truly, truly awful, and frankly deserved to get broken.
Marc Gravell
Absolutely - I would support such a change, but whether the team is happy to allow even that breakage of weird, almost certainly broken code is a different matter.
Jon Skeet
Yes, we now realize that this scenario does produce undesirable behaviour and wish that we'd done it the other way. However, changing it now would be a breaking change, and yes, we try to not break even weird code. You'd be surprised at the amount of weird code that's in the wild.
Eric Lippert
I don't suppose anyone has an example of what sort of code benefits from it? I haven't been able to figure one out.
GWLlosa
It's not so much that there's a benefit as that this is simply the unfortunate interaction of two individually sensible rules: (1) closures close over variables, and (2) the loop variable in a foreach is logically a single variable for every iteration, not a fresh new variable for each run through the loop. Though (1) has clear benefits (allowing different anonymous functions to share state) it is often surprising to people. But (2) has no clearly observable benefits that I know of; it just seems conceptually cleaner to declare only one variable, as a "for" loop typically does.
Eric Lippert
In fact, I'm not even sure I'd say it's conceptually cleaner to declare only one variable. The iteration variable is read-only within the body of the foreach, and yet it magically changes between iterations. To me, it feels simpler to imagine it as a read-only variable declared *inside* the body of the loop (just before the user's code). That also works better with how it reads: "for each variable of type Foo in ..." The fact that the variable declaration part comes after the "each" suggests (to me) that there's a new variable per iteration. I know there *isn't*, but it's always felt wrong.
Jon Skeet
I see your point -- you're making a semantic argument. But the counter-argument is the syntactic argument that in "foreach(C c in cs){ ... }, the "C c" is lexically _outside_ the { }, and therefore should behave as though it was a declaration of a single variable outside the { }. "foreach" is a bit of a platypus -- a bit of a mishmash and therefore hard to reason about intuitively.
Eric Lippert
Good point (surprise surprise) - I hadn't really thought of the braces. I suspect would be hard to come up with a clean syntax which was syntactically accurate, semantically useful (i.e. had the behaviour we want most of the time) and intuitive. But hey, that's why I don't get paid to design languages ;)
Jon Skeet
+1  A: 

I think what you really want to do here is something more like this:

IEnumerable<char> query = "Not what you might expect";
query = query.Except("aeiou");

foreach (char Output in query)
{
    System.Out.WriteLine(Output);
}
Joel Coehoorn
In this case, that's true, in that it would work. But this was a simplified example with lots of extraneous business code removed. The real version had a lot more stuff, including several nested foreaches and unions of unions of sequences. The root problem was the same as the simplified example, so that's what I posted.
GWLlosa