I'm reading the Manning book on LINQ, and it has this example:

    using System;
    using System.Linq;

    static class QueryReuse
    {
        static double Square(double n)
        {
            Console.WriteLine("Computing Square(" + n + ")...");
            return Math.Pow(n, 2);
        }

        public static void Main()
        {
            int[] numbers = {1, 2, 3};
            var query =
                from n in numbers
                select Square(n);

            foreach (var n in query)
                Console.WriteLine(n);

            for (int i = 0; i < numbers.Length; i++)
                numbers[i] = numbers[i] + 10;

            Console.WriteLine("- Collection updated -");

            foreach (var n in query)
                Console.WriteLine(n);
        }
    }

with the following output:

Computing Square(1)...
1
Computing Square(2)...
4
Computing Square(3)...
9
- Collection updated -
Computing Square(11)...
121
Computing Square(12)...
144
Computing Square(13)...
169

Does this mean that 'numbers' is passed by reference? Does this behavior have something to do with lazy execution and yield? Or am I on the wrong track here?

A: 

Yes, the numbers variable is passed by reference, not because you use LINQ, but because arrays are reference types.

The fact that the output changes is due to the deferred/lazy evaluation of LINQ.
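Since the question mentions yield, here is a rough sketch of how a Select-like operator can be written as an iterator block. MySelect and DeferredSketch are hypothetical names used only for illustration; this is not the actual LINQ implementation. It only shows that nothing runs until the sequence is enumerated, so the array is read at enumeration time.

```csharp
using System;
using System.Collections.Generic;

static class DeferredSketch
{
    // Hypothetical stand-in for Enumerable.Select; not the real implementation.
    public static IEnumerable<TResult> MySelect<T, TResult>(
        this IEnumerable<T> source, Func<T, TResult> selector)
    {
        // An iterator block's body does not run until MoveNext is called.
        foreach (T item in source)
            yield return selector(item);
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3 };
        IEnumerable<int> query = numbers.MySelect(n => n * n); // nothing computed yet

        numbers[0] = 10; // mutate the array before the first enumeration

        foreach (int n in query)
            Console.WriteLine(n); // prints 100, 4, 9 on separate lines
    }
}
```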

Albin Sunnanbo
There's a difference between "pass by reference" and "pass reference by value". See http://pobox.com/~skeet/csharp/parameters.html
Jon Skeet
+4  A: 

The reference to numbers is passed by value. However, the query is evaluated lazily, and the underlying array is mutable.

So what does that mean?

    var arr = new[] { 1, 2, 3 };
    var q = arr.Select(i => i * 2);
    Console.WriteLine(string.Join(", ", q.ToArray())); // prints 2, 4, 6
    arr[0] = -1;
    Console.WriteLine(string.Join(", ", q.ToArray())); // prints -2, 4, 6
    // q refers to the original array, but that array has changed.
    arr = new[] { 2, 3, 4 };
    Console.WriteLine(string.Join(", ", q.ToArray())); // prints -2, 4, 6
    // since q still refers to the original array, not the variable arr!

Generally, it can get confusing pretty quickly if you change variables rather than their underlying objects, so it's better to avoid changes like this.

For example:

    var arr = new[] { 1, 2 };
    var arr2 = new[] { 1, 2 };
    var q = from a in arr
            from b in arr2
            select a * b;

    // q is 1, 2, 2, 4
    arr = new[] { 0, 1 }; // irrelevant, arr's reference was passed by value
    // q is still 1, 2, 2, 4

    arr2 = new[] { 0, 1 }; // unfortunately, relevant
    // q is now 0, 1, 0, 2

To understand this, you need to understand the details of the compilation process. Query expressions are defined as equivalent to extension method syntax (arr.Select...), which uses closures heavily. As a result, only the first enumerable or queryable has its reference passed by value; the rest are captured in closures, which means those variables are effectively passed by reference. Confused yet? Avoid changing variables like this to keep your code maintainable and readable.
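To make this concrete, here is a sketch of roughly what the two-from query above is translated into. The SelectMany form follows the query expression translation rules; the class and method names (ClosureSketch, Run) are made up for this illustration.

```csharp
using System;
using System.Linq;

static class ClosureSketch
{
    // Returns the query's contents at three points in time.
    public static string[] Run()
    {
        var arr = new[] { 1, 2 };
        var arr2 = new[] { 1, 2 };

        // Roughly what "from a in arr from b in arr2 select a * b" becomes.
        // arr is evaluated now and its reference is passed to SelectMany;
        // arr2 is only mentioned inside a lambda, so the variable is captured.
        var q = arr.SelectMany(a => arr2, (a, b) => a * b);

        string before = string.Join(", ", q);     // 1, 2, 2, 4

        arr = new[] { 0, 1 };                     // q keeps the old array
        string afterArr = string.Join(", ", q);   // 1, 2, 2, 4

        arr2 = new[] { 0, 1 };                    // the lambda reads arr2 afresh
        string afterArr2 = string.Join(", ", q);  // 0, 1, 0, 2

        return new[] { before, afterArr, afterArr2 };
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```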

Eamon Nerbonne
+3  A: 

The query is stored as exactly that - not a result set, just a query.

When you request the results from the query it evaluates the query using the current values at the time the query is executed, not the values from when the query was created. If you evaluate the same query twice you can get different results if the underlying data has changed, as the example you have provided in the question demonstrates.
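If a fixed result set is wanted instead, the usual approach is to materialize the query once, for example with ToList(). A minimal sketch of the difference, with illustrative names (SnapshotSketch, Run):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SnapshotSketch
{
    // Returns the deferred query and the snapshot, both read after the array changed.
    public static string[] Run()
    {
        int[] numbers = { 1, 2, 3 };

        IEnumerable<int> query = numbers.Select(n => n * n); // just a query, nothing computed
        List<int> snapshot = query.ToList();                 // evaluated once, right here

        numbers[0] = 10; // change the underlying data

        return new[]
        {
            string.Join(", ", query),    // re-evaluated now: 100, 4, 9
            string.Join(", ", snapshot)  // frozen earlier:   1, 4, 9
        };
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```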

Mark Byers
+4  A: 

It's to do with lazy execution. Every time you iterate through the query, it looks at numbers again. Indeed, if you change the value of a later element of numbers while you're executing the query, you'll see that change too. This is all changing the contents of the array.
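A small sketch of that mid-iteration case, assuming a plain array as the source (arrays allow their elements to be overwritten while being enumerated). The names MidIterationSketch and Run are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MidIterationSketch
{
    // Collects the values the loop actually observes.
    public static List<int> Run()
    {
        int[] numbers = { 1, 2, 3 };
        var query = numbers.Select(n => n * n);
        var seen = new List<int>();

        foreach (int value in query)
        {
            seen.Add(value);
            numbers[2] = 10; // overwrite the last element while the loop is running
        }

        return seen; // 1, 4, 100: the last element was read after the change
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```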

Note that the query remembers the value of numbers at the time of the query creation - but that value is a reference, not the contents of the array. So if you change the value of numbers itself like this:

numbers = new int[] { 10, 9, 8, 7 };

then that change won't be reflected in the query.

Just to complicate things, if you use variables within other parts of the query, like this:

int x = 3;

var query = from n in numbers
            where n == x
            select Square(n);

then the variable x is captured rather than its value... so changing x will change the results of evaluating the query. That's because the query expression is really translated to:

var query = numbers.Where(n => n == x)
                   .Select(n => Square(n));

Note that here, x is used within a lambda expression, but numbers isn't - that's why they behave slightly differently.
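A short sketch of that difference, using the method-syntax form so the lambda is visible. Square is replaced by n * n to keep the example self-contained, and the names CapturedVariableSketch and Run are illustrative.

```csharp
using System;
using System.Linq;

static class CapturedVariableSketch
{
    // Returns the query's contents at three points in time.
    public static string[] Run()
    {
        int[] numbers = { 1, 2, 3 };
        int x = 3;

        // numbers is evaluated here; x is captured by the lambda.
        var query = numbers.Where(n => n == x)
                           .Select(n => n * n);

        string first = string.Join(", ", query);   // 9

        x = 2;                                     // the lambda sees the new value
        string afterX = string.Join(", ", query);  // 4

        numbers = new[] { 2, 2 };                  // query still uses the original array
        string afterNumbers = string.Join(", ", query); // 4, not 4, 4

        return new[] { first, afterX, afterNumbers };
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```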

Jon Skeet
A: 

This is because the query holds on to the reference to numbers, and that, combined with lazy execution of the enumeration, gives this result.

veggerby