views:

123

answers:

4

I came across an interesting bug with LINQ to SQL. Take a look at the code below, which is loosely translated from a LINQ to SQL query in a search engine I'm writing.

The goal of the query is to find any groups that contain the IDs "Joe", "Jeff", "Jim" in consecutive order.

Pay careful attention to the variables named localKeyword and localSequence. If you were to delete the declarations of these seemingly useless local variables and replace them with the variables they are proxying (keyword and sequenceID), you would find the query no longer works.

I'm still a beginner with LINQ to SQL, but it looks like it is passing all the locals by reference. As a result, the query only sees the values the local variables hold at the moment the query is evaluated. In LINQ to SQL my query ended up looking like

SELECT * FROM INDEX ONE, INDEX TWO, INDEX THREE 
  WHERE ONE.ID = 'Jim' and TWO.ID = 'Jim' and 
    TWO.SEQUENCE = ONE.SEQUENCE + 2 and 
    THREE.ID = 'Jim' and 
    THREE.SEQUENCE = ONE.SEQUENCE + 2 and 
    ONE.GROUP == TWO.GROUP and ONE.GROUP == THREE.GROUP

The query is of course paraphrased. What exactly is happening? Is this a bug? I am asking mainly to understand why this is happening. The code below should compile in Visual Studio 2008.

using System;
using System.Collections.Generic;
using System.Text;
using System.Linq;

namespace BreakLINQ
{
    class Program
    {
        public struct DataForTest
        {
            private int _sequence;
            private string _ID;
            private string _group;

            public int Sequence
            {
                get
                {
                    return _sequence;
                }
                set
                {
                    _sequence = value;
                }
            }
            public string ID
            {
                get
                {
                    return _ID;
                }
                set
                {
                    _ID = value;
                }
            }
            public string Group
            {
                get
                {
                    return _group;
                }
                set
                {
                    _group = value;
                }
            }
        }
        static void Main(string[] args)
        {
            List<DataForTest> elements = new List<DataForTest>
            {
                new DataForTest() { Sequence = 0, ID = "John", Group="Bored" },
                new DataForTest() { Sequence = 1, ID = "Joe", Group="Bored" },
                new DataForTest() { Sequence = 2, ID = "Jeff", Group="Bored" },
                new DataForTest() { Sequence = 3, ID = "Jim", Group="Bored" },
                new DataForTest() { Sequence = 1, ID = "Jim", Group="Happy" },
                new DataForTest() { Sequence = 2, ID = "Jack", Group="Happy" },
                new DataForTest() { Sequence = 3, ID = "Joe", Group="Happy" },
                new DataForTest() { Sequence = 1, ID = "John", Group="Sad" },
                new DataForTest() { Sequence = 2, ID = "Jeff", Group="Sad" },
                new DataForTest() { Sequence = 3, ID = "Jack", Group="Sad" }
            };

            string[] order = new string[] { "Joe", "Jeff", "Jim" };
            int sequenceID = 0;
            var query = from item in elements
                        select item;
            foreach (string keyword in order)
            {
                if (sequenceID == 0)
                {
                    string localKeyword = keyword;
                    query = from item in query
                            where item.ID == localKeyword
                            select item;
                }
                else
                {
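                    // Copy the loop state into fresh locals so this iteration's
                    // query captures its own variables, not the shared ones.
                    string localKeyword = keyword;
                    int localSequence = sequenceID;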
                    query = from item in query
                            where (from secondItem in elements
                                   where secondItem.Sequence == item.Sequence + localSequence &&
                                         secondItem.ID == localKeyword
                                   select secondItem.Group).Contains(item.Group)
                            select item;
                }
                sequenceID++;
            }
        }
    }
}

After the code completes, evaluating the query should yield the single item {"Joe", "Bored", 1}.

+2  A: 
var correctQuery = 
   from o in elements
   join tw in elements on o.Sequence equals tw.Sequence - 1
   join th in elements on tw.Sequence equals th.Sequence - 1
   where
       o.ID == "Joe" && tw.ID == "Jeff" && th.ID == "Jim" && o.Group == tw.Group &&
       th.Group == tw.Group
   select new {o.ID, o.Sequence, o.Group};
Yuriy Faktorovich
For a search engine you need a way to do this with an arbitrary array of elements, say new string[] { "John", "joe", "Jeff", "Jim" };
SmokingRope
However, I was surprised how much cleaner this turns out to be :)
SmokingRope
+3  A: 

The reason this fails without the 'proxying' variables is that the variables are captured by the expressions in the LINQ query. Without the proxies, each iteration of the loop references the same two variables (keyword, sequenceID), and when the query is finally evaluated and executed, the value substituted for each of these references is identical; namely, whatever value is present in those variables when the loop terminates (which is when you want us to evaluate 'query').
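
A minimal sketch of that deferred evaluation, separate from the question's code. The names DeferredCaptureDemo, Run, numbers and threshold are hypothetical and chosen only for illustration. The lambda captures the variable itself, so the query sees whatever value the variable holds when the query is enumerated, not the value it held when the query was built.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredCaptureDemo
{
    // Builds a query, changes the captured variable, then evaluates the query.
    public static List<int> Run()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5 };

        int threshold = 1;
        // The lambda captures the variable 'threshold', not its current value.
        IEnumerable<int> query = numbers.Where(n => n > threshold);

        threshold = 3; // changed after the query was built, before it runs

        // Enumeration happens here, so the filter uses threshold == 3.
        return query.ToList();
    }

    public static void Main()
    {
        foreach (int n in Run())
        {
            Console.WriteLine(n); // prints 4, then 5
        }
    }
}
```

Had the query been evaluated before threshold was reassigned, it would have returned 2, 3, 4, 5 instead.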

The query behaves as expected with the proxies because the captured variables are uniquely declared per iteration of the loop; subsequent iterations do not modify the captured variables, because they are no longer in scope. The proxy variables are not useless at all. Furthermore, this behavior is by design; let me see if I can find a good reference link...

Ben M
+2  A: 

This is not a bug, it is "by design."

What's happening under the hood here is that you are capturing the iteration variable of a foreach loop in a lambda expression. The variable is actually used in a query, but the query is translated into lambda expressions.

With the C# compiler that ships with Visual Studio 2008, a foreach loop has only one iteration variable shared by all iterations, not one for each iteration. So each query is capturing the same variable. When executed, the query will run against the current, or in this case last, value stored in the iteration variable.

The reason your temporary variable trick works is that there is essentially one instance of the temporary variable for each iteration of the loop. So each query is capturing a different, independent variable.

A more concise example demonstrating this problem is as follows:

var list = new List<Func<int>>();
foreach (var cur in Enumerable.Range(1,3)) {
  list.Add(() => cur);
}
foreach ( var lambda in list ) {
  Console.WriteLine(lambda());  // prints 3 every time with the VS 2008 compiler (C# 5.0 and later print 1, 2, 3)
}
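
The temporary-variable fix can be sketched side by side with the failing version. This is an illustrative sketch, not the asker's code: LoopCaptureDemo, Shared and Copied are hypothetical names. It uses a for loop on purpose, since a for loop's variable is shared across iterations in every version of C#, so the difference shows up regardless of compiler.

```csharp
using System;
using System.Collections.Generic;

public static class LoopCaptureDemo
{
    // Every lambda captures the single loop variable 'i'.
    public static List<int> Shared()
    {
        var lambdas = new List<Func<int>>();
        for (int i = 1; i <= 3; i++)
        {
            lambdas.Add(() => i);
        }
        // The loop has finished, so 'i' is 4 for every lambda.
        return lambdas.ConvertAll(f => f());
    }

    // Each lambda captures its own per-iteration variable 'copy'.
    public static List<int> Copied()
    {
        var lambdas = new List<Func<int>>();
        for (int i = 1; i <= 3; i++)
        {
            int copy = i; // a fresh variable is created on each iteration
            lambdas.Add(() => copy);
        }
        return lambdas.ConvertAll(f => f());
    }

    public static void Main()
    {
        foreach (int value in Shared()) Console.WriteLine(value); // 4, 4, 4
        foreach (int value in Copied()) Console.WriteLine(value); // 1, 2, 3
    }
}
```

The copy works because it is declared inside the loop body, so each iteration gets a new variable that no later iteration can modify.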
JaredPar
+3  A: 

See also

On lambdas, capture, and mutability

Brian