views:

509

answers:

2

I have a LINQ query which is attempting to get all of the distinct months of all of the dates in a table.

I had this working using the Distinct() extension method. I then made it more readable by using an extension method to extract the month. And then it stopped returning Distinct results.

Can anyone help me work out what happened here?

As an aside, if someone can tell me the best way to get the distinct months, that would be nice too. But it's more important that I understand why this is failing.

Here's the code.

static class DcUtils
{
     public static DateTime GetMonth(this Timesheet_Entry entry)
    {
        DateTime dt = new DateTime(
                                     entry.Entry_Start_DateTime.Year,
                                     entry.Entry_Start_DateTime.Month, 
                                     1
                                  );
        return dt;
    }
}



public class Demo
{
    public DemonstrateBug()
    {
        TimesheetDataClassesDataContext dc = new TimesheetDataClassesDataContext();
    /////////////////////////////////
    //// Here are the queries and their behaviours
    var q1 = (
                from ts
                in dc.Timesheet_Entries
                select new DateTime(ts.Entry_Start_DateTime.Year, ts.Entry_Start_DateTime.Month, 1)
             ).Distinct();
    // This returns 3 (which is what I want)
    int lengthQuery1 = q1.Count();  

    // And now for the bug!

    var q2 = (
                 from ts
                 in dc.Timesheet_Entries
                 select ts.GetMonth()
             ).Distinct(); 
        // This returns 236 (WTF?)
        int lengthQuery2 = q2.Count();
    }
}
+1  A: 

It occurred to me to run this through the SQL Server Profiler.

This query:

var q1 = (                
            from ts                
            in dc.Timesheet_Entries                
            select new DateTime(ts.Entry_Start_DateTime.Year,
                                ts.Entry_Start_DateTime.Month, 
                                1)             
          ).Distinct();

generates this SQL:

exec sp_executesql N'SELECT DISTINCT [t1].[value]
FROM (
    SELECT CONVERT(DATETIME, CONVERT(NCHAR(2), DATEPART(Month, [t0].[Entry_Start_DateTime])) + (''/'' + (CONVERT(NCHAR(2), @p0) + (''/'' + CONVERT(NCHAR(4), DATEPART(Year, 
[t0].[Entry_Start_DateTime]))))), 101) AS [value]
    FROM [dbo].[Timesheet_Entry] AS [t0]
    ) AS [t1]',N'@p0 int',@p0=1

But if I put the month extraction logic in the extension method:

    var q2 = (                 
                 from ts                 
                 in dc.Timesheet_Entries                 
                 select ts.GetMonth()             
             ).Distinct();

It generats this SQL:

SELECT DISTINCT [t0].[Timesheet_Entry_ID], [t0].[Entry_Start_DateTime], [t0].[Entry_End_DateTime], [t0].[Task_Description], [t0].[Customer_ID]
FROM [dbo].[Timesheet_Entry] AS [t0]

So it's moved the DISTINCT function to the server, but kept the date extraction code until after the DISTINCT operation, which is not what I want, and is not what happens in the first example.

I don't know if I should call this a bug or a leaky abstraction.

Andrew Shepherd
I'd call it a bug. Try forcing the issue by calling the Enumerable.Distinct extension method instead of Queryable.Distinct
David B
@David B: I tried this: ( (IEnumerable<DateTime>)(dc.Timesheet_Entries.Select(tse => tse.GetMonth()))).Distinct(); It works as it should - returning 3 distinct values.
Andrew Shepherd
+1  A: 

LINQ to SQL is smart enough to convert the new DateTime() expression from your initial lambda expression into a SQL statements that can be executed at the server. If you replace this expression with an (extension) method, LINQ to SQL will only see a call to an opaque method it knows nothing about, hence it cannot generate any SQL for the method call and the part of the SQL query messing with the dates disappears.

But this shouldn't break anything - what cannot be transformed into SQL must be executed at the client. So what happens? The date you want to perform the distinct operation on cannot be calculated at the server because of the opaque method call, hence the distinct operation cannot be performed at the server, too. But the query you recorded from the broken version contains a DISTINCT statement.

I don't use the LINQ query syntax, but I assume you have written something you don't actually mean or the compiler or LINQ to SQL inferred something you didn't mean.

context
    .Timesheet_Entries
    .Select(tse => tse.GetMonth())
    .Distinct()

versus

context
    .Timesheet_Entries
    .Distinct()
    .Select(tse => tse.GetMonth())

So I guess you got the second one for what ever reason - the distinct operation seems to get propagated over the select. Maybe it's the combination of Distinct() with the query syntax and the contained opaque method call. Try both versions without the query syntax and see what you get back and what queries are send to the server. You can also try to insert ToList() calls to force the transition from LINQ to SQL to LINQ to Objects - this might help to cast some light onto the situation, too.

Daniel Brückner
+1 for suggesting use of Linq to Objects to diagnose whether the problem comes from the Linq to SQL translation.
John Fisher
context.Timesheet_Entries.Select(...).Distinct() returns the same results as context.Timesheet_Entries.Distinct().Select(...). (Even though it obviously shouldn't). Inserting ToList() in the middle does indeed make it work as expected.
Andrew Shepherd