views:

23364

answers:

3

I have to perform the following SQL query:

select answer_nbr, count(distinct user_nbr)
from tpoll_answer
where poll_nbr = 16
group by answer_nbr

The LINQ to SQL query

from a in tpoll_answer 
where a.poll_nbr = 16 select a.answer_nbr, a.user_nbr distinct 

maps to the following SQL query:

select distinct answer_nbr, distinct user_nbr
from tpoll_answer
where poll_nbr = 16

So far, so good. However the problem raises when trying to GROUP the results, as I'm not being able to find a LINQ to SQL query that maps to the first query I wrote here (thank you LINQPad for making this process a lot easier). The following is the only one that I've found that gives me the desired result:

from answer in tpoll_answer where answer.poll_nbr = 16 _
group by a_id = answer.answer_nbr into votes = count(answer.user_nbr)

Which in turns produces the follwing ugly and non-optimized at all SQL query:

SELECT [t1].[answer_nbr] AS [a_id], (
    SELECT COUNT(*)
    FROM (
        SELECT CONVERT(Bit,[t2].[user_nbr]) AS [value], [t2].[answer_nbr], [t2].[poll_nbr]
        FROM [TPOLL_ANSWER] AS [t2]
        ) AS [t3]
    WHERE ([t3].[value] = 1) AND ([t1].[answer_nbr] = [t3].[answer_nbr]) AND ([t3].[poll_nbr] = @p0)
    ) AS [votes]
FROM (
    SELECT [t0].[answer_nbr]
    FROM [TPOLL_ANSWER] AS [t0]
    WHERE [t0].[poll_nbr] = @p0
    GROUP BY [t0].[answer_nbr]
    ) AS [t1]
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [16]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 3.5.30729.1

Any help will be more than appreciated.

+1  A: 

Linq to sql has no support for Count(Distinct ...). You therefore have to map a .NET method in code onto a Sql server function (thus Count(distinct.. )) and use that.

btw, it doesn't help if you post pseudo code copied from a toolkit in a format that's neither VB.NET nor C#.

Frans Bouma
Disagree with the down-vote... fixed.
Marc Gravell
Thank you Frans, I have considering creating a view or stored procedure.Thank you very much Marc.
Leandro López
@Leandro - you might also consider a table-valued function (UDF) in place of a stored procedure; the system can get the metadata more accurately, and it is composable at the server. Only works with LINQ-to-SQL, though - not Entity Framework (AFAIK).
Marc Gravell
A: 

This is how you do a distinct count query. Note that you have to filter out the nulls.

var useranswercount = (from a in tpoll_answer
where user_nbr != null && answer_nbr != null
select user_nbr).Distinct().Count();

If you combine this with into your current grouping code, I think you'll have your solution.

GeekyMonkey
Thank you GeekyMonkey. Columns are NOT NULL, so it is one less thing to worry about. If I'm not mistaken, I think your query the total count and not a grouped by count.
Leandro López
+17  A: 

There isn't direct support for COUNT(DISTINCT {x})), but you can simulate it from an IGrouping<,> (i.e. what group by returns); I'm afraid I only "do" C#, so you'll have to translate to VB...

 select new
 {
     Foo= grp.Key,
     Bar= grp.Select(x => x.SomeField).Distinct().Count()
 };

Here's a Northwind example:

    using(var ctx = new DataClasses1DataContext())
    {
        ctx.Log = Console.Out; // log TSQL to console
        var qry = from cust in ctx.Customers
                  where cust.CustomerID != ""
                  group cust by cust.Country
                  into grp
                  select new
                  {
                      Country = grp.Key,
                      Count = grp.Select(x => x.City).Distinct().Count()
                  };

        foreach(var row in qry.OrderBy(x=>x.Country))
        {
            Console.WriteLine("{0}: {1}", row.Country, row.Count);
        }
    }

The TSQL isn't quite what we'd like, but it does the job:

SELECT [t1].[Country], (
    SELECT COUNT(*)
    FROM (
        SELECT DISTINCT [t2].[City]
        FROM [dbo].[Customers] AS [t2]
        WHERE ((([t1].[Country] IS NULL) AND ([t2].[Country] IS NULL)) OR (([t1]
.[Country] IS NOT NULL) AND ([t2].[Country] IS NOT NULL) AND ([t1].[Country] = [
t2].[Country]))) AND ([t2].[CustomerID] <> @p0)
        ) AS [t3]
    ) AS [Count]
FROM (
    SELECT [t0].[Country]
    FROM [dbo].[Customers] AS [t0]
    WHERE [t0].[CustomerID] <> @p0
    GROUP BY [t0].[Country]
    ) AS [t1]
-- @p0: Input NVarChar (Size = 0; Prec = 0; Scale = 0) []
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 3.5.30729.1

The results, however, are correct- verifyable by running it manually:

        const string sql = @"
SELECT c.Country, COUNT(DISTINCT c.City) AS [Count]
FROM Customers c
WHERE c.CustomerID != ''
GROUP BY c.Country
ORDER BY c.Country";
        var qry2 = ctx.ExecuteQuery<QueryResult>(sql);
        foreach(var row in qry2)
        {
            Console.WriteLine("{0}: {1}", row.Country, row.Count);
        }

With definition:

class QueryResult
{
    public string Country { get; set; }
    public int Count { get; set; }
}
Marc Gravell
Thank you Marc. Yes, I also considered this approach, in fact a very similar query that returns the correct results. Sadly it is a shame the SQL generated is not optimal, but it works and that´s enough for the time being.
Leandro López
@Leandro - it might be more TSQL, but I'd compare the actual query plans before claiming anything about optimal - it might well be identical to the optimiser.
Marc Gravell
IF the sql is like crap, why don't you write it in plain sql instead in the first place?
marko