I'm trying to convert a Linq query to SQL. My Linq query looks like this:

from s in Somethings
where s.CreatedTime >= new DateTime(2010, 01, 01)
where s.CreatedTime <  new DateTime(2010, 02, 01)
group s by s.Data into grouping
select grouping.OrderByDescending(s => s.CreatedTime)
               .ThenByDescending( s => s.UpdatedTime)
               .First();

In words, that is supposed to get all things from a certain month. Then group them by a specific key. For each key, I want the most recently created element. If two elements with the same key were created at the same time, I want to break ties by most recently updated.

So far I have this for SQL

SELECT s1.*
FROM Somethings s1
JOIN (
  SELECT s.Data AS Data, MAX(CreatedTime) AS CreatedTime
  FROM Somethings s
  WHERE s.CreatedTime >= '20100101' 
    AND s.CreatedTime <  '20100201' 
  GROUP BY s.Data
) s2 ON s1.Data = s2.Data
    AND s1.CreatedTime = s2.CreatedTime

That works, but I can't control how ties are broken.

What I really want is a way to arbitrarily sort each grouping like I can in Linq. I want to define my own aggregation function that takes a set of rows and returns one row. Is this possible in SQL, or is Linq more expressive? SQL's aggregation functions (MAX, MIN, COUNT, etc.) don't seem to be first-class functions like their equivalents in Linq are. Of course, it could just be my lack of knowledge of SQL.

Here's a made up example to further illustrate what I want to do in SQL:

SELECT (SELECT * 
        FROM grouping 
        ORDER BY CreatedTime DESC, UpdatedTime DESC
        LIMIT 1)
FROM Somethings s
WHERE s.CreatedTime >= '20100101' 
  AND s.CreatedTime <  '20100201' 
GROUP BY s.Data AS grouping

In this example, my illegal inner query is serving the same role as an aggregation function.

A: 

Using GROUP BY in SQL flattens the records in each group. This lets you apply aggregate functions to the group and get back information about the group (MIN, MAX, COUNT, etc.), but the individual records in each group are no longer accessible.
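To make the flattening concrete, here is a minimal sketch that runs a GROUP BY against an in-memory SQLite table from Python. The table layout, the sample rows, and the choice to store dates as 'YYYYMMDD' text are assumptions made for illustration only, not the asker's real schema.

```python
import sqlite3

# Hypothetical sample data; dates are stored as 'YYYYMMDD' text (an assumption),
# which sorts correctly as plain strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Somethings (Id INTEGER PRIMARY KEY, Data TEXT, CreatedTime TEXT)")
conn.executemany(
    "INSERT INTO Somethings VALUES (?, ?, ?)",
    [(1, "a", "20100105"), (2, "a", "20100110"), (3, "b", "20100120")],
)

# Each group collapses to a single row of summary values; the rows
# with Id 1 and Id 2 are no longer individually visible in the result.
per_group = conn.execute(
    "SELECT Data, COUNT(*), MAX(CreatedTime) FROM Somethings GROUP BY Data ORDER BY Data"
).fetchall()
print(per_group)
```

The result has one row per distinct Data value, which is why getting back a whole record needs a join back to the table or a window function.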

Maybe this doesn't answer your question...

Add one more level of grouping that takes the max UpdatedTime among the rows that have the max CreatedTime. That should solve your problem.

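SELECT sFinal.* FROM Somethings sFinal
 JOIN 
   (
     -- latest UpdatedTime among the rows that share the latest CreatedTime
     SELECT s1.Query, s1.CreatedTime, MAX(s1.UpdatedTime) AS UpdatedTime
     FROM Somethings s1
     JOIN (
          -- latest CreatedTime per key within the month
          SELECT s.Query AS Query, MAX(s.CreatedTime) AS CreatedTime
          FROM Somethings s
          WHERE s.CreatedTime >= '20100101' 
          AND s.CreatedTime <  '20100201' 
          GROUP BY s.Query
          ) s2
     ON s1.Query = s2.Query 
     AND s1.CreatedTime = s2.CreatedTime
     GROUP BY s1.Query, s1.CreatedTime
   ) s3
ON sFinal.Query = s3.Query
-- match on CreatedTime too, so a row from another month with the same UpdatedTime is not picked up
AND sFinal.CreatedTime = s3.CreatedTime
AND sFinal.UpdatedTime = s3.UpdatedTime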

Now, if two records in a group have matching CreatedTime and UpdatedTime, this will return more than one record for that group, but otherwise it should be what you were looking for.
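The tie caveat can be reproduced with a small sketch using Python's built-in sqlite3 module. Everything about the data is assumed for illustration: the Id column, the sample rows, and dates stored as 'YYYYMMDD' text. The query is a variant of the nested-MAX join that also matches on CreatedTime in the outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Somethings (Id INTEGER PRIMARY KEY, Query TEXT, CreatedTime TEXT, UpdatedTime TEXT)"
)
conn.executemany(
    "INSERT INTO Somethings VALUES (?, ?, ?, ?)",
    [
        (1, "a", "20100105", "20100106"),
        (2, "a", "20100110", "20100111"),  # ties with row 3 on both timestamps
        (3, "a", "20100110", "20100111"),
        (4, "b", "20100120", "20100121"),
    ],
)

sql = """
SELECT sFinal.Id
FROM Somethings sFinal
JOIN (
    SELECT s1.Query, s1.CreatedTime, MAX(s1.UpdatedTime) AS UpdatedTime
    FROM Somethings s1
    JOIN (
        SELECT s.Query, MAX(s.CreatedTime) AS CreatedTime
        FROM Somethings s
        WHERE s.CreatedTime >= '20100101' AND s.CreatedTime < '20100201'
        GROUP BY s.Query
    ) s2 ON s1.Query = s2.Query AND s1.CreatedTime = s2.CreatedTime
    GROUP BY s1.Query, s1.CreatedTime
) s3 ON sFinal.Query = s3.Query
    AND sFinal.CreatedTime = s3.CreatedTime
    AND sFinal.UpdatedTime = s3.UpdatedTime
ORDER BY sFinal.Id
"""

# Group "a" contributes two rows because nothing distinguishes rows 2 and 3.
tied_ids = [row[0] for row in conn.execute(sql)]
print(tied_ids)
```

A join on MAX values can only narrow a group down to the rows that share those values, so a full tie needs some extra unique column to pick a single winner.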

Nate Heinrich
+2  A: 

This isn't really an aggregation, it's just a groupwise maximum. ROW_NUMBER is the easiest way to write these queries:

;WITH CTE AS
(
    SELECT
        Query, CreatedTime, UpdatedTime, <other_columns>,
        ROW_NUMBER() OVER
        (
            PARTITION BY Query
            ORDER BY CreatedTime DESC, UpdatedTime DESC
        ) AS RowNum
    FROM Somethings
    WHERE CreatedTime >= '20100101'
    AND CreatedTime < '20100201'
)
SELECT *
FROM CTE
WHERE RowNum = 1

It's not necessarily the most efficient, but it's reasonably good in most cases. And the nice thing is that you can modify this to do top 2 per group, top 3, etc., and you have complete control over ties.
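As a rough illustration of the top-N variation, the sketch below runs the same kind of CTE from Python's built-in sqlite3 module, which needs SQLite 3.25 or later for window functions. The Id column, the sample rows, the 'YYYYMMDD' text dates, and the helper name top_n_per_group are all hypothetical. Adding Id as a last ORDER BY key is one possible way to make full ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Somethings (Id INTEGER PRIMARY KEY, Query TEXT, CreatedTime TEXT, UpdatedTime TEXT)"
)
conn.executemany(
    "INSERT INTO Somethings VALUES (?, ?, ?, ?)",
    [
        (1, "a", "20100105", "20100106"),
        (2, "a", "20100110", "20100111"),
        (3, "a", "20100110", "20100112"),  # same CreatedTime as row 2, updated later
        (4, "b", "20100120", "20100121"),
        (5, "b", "20100215", "20100216"),  # outside January, filtered out
    ],
)

def top_n_per_group(conn, n):
    """Return (Query, Id) for the n newest rows per Query within January 2010."""
    sql = """
    WITH CTE AS (
        SELECT Id, Query,
               ROW_NUMBER() OVER (
                   PARTITION BY Query
                   ORDER BY CreatedTime DESC, UpdatedTime DESC, Id DESC
               ) AS RowNum
        FROM Somethings
        WHERE CreatedTime >= '20100101' AND CreatedTime < '20100201'
    )
    SELECT Query, Id FROM CTE WHERE RowNum <= ? ORDER BY Query, RowNum
    """
    return conn.execute(sql, (n,)).fetchall()

top1 = top_n_per_group(conn, 1)
top2 = top_n_per_group(conn, 2)
print(top1, top2)
```

Changing only the RowNum comparison switches between top 1, top 2, and so on, while the ORDER BY inside OVER decides which rows win.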

(P.S. I hope you don't actually name the column "Query")

Aaronaught
Sorry, query was domain specific stuff leaking into my domainless question. I've edited it out. I'll investigate this row_number thing now :)
dmnd