views:

718

answers:

3

Sorry, I'm not sure how to phrase that and I'm really not very good with SQL. The db engine i SQL Server Compact. I currently have this query:

SELECT *
FROM Samples
WHERE FunctionId NOT IN
(SELECT CalleeId FROM Callers)
ORDER BY ThreadId, HitCount DESC

Which gives me:

ThreadId   Function  HitCount
       1        164      6945
       1       3817         1
       4       1328      7053

Now, I only want the result with the maximum hit count for each unique value of Thread. In other words, that second row should be dropped. I'm not sure how to pull this off.

[EDIT] If it helps, this is an alternate form of the same query:

SELECT *
FROM Samples s1
LEFT OUTER JOIN Callers c1
    ON s1.ThreadId = c1.ThreadId AND s1.FunctionId = c1.CalleeId
WHERE c1.ThreadId IS NULL
ORDER BY ThreadId

[EDIT] I ended up making schema changes to avoid doing this, as the suggested queries were looking rather expensive. Thanks for all the help.

+2  A: 

Here's how I would do it:

SELECT s1.*
FROM Samples s1
LEFT JOIN Samples s2 
  ON (s1.Thread = s2.Thread and s1.HitCount < s2.HitCount)
WHERE s1.FunctionId NOT IN (SELECT CalleeId FROM Callers) 
  AND s2.Thread IS NULL
ORDER BY s1.ThreadId, s1.HitCount DESC

In other words, the row s1 for which there is no other row s2 matching the same Thread and having a greater HitCount.

Bill Karwin
There's a nasty subtlety here -- the nested subclause (SELECT CalleeId FROM Callers) has to be applied to the other half of the join as well. I restructured the initial query a little bit:SELECT s1.*FROM Samples s1LEFT OUTER JOIN Callers c1 ON s1.ThreadId = c1.ThreadId AND s1.FunctionId = c1.CalleeIdWHERE c1.ThreadId IS NULL
Promit
+2  A: 

Does SQL Server compact support windowed functions?

Alternative 1--Will include all rows that tie. Will not include a row, if the only rows for a given Thread all have null for HitCount:

SELECT Thread, Function, HitCount
FROM (SELECT Thread, Function, HitCount,
        MAX(HitCount) over (PARTITION BY Thread) as MaxHitCount
    FROM Samples
    WHERE FunctionId NOT IN
        (SELECT CalleeId FROM Callers)) t 
WHERE HitCount = MaxHitCount 
ORDER BY ThreadId, HitCount DESC

Alternative 2--Will include all rows that tie. If there is no row for a given thread with non-null HitCount, will return all rows for that thread:

SELECT Thread, Function, HitCount
FROM (SELECT Thread, Function, HitCount,
        RANK() over (PARTITION BY Thread ORDER BY HitCount DESC) as R
    FROM Samples
    WHERE FunctionId NOT IN
        (SELECT CalleeId FROM Callers)) t
WHERE R = 1
ORDER BY ThreadId, HitCount DESC

Alternative 3--Will non-determistically pick one row in case of ties and discard others. Will include a row if all rows for a given thread have null HitCount

SELECT Thread, Function, HitCount
FROM (SELECT Thread, Function, HitCount,
        ROW_NUMBER() over (PARTITION BY Thread ORDER BY HitCount DESC) as R
    FROM Samples
    WHERE FunctionId NOT IN
        (SELECT CalleeId FROM Callers)) t
WHERE R = 1
ORDER BY ThreadId, HitCount DESC

Alternative 4 & 5--Uses older constructs, if the windowed functions aren't available, and says what is meant a little cleaner than using joins. Benchmark if spead is a priority. Both return all rows that participate in a tie. Alternative 4 will HitCount is null when non-null values are not available for HitCount. Alternative 5 will not return rows with HitCount is null.

SELECT *
FROM Samples s1
WHERE FunctionId NOT IN
    (SELECT CalleeId FROM Callers)
AND NOT EXISTS
    (SELECT *
    FROM Samples s2
    WHERE s1.FunctionId = s2.FunctionId
    AND s1.HitCount < s2.HitCount)
ORDER BY ThreadId, HitCount DESC

SELECT *
FROM Samples s1
WHERE FunctionId NOT IN
    (SELECT CalleeId FROM Callers)
AND HitCount = 
    (SELECT MAX(HitCount)
    FROM Samples s2
    WHERE s1.FunctionId = s2.FunctionId)
ORDER BY ThreadId, HitCount DESC
Shannon Severance
+1  A: 

Will work with SQL Server 2005+:

WITH maxHits AS(
  SELECT s.threadid,
         MAX(s.hitcount) 'maxhits'
    FROM SAMPLES s
    JOIN CALLERS c ON c.threadid = s.threadid AND c.calleeid != s.functionid
GROUP BY s.threadid
)
SELECT t.*
  FROM SAMPLES t
  JOIN CALLERS c ON c.threadid = t.threadid AND c.calleeid != t.functionid
  JOIN maxHits mh ON mh.threadid = t.threadid AND mh.maxhits = t.hitcount

Work on any database:

SELECT t.*
  FROM SAMPLES t
  JOIN CALLERS c ON c.threadid = t.threadid AND c.calleeid != t.functionid
  JOIN (SELECT s.threadid,
               MAX(s.hitcount) 'maxhits'
          FROM SAMPLES s
          JOIN CALLERS c ON c.threadid = s.threadid AND c.calleeid != s.functionid
      GROUP BY s.threadid) mh ON mh.threadid = t.threadid AND mh.maxhits = t.hitcount
OMG Ponies
What if (SELECT s.ThreadAID MAX(s.HitCount) FROM SAMPLES s GROUP BY s.ThreadId) returns a max that belongs to only rows where FunctionId is in Callers.CallerId? Those rows are ultimately removed, and the Max(HitCount) for that thread from a row with FuctionId not in Callers.CallerId is left out of the results.
Shannon Severance
Good point. Easily corrected by adding the JOIN to CALLERS in the CTE or inline view - see update.
OMG Ponies
@rexem. Noticed another problem: There is nothing in the OP that leads me to belieave that Callers has a ThreadId column.
Shannon Severance
Yes there is - see the alternate query (second query).
OMG Ponies
@rexem. You are right, I did not notice that.
Shannon Severance