ansaurus

Question

SQL Server 2005 Syntax Help - "Select Info based upon Max Value of Sub Query"

Answer 1

+1 A:

The over function is an awesome, but often neglected function. You can use it in a subquery to pull back your valid jobs, like so:

select
    a.*
from
    invoice a
    inner join customer b on
        a.customerid = b.customerid
    inner join (select customerid, max(jobid) as jobid, maxVal from
               (select customerid, 
                jobid,
                value,
                max(value) over (partition by customerid) as maxVal
                from jobs
                where Year = '2008') s
               where s.value = s.maxVal
               group by customerid, maxVal) c on
        b.customerid = c.customerid
        and a.jobid = c.jobid

Essentially, that first inner query looks like this:

select 
    customerid, 
    jobid,
    value,
    max(value) over (partition by customerid) as maxVal
from jobs
where Year = '2008'

You'll see that this pulls back all of the jobs, but with that additional column which lets you know what the maximum value is for each customer. With the next subquery, we filter out any rows that have value and maxVal equal. Additionally, it finds the max JobID based on customerid and maxVal, because we need to pull back one and only one JobID (as per the requirements).

Now, you have a complete listing of CustomerID and JobID that meet the conditions of having the highest JobID that contains the maximum Value for that CustomerID in a given year. All that's left is to join it to Invoice and Customer, and you're good to go.

Eric 2009-08-11 04:49:10

Eric, Thanks for the detailed input. I'm playing around with the code now.

hamlin11 2009-08-11 05:12:01

Eric, Any advice on which of the 3 correct answers are best? We've got the "Over" solution w/ CTE, "Over" solution w/ out CTE, and your solution. All of them work.

hamlin11 2009-08-11 05:32:39

@hamlin: Benchmark them. I don't know which one is faster off the top of my head. Which ever one is fastest, wins :)

Eric 2009-08-11 05:34:20

Eric. I agree that that would be a fair solution... but I'm not experienced enough with the profiler to do it in under an hour and too short on time for custom benchmarks. Thanks for your great answer.

hamlin11 2009-08-11 06:05:48

Bah, you don't need to worry about the profiler. Just run the queries. Fastest one back wins. You can also look at the execution plans to see total cost (Ctrl+L in SSMS's query editor).

Eric 2009-08-11 06:18:19

Answer 2

+2 A:

How about using a CTE. Obviously, I can't test, but here is the idea. You need to replace col1, col2, ..., coln with the stuff you want to select.

Inv( col1, col2, ... coln)
AS
(
  SELECT col1, col2, ... coln,
     ROW_NUMBER() OVER (PARTITION BY A.CustomerID 
     ORDER BY A.Value DESC) AS [RowNumber]
      FROM Invoice A INNER JOIN Customer B ON A.CustomerID = B.CustomerID
      WHERE A.CustomerID = @CustomerID
      AND A.Year = @Year
)
SELECT * FROM Inv WHERE RowNumber = 1

If you don't have a CustomerID, this will return the top value for each customer (that will hurt on performance tho).

JP Alioto 2009-08-11 04:50:29

There's no need to add a CTE. You can join to the ROW_NUMBER() call directly.

Sean Reilly 2009-08-11 04:55:04

@Sean: I think the CTE version will be more efficient in the case of having no CustomerID, but I would have to see the execution plan to prove that. :)

JP Alioto 2009-08-11 05:15:40

@JP I need advice... I got 3 very detailed correct answers. Which do I pick? If I had to pick one, I'd pick Seans because it is the simplest... but I really have no clue which is the most efficient.

hamlin11 2009-08-11 05:31:42

@hamlin: That's easy, run the profiler :) I agree in principle that the sub-query is simpler than the CTE.

JP Alioto 2009-08-11 05:53:50

Well... I'm quite the novice (very limited experience with the profiler). Thanks for the advice.

hamlin11 2009-08-11 06:04:46

Here's a good primer on reading the execution plan ... http://www.sql-server-performance.com/tips/query_execution_plan_analysis_p1.aspx

JP Alioto 2009-08-11 06:12:21

@JP there should be no performance advantage (or disadvantage) to using a non-recursive CTE. The optimizer will handle either case the same way. I suspect they'd actually compile to the same query plan.

Sean Reilly 2009-08-11 13:13:50

@JP I had a similar case and used CTE. I think CTE is cool; I never had a performance issue. @Sean Reilly - your solution might solve the problem I've been having! I'll see if it can help me - thanks!@hamlin - here's a free ebook on Execution Plan - http://dbalink.wordpress.com/2008/08/08/dissecting-sql-server-execution-plans-free-ebook/

MarlonRibunal 2009-08-11 23:47:49

Answer 3

+1 A:

The row_number() function can give you what you need:

Select A.*, B.*, C.JobID, C.Value
from 
Invoice as A
inner join Customer as B on A.CustomerID = B.CustomerID
inner join (
   select JobID, Value, CustomerID,
   ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Value DESC) AS Ordinal
   from Jobs
   WHERE Year = 2008
) AS C ON (A.CustomerID = C.customerID AND C.Ordinal = 1)

The ROW_NUMBER() function in this query will order by value in descending order and the PARTITION BY clause will do this separately for each different value of CustomerID. This means that the highest Value for each customer will always be 1, so we can join to that value.

Sean Reilly 2009-08-11 04:53:29

Sean, I'm going with your answer because it's the simplest. As far as run-time... I have no idea. They are all great solutions. Thanks for your advice.

hamlin11 2009-08-11 06:07:02

To find runtime performance: paste each potential solution into SSMS, and hit ctrl-L. This shows the estimated execution plan. The field "subtree cost" is the estimated execution time in seconds on "reference hardware" (some box at microsoft hq). Smaller subtree cost generally equals faster. Without measuring, I'd guess that the runtime cost of all three solutions would be similar.

Sean Reilly 2009-08-11 13:11:12

Answer 4

+1 A:

Just to be complete with the non row_number solution for those < MSSQL 2005. Personanly, I find it easier to follow myslef...but that could be biased considering how much time I spend in MSSQL 2000 vs 2005+.

SELECT * 
FROM Invoice as A
INNER JOIN Customer as B ON
    A.CustomerID = B.CustomerID
INNER JOIN (
    SELECT
       CustomerId,
       --MAX in case dupe Values. 
       ==If UC on CustomerId, Value (or CustomerId, Year, Value) then not needed
       MAX(JobId) as JobId 
    FROM Jobs
    JOIN (
        SELECT
            CustomerId,
            MAX(Value) as MaxValue
        FROM Jobs
        WHERE Year = 2008
        GROUP BY
            CustomerId
    ) as MaxValue ON
         Jobs.CustomerId = MaxValue.CustomerId
         AND Jobs.Value = MaxValue.MaxValue
    WHERE Year = 2008
    GROUP BY
         CustomerId
) as C ON
     B.CustomerID = C.CustomerID

Mark Brackett 2009-08-11 23:36:38

ansaurus

tags:

views:

answers:

SQL Server 2005 Syntax Help - "Select Info based upon Max Value of Sub Query"

related questions