views:

643

answers:

4

The objective is below the list of tables.

Tables:

Table: Job

  • JobID
  • CustomerID
  • Value
  • Year

Table: Customer

  • CustomerID
  • CustName

Table: Invoice

  • SaleAmount
  • CustomerID

The Objective

Part 1: (easy) I need to select all invoice records and sort by Customer (To place nice w/ Crystal Reports)

Select * from Invoice as A inner join Customer as B on A.CustomerID = B.CustomerID

Part 2: (hard) Now, we need to add two fields:

  • JobID associated with that customer's job that has the Maximum Value (from 2008)
  • Value associated with that job

Pseudo Code

Select * from 
Invoice as A
inner join Customer as B on A.CustomerID = B.CustomerID
inner join 
(select JobID, Value from Jobs where Job:JobID has the highest value out of all of THIS customer's jobs from 2008)

General Thoughts

  • This is fairly easy to do If I am only dealing with one specific customer:

    select max(JobId), max(Value) as MaxJobID from Jobs where Value = (select max(Value) from Jobs where CustomerID = @SpecificCustID and Year = '2008') and CustomerID = SpecificCustID and CustomerID = '2008'

This subquery determines the max Value for this customer in 2008, and then its a matter of choosing a single job (can't have dupes) out of potential multiple jobs from 2008 for that customer that have the same value.

The Difficulty

What happens when we don't have a specific customer ID to compare against? If my goal is to select ALL invoice records and sort by customer, then this subquery needs access to which customer it is currently dealing with. I suppose this can "sort of" be done through the ON clause of the JOIN, but that doesn't really seem to work because the sub-sub query has no access to that.

I'm clearly over my head. Any thoughts?

+1  A: 

The over function is an awesome, but often neglected function. You can use it in a subquery to pull back your valid jobs, like so:

select
    a.*
from
    invoice a
    inner join customer b on
        a.customerid = b.customerid
    inner join (select customerid, max(jobid) as jobid, maxVal from
               (select customerid, 
                jobid,
                value,
                max(value) over (partition by customerid) as maxVal
                from jobs
                where Year = '2008') s
               where s.value = s.maxVal
               group by customerid, maxVal) c on
        b.customerid = c.customerid
        and a.jobid = c.jobid

Essentially, that first inner query looks like this:

select 
    customerid, 
    jobid,
    value,
    max(value) over (partition by customerid) as maxVal
from jobs
where Year = '2008'

You'll see that this pulls back all of the jobs, but with that additional column which lets you know what the maximum value is for each customer. With the next subquery, we filter out any rows that have value and maxVal equal. Additionally, it finds the max JobID based on customerid and maxVal, because we need to pull back one and only one JobID (as per the requirements).

Now, you have a complete listing of CustomerID and JobID that meet the conditions of having the highest JobID that contains the maximum Value for that CustomerID in a given year. All that's left is to join it to Invoice and Customer, and you're good to go.

Eric
Eric, Thanks for the detailed input. I'm playing around with the code now.
hamlin11
Eric, Any advice on which of the 3 correct answers are best? We've got the "Over" solution w/ CTE, "Over" solution w/ out CTE, and your solution. All of them work.
hamlin11
@hamlin: Benchmark them. I don't know which one is faster off the top of my head. Which ever one is fastest, wins :)
Eric
Eric. I agree that that would be a fair solution... but I'm not experienced enough with the profiler to do it in under an hour and too short on time for custom benchmarks. Thanks for your great answer.
hamlin11
Bah, you don't need to worry about the profiler. Just run the queries. Fastest one back wins. You can also look at the execution plans to see total cost (Ctrl+L in SSMS's query editor).
Eric
+2  A: 

How about using a CTE. Obviously, I can't test, but here is the idea. You need to replace col1, col2, ..., coln with the stuff you want to select.

Inv( col1, col2, ... coln)
AS
(
  SELECT col1, col2, ... coln,
     ROW_NUMBER() OVER (PARTITION BY A.CustomerID 
     ORDER BY A.Value DESC) AS [RowNumber]
      FROM Invoice A INNER JOIN Customer B ON A.CustomerID = B.CustomerID
      WHERE A.CustomerID = @CustomerID
      AND A.Year = @Year
)
SELECT * FROM Inv WHERE RowNumber = 1

If you don't have a CustomerID, this will return the top value for each customer (that will hurt on performance tho).

JP Alioto
There's no need to add a CTE. You can join to the ROW_NUMBER() call directly.
Sean Reilly
@Sean: I think the CTE version will be more efficient in the case of having no CustomerID, but I would have to see the execution plan to prove that. :)
JP Alioto
@JP I need advice... I got 3 very detailed correct answers. Which do I pick? If I had to pick one, I'd pick Seans because it is the simplest... but I really have no clue which is the most efficient.
hamlin11
@hamlin: That's easy, run the profiler :) I agree in principle that the sub-query is simpler than the CTE.
JP Alioto
Well... I'm quite the novice (very limited experience with the profiler). Thanks for the advice.
hamlin11
Here's a good primer on reading the execution plan ... http://www.sql-server-performance.com/tips/query_execution_plan_analysis_p1.aspx
JP Alioto
@JP there should be no performance advantage (or disadvantage) to using a non-recursive CTE. The optimizer will handle either case the same way. I suspect they'd actually compile to the same query plan.
Sean Reilly
@JP I had a similar case and used CTE. I think CTE is cool; I never had a performance issue. @Sean Reilly - your solution might solve the problem I've been having! I'll see if it can help me - thanks!@hamlin - here's a free ebook on Execution Plan - http://dbalink.wordpress.com/2008/08/08/dissecting-sql-server-execution-plans-free-ebook/
MarlonRibunal
+1  A: 

The row_number() function can give you what you need:

Select A.*, B.*, C.JobID, C.Value
from 
Invoice as A
inner join Customer as B on A.CustomerID = B.CustomerID
inner join (
   select JobID, Value, CustomerID,
   ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Value DESC) AS Ordinal
   from Jobs
   WHERE Year = 2008
) AS C ON (A.CustomerID = C.customerID AND C.Ordinal = 1)

The ROW_NUMBER() function in this query will order by value in descending order and the PARTITION BY clause will do this separately for each different value of CustomerID. This means that the highest Value for each customer will always be 1, so we can join to that value.

Sean Reilly
Sean, I'm going with your answer because it's the simplest. As far as run-time... I have no idea. They are all great solutions. Thanks for your advice.
hamlin11
To find runtime performance: paste each potential solution into SSMS, and hit ctrl-L. This shows the estimated execution plan. The field "subtree cost" is the estimated execution time in seconds on "reference hardware" (some box at microsoft hq). Smaller subtree cost generally equals faster. Without measuring, I'd guess that the runtime cost of all three solutions would be similar.
Sean Reilly
+1  A: 

Just to be complete with the non row_number solution for those < MSSQL 2005. Personanly, I find it easier to follow myslef...but that could be biased considering how much time I spend in MSSQL 2000 vs 2005+.

SELECT * 
FROM Invoice as A
INNER JOIN Customer as B ON
    A.CustomerID = B.CustomerID
INNER JOIN (
    SELECT
       CustomerId,
       --MAX in case dupe Values. 
       ==If UC on CustomerId, Value (or CustomerId, Year, Value) then not needed
       MAX(JobId) as JobId 
    FROM Jobs
    JOIN (
        SELECT
            CustomerId,
            MAX(Value) as MaxValue
        FROM Jobs
        WHERE Year = 2008
        GROUP BY
            CustomerId
    ) as MaxValue ON
         Jobs.CustomerId = MaxValue.CustomerId
         AND Jobs.Value = MaxValue.MaxValue
    WHERE Year = 2008
    GROUP BY
         CustomerId
) as C ON
     B.CustomerID = C.CustomerID
Mark Brackett