tags:

views:

234

answers:

4

I'm trying to write a report which will join a person, their work, and their hourly wage at the time of work. I cannot seem to figure out the best way to join the person's cost when the date is less than the date of the work.

Let's say a person cost $30 per hour at the start of the year then got a $10 raise o Feb 5 and another on Mar 1.

  • 01/01/2010 $30.00 (per hour)
  • 02/05/2010 $40.00
  • 03/01/2010 $45.00

The person put in hours several days which span the rasies.

  • 01/05/2010 10 hours (should be at $30/hr)
  • 01/27/2010 5 hours (again at $30)
  • 02/10/2010 10 hours (at $40/hr)
  • 03/03/2010 5 hours (at $45/hr)

I'm trying to write one SQL statement which will pull the hours, the cost per hour, and the hours*cost. The cost is the hourly rate last entered into the system so the cost date is less than the work date, ordered by cost date limit 1.

SELECT person.id, person.name, work.hours, person_costs.value, 
       work.hours * person_costs.value AS value
  FROM person
  JOIN work ON person.id = work.person_id
  JOIN person_costs ON person.id = person_costs.person_id 
                   AND person_costs.date < work.date
 WHERE person.id = 1234
ORDER BY work.date ASC

The problem I'm having, the person_costs isn't ordered by date in descending order. It's pulling out "any" value (naturally sorted by record position) which matches the condition. How do I select the first person_cost value which is older than the work date?

Thanks!

A: 

The best solution is to change the person_costs table so that it records the price over a date interval (start and finish) rather the price after a change. As long as you don't create any overlapping intervals, the join will always fetch only one row per person/date.

If you can't change the table structure, you can create a view that achieves the same effect.

Marcelo Cantos
This solution will work the best for me as the query will be faster than a sub query (looping through millions of records nightly). Building the CRUD for the person_cost was more tricky with all of the edge cases, but it was worth it.
Gary
A: 

Here's my test schema:

Create Table person_cost
                    (
                    PersonId int not null
                    , Rate decimal(15,4) not null
                    , [date] datetime not null
                    , Constraint PK_person_cost Primary Key Clustered ( PersonId, [date] )
                    )
Insert person_cost(PersonId, Rate, [date]) Values(1,30,'2010-01-01')
Insert person_cost(PersonId, Rate, [date]) Values(1,40,'2010-02-05')
Insert person_cost(PersonId, Rate, [date]) Values(1,50,'2010-03-01')

Create Table Work
                (
                PersonId int not null
                , [Date] datetime not null
                , Hours int not null
                )

Insert Work(PersonId, [Date], Hours) Values(1, '2010-01-05', 10)
Insert Work(PersonId, [Date], Hours) Values(1, '2010-01-27', 5)
Insert Work(PersonId, [Date], Hours) Values(1, '2010-02-10', 10)
Insert Work(PersonId, [Date], Hours) Values(1, '2010-03-03', 5)

And my query:

Select Work.PersonId, Work.Hours, PayRanges.Rate
    , Work.Hours * PayRanges.Rate As Value
From Work
    Join    (
            Select pc1.PersonId, pc1.[date] As StartDate, Rate
                , Coalesce((Select Min(pc2.[date])
                    From person_cost As pc2
                    Where pc2.personId = pc1.personId 
                        And pc2.[date] > pc1.[date])
                    , '9999-12-31') As NextEffectiveDate
            From person_cost As pc1
            ) As PayRanges
        On Work.PersonId = PayRanges.PersonId
            And Work.[Date] >= PayRanges.StartDate
            And Work.[Date] < PayRanges.NextEffectiveDate

In essence, I calculated the end date of the effective pay rate. In my example, I excluded a join to the Person table, however that can easily be included to get Person data. Note that my calculation of the expiration date of a pay rate will be the day the next pay rate takes effect and thus I query for value strictly less than the expiration date.

Thomas
A: 

find most recent cost date first:

select w.person_id, w.date as work_date, max(c.date) as most_recent_cost_date
from work w
join person_costs c on using(person_id)
where c.date < w.date
group by w.person_id, w.date

then plug it in on your existing query:

SELECT person.id, person.name, work.hours, person_costs.value, 
   work.hours * person_costs.value AS value
FROM person
JOIN work ON person.id = work.person_id
JOIN person_costs ON person.id = person_costs.person_id
       -- remove this: AND person_costs.date < work.date
WHERE person.id in (1234, 1235) -- future-proof, will still work on multiple person report
     -- then plug it here
     AND (person.id, work.date, person_costs.date) IN 
     (
         select w.person_id, w.date as work_date, max(c.date) as most_recent_cost_date
         from work w
         join person_costs c using(person_id)
         where c.date < w.date
         group by w.person_id, w.date
     )

ORDER BY person.id, work.date ASC

i'm not sure if IN tuples (i.e. fielda,fieldb IN (fielda,fieldb)) works on MySQL, i don't have MySQL in my box now. if the above doesn't work, use this:

SELECT person.id, person.name, work.hours, person_costs.value, 
   work.hours * person_costs.value AS value
FROM person
JOIN work ON person.id = work.person_id
JOIN person_costs ON person.id = person_costs.person_id
       -- remove this: AND person_costs.date < work.date
WHERE person.id in (1234, 1235) -- future-proof, will still work on multiple person report
     -- then plug it here
     AND EXISTS
     (
         select w.person_id, w.date as work_date, max(c.date) as most_recent_cost_date
         from work w
         join person_costs c using(person_id)
         where               
             w.person_id = person.id   
             and w.date = work.date

             and c.date < w.date

         group by w.person_id, w.date

         having person_costs.date = max(c.date) 
     )

ORDER BY person.id, work.date ASC
Michael Buen
A: 

or use this:

SELECT person.id, person.name, work.hours, person_costs.value, 
   work.hours * person_costs.value AS value
FROM person
JOIN work ON person.id = work.person_id
JOIN person_costs ON person.id = person_costs.person_id
       -- remove this: AND person_costs.date < work.date
WHERE person.id in (1234, 1235) -- future-proof, will still work on multiple person report
     -- then plug it here
     and person_costs.date = 
     (
         select max(c.date) as most_recent_cost_date
         from work w
         join person_costs c using(person_id)
         where               
             w.person_id = person.id   
             and w.date = work.date

             and c.date < w.date        
     )

     ORDER BY person.id, work.date ASC
Michael Buen