views:

701

answers:

2

Related to a question I asked earlier here, I've found a problem which is eluding me (obviously).

The original question was how to select a min and max date from a daily table based on a monthly table where some daily table dates could be missing. Basically what I needed was columns containing the month date (always the first), the earliest date for that month in the daily table and the latest date for that month in the daily table.

So, if the last week of January and first week of February were missing from the daily table (and we otherwise had all the dates for January and February but no more), I needed:

MonthStart  DayFirst    DayLast
----------  ----------  ----------
2009-01-01  2009-01-01  2009-01-24
2009-02-01  2009-02-08  2009-02-28

The answer was:

select
    m.date as m1,
    min(d.date) as m2,
    max(d.date) as m3
from monthly m
join daily d
    on month(d.date) = month(m.date)
    and year(d.date) = year(m.date)
group by m.date
order by m.date

which worked for the specs I gave.

Unfortunately, reality bites, and there are multiple records in the monthly table (and daily table) with the same date. Specifically:

  • the dates are 2007-10-16 thru 2007-10-30 (15 days), 2007-11-01 thru 2007-11-30 (30 days) and 2007-12-01 thru 2007-12-15 (15 days).
  • each date has six rows in both tables (because they each have a row for three system names and two periods.

The problem is that I sum() a field in the monthly table and the new query is getting values that are much too large (compared to the previous query which did not have the join).

The aggregation changes the query to be:

select
    m.date as m1,
    sum(m.other_field),  -- added this
    min(d.date) as m2,
    max(d.date) as m3
from monthly m
join daily d
    on month(d.date) = month(m.date)
    and year(d.date) = year(m.date)
group by m.date
order by m.date

I think the values are too high due to cross-joining going on since the figures for each month are out by a constant factor, depending on the number of days in the daily table for that month.

My question is this: how do I aggregate the field in the monthly table without that factor coming into play and still get the min/max dates from the daily table for that month?

+1  A: 

You can group the months in a subquery:

select
    m.mindate as m1,
    m.sum_other_field,
    min(d.date) as m2,
    max(d.date) as m3
from (
    select 
         month(date) as month,
         year(date) as year,
         sum(other_field) sum_other_field,
         min(date) mindate
    from monthly
    group by month(date), year(date)
) m
join daily d
    on month(d.date) = m.month
    and year(d.date) = m.year
group by m.month
order by m.year
Andomar
+1  A: 

If the monthly table contains a single entry for each month, you can do simply this:

select
    m.date as m1,
    m.other_field,
    min(d.date) as m2,
    max(d.date) as m3
from monthly m
join daily d
    on month(d.date) = month(m.date)
    and year(d.date) = year(m.date)
group by m.date, m.other_field
order by m.date

otherwise:

select m1, sum(other_field), m2, m3
from (
        select
        m.date as m1,
        m.other_field,
        min(d.date) as m2,
        max(d.date) as m3
    from monthly m
    join daily d
        on month(d.date) = month(m.date)
        and year(d.date) = year(m.date)
    group by m.date, m.other_field) A
group by A.m1, A.m2, A.m3
order by A.m1

Update from pax: Try as I might, I could not get the join solutions working properly - they all seemed to return the same wrong data as the original. In the end, I opted for a non-join solution since it worked and performance wasn't a big issue, since the tables typically have 24 rows (for monthly) and 700 rows (for daily). I'm editing this answer and accepting it since (1) it actually helped a great deal in getting the correct solution for me; and (2) I'm loathe to write my own answer and claim the glory for myself.

Thanks for all your help. The following is what worked for me:

select
    m.date as p1,
    m.grouping_field as p2,
    sum(m.aggregating_field) as p3,
    (select min(date) from daily
        where month(date) = month(m.date)
        and year(date) = year(m.date)) as p4,
    (select max(date) from daily
        where month(date) = month(m.date)
        and year(date) = year(m.date)) as p5
from
    monthly m
group by
    m.date, m.grouping_field

which gave me what I wanted:

    P1       P2    P3       P4         P5
----------  ----  ----  ----------  ----------
2007-10-01  BoxA  12.3  2007-10-16  2007-10-30
2007-10-01  BoxB  13.6  2007-10-16  2007-10-30
2007-10-01  BoxC   7.4  2007-10-16  2007-10-30
2007-11-01  BoxA  20.3  2007-11-01  2007-11-30
2007-11-01  BoxB  24.2  2007-11-01  2007-11-30
2007-11-01  BoxC  21.7  2007-11-01  2007-11-30
2007-12-01  BoxA   6.9  2007-12-01  2007-12-15
2007-12-01  BoxB   6.4  2007-12-01  2007-12-15
2007-12-01  BoxC   6.9  2007-12-01  2007-12-15
najmeddine