I am writing a simple data warehouse that will allow me to query the table to observe periodic (say weekly) changes in data, as well as changes in the change of the data (e.g. week to week change in the weekly sale amount).

For the purposes of simplicity, I will present very simplified (almost trivialized) versions of the tables I am using here. The sales data table is a view and has the following structure:

CREATE TABLE sales_data (
     sales_time date NOT NULL,
     sales_amt double precision NOT NULL
)

For the purpose of this question, I have left out other fields you would expect to see (product_id, sales_person_id, etc.), as they have no direct relevance here. AFAICT, the only fields that will be used in the query are sales_time and sales_amt (unless I am mistaken).

I also have a date dimension table with the following structure:

CREATE TABLE date_dimension (
  id integer  NOT NULL,
  datestamp   date NOT NULL,
  day_part    integer NOT NULL,
  week_part   integer NOT NULL,
  month_part  integer NOT NULL,
  qtr_part    integer NOT NULL,
  year_part   integer NOT NULL
);

which partitions dates into reporting ranges.

I need to write queries that will allow me to do the following:

  1. Return the week-on-week change in sales_amt for a specified period. For example, the change between sales today and sales N days ago, where N is a positive integer (N = 7 in this case).

  2. Return the change in the change of sales_amt for a specified period. In (1) we calculated the week-on-week change; now we want to know how that change differs from the week-on-week change calculated last week.

I am stuck however at this point, as SQL is my weakest skill. I would be grateful if an SQL master can explain how I can write these queries in a DB agnostic way (i.e. using ANSI SQL).

+2  A: 

I suggest you build a separate dimension table for 'time' (one day per row) that contains information about repeating time periods (day, week, month, quarter), so you can easily join/select on that type of information.

Your queries for (1.) and (2.) could be built that way.

Yes, most SQL dialects allow inferring that information with date/time functions, but those are slower and more complicated than using a dimension table.
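
As a rough illustration of what a populated table of this kind could look like, here is a minimal sketch that fills the date_dimension table from the question with one row per day. It uses Python's built-in sqlite3 module only so that it can be run as-is. The function name build_date_dimension, the use of the date's ordinal as id, and ISO week numbering for week_part are all assumptions, not something the question specifies.

```python
import sqlite3
from datetime import date, timedelta


def build_date_dimension(conn, start, end):
    """Create date_dimension and insert one row per day in [start, end]."""
    conn.execute(
        """
        CREATE TABLE date_dimension (
            id         integer NOT NULL PRIMARY KEY,
            datestamp  date    NOT NULL,
            day_part   integer NOT NULL,
            week_part  integer NOT NULL,
            month_part integer NOT NULL,
            qtr_part   integer NOT NULL,
            year_part  integer NOT NULL
        )
        """
    )
    rows = []
    day = start
    while day <= end:
        _, iso_week, _ = day.isocalendar()
        rows.append((
            day.toordinal(),           # assumption: any unique integer works as id
            day.isoformat(),
            day.day,
            iso_week,                  # assumption: ISO week numbering
            day.month,
            (day.month - 1) // 3 + 1,  # quarter, 1 to 4
            day.year,
        ))
        day += timedelta(days=1)
    conn.executemany(
        "INSERT INTO date_dimension VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = build_date_dimension(conn, date(2010, 1, 1), date(2010, 12, 31))
print(inserted, "rows inserted")
```

One caveat with this choice: ISO weeks straddle year boundaries (1 January 2010 falls in ISO week 53 of 2009), so joining on week_part alone is fragile around new year.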

lexu
@lexu: thanks for the comment. Could you elaborate some more? - perhaps with an example. I am not sure I understand exactly what you mean, or indeed, how to implement it.
morpheous
@morpheous: You are asking two questions (or maybe I expanded it to two): (1) what DB/data design to use (2) how to query the data. Since your sales_data table is very limited (no userID, contractID, ProductID), I assume you are at the brainstorming stage of design?
lexu
@lexu: +1 for suggesting the date dimension table. I have updated my question in light of your comments.
morpheous
+2  A: 

As noted in the comment above, I probably do not understand your model -- so here is a simple one to get started.

[image: ERD of the example model, not reproduced here]

Now, if I want weekly sales for calendar year 2010:

select 
    CalendarYearWeek
  , sum(SalesAmount)
from factSales as f
join dimDate as d on d.DateKey = f.DateKey
where Year = 2010
group by CalendarYearWeek

CalendarYearWeek is a varchar(8) column in dimDate, for example '2010-w03'; Year is an integer column in dimDate too.

Not sure if this is close to what you were looking for, but may be a start.

EDIT

dimDate also has these columns:

WeekNumberInEpoch, integer -- increases starting from some epoch date in the past. All rows in dimDate within the same week have the same WeekNumberInEpoch.

DayOfWeek, varchar(10) -- 'sunday', 'monday', ...

DayNumberInWeek, integer -- 1-7
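
As a hedged sketch of how such counters might be derived when loading dimDate: the epoch date below, the function names, and the choice of Monday as day 1 are assumptions for illustration only. DayNumberInEpoch (used in the last query further down) is simply the number of days since the epoch, and WeekNumberInEpoch is that number divided by seven, rounded down.

```python
from datetime import date

# Assumption: any fixed Monday in the past can serve as the epoch.
EPOCH = date(2000, 1, 3)


def day_number_in_epoch(d):
    """Days elapsed since EPOCH; consecutive days differ by exactly 1."""
    return (d - EPOCH).days


def week_number_in_epoch(d):
    """Whole weeks elapsed since EPOCH; all days of one Monday-to-Sunday week share a value."""
    return (d - EPOCH).days // 7


def day_number_in_week(d):
    """1 to 7, with Monday as 1 (assumption; use your own convention)."""
    return d.isoweekday()


monday = date(2010, 1, 4)
print(day_number_in_epoch(monday), week_number_in_epoch(monday), day_number_in_week(monday))
```

Because these counters never reset at year or month boundaries, "previous week" is always WeekNumberInEpoch - 1 and "seven days ago" is always DayNumberInEpoch - 7, which is what keeps the self-joins simple.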

The queries below use CTEs, so they should work with recent versions of PostgreSQL, SQL Server, Oracle and DB2. For other databases, you may package the CTE (q_00) into a sub-query.

-- for week to previous week
with
q_00 as (
    select
        WeekNumberInEpoch
      , sum(SalesAmount) as Amount
    from factSale as f
    join dimDate  as d on d.DateKey = f.DateKey
    where CalendarYear = 2010
    group by WeekNumberInEpoch
)
select
    a.WeekNumberInEpoch
  , a.Amount as ThisWeekSales
  , b.Amount as LastWeekSales
  , a.Amount - b.Amount as Difference
from q_00 as a
join q_00 as b on b.WeekNumberInEpoch = a.WeekNumberInEpoch - 1
order by a.WeekNumberInEpoch desc ;
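
For an engine without CTE support, the week-to-previous-week query above could be packaged roughly as follows, with the body of q_00 repeated as a derived table on each side of the join. This is only a sketch: it runs against an in-memory SQLite database through Python's sqlite3 module, and the cut-down table definitions and sample rows are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: minimal stand-ins for dimDate and factSale with made-up rows.
conn.executescript("""
    CREATE TABLE dimDate (
        DateKey integer PRIMARY KEY,
        CalendarYear integer,
        WeekNumberInEpoch integer
    );
    CREATE TABLE factSale (DateKey integer, SalesAmount real);
    INSERT INTO dimDate VALUES (1, 2010, 100), (2, 2010, 100), (3, 2010, 101), (4, 2010, 102);
    INSERT INTO factSale VALUES (1, 10.0), (2, 5.0), (3, 20.0), (3, 4.0), (4, 30.0);
""")

# The former CTE body, kept in one place so both sides of the join stay identical.
WEEKLY = """
    select
        WeekNumberInEpoch
      , sum(SalesAmount) as Amount
    from factSale as f
    join dimDate  as d on d.DateKey = f.DateKey
    where CalendarYear = 2010
    group by WeekNumberInEpoch
"""

sql = f"""
    select
        a.WeekNumberInEpoch
      , a.Amount as ThisWeekSales
      , b.Amount as LastWeekSales
      , a.Amount - b.Amount as Difference
    from ({WEEKLY}) as a
    join ({WEEKLY}) as b on b.WeekNumberInEpoch = a.WeekNumberInEpoch - 1
    order by a.WeekNumberInEpoch desc
"""

rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

The sample weekly totals are 15, 24 and 30, so the two rows returned carry differences of 6 and 9. The first week has no predecessor and drops out of the inner join; a left join would keep it with a NULL difference.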


-- for day of week to day of previous week 
-- monday to monday, tuesday to tuesday, ...
with
q_00 as (
    select
        WeekNumberInEpoch
      , DayOfWeek  
      , sum(SalesAmount) as Amount
    from factSale as f
    join dimDate  as d on d.DateKey = f.DateKey
    where CalendarYear = 2010
    group by WeekNumberInEpoch, DayOfWeek
)
select
    a.WeekNumberInEpoch
  , a.DayOfWeek  
  , a.Amount as ThisWeekSales
  , b.Amount as LastWeekSales
  , a.Amount - b.Amount as Difference
from q_00 as a
join q_00 as b on (b.WeekNumberInEpoch = a.WeekNumberInEpoch - 1
                   and b.DayOfWeek = a.DayOfWeek)
order by a.WeekNumberInEpoch desc, a.DayOfWeek ;



-- Sliding by day and day difference (= 7)
with
q_00 as (
    select
        DayNumberInEpoch
      , FullDate
      , DayOfWeek
      , sum(SalesAmount) as Amount
    from factSale as f
    join dimDate as d on d.DateKey = f.DateKey
    where CalendarYear = 2010
    group by DayNumberInEpoch, FullDate, DayOfWeek
)
select
    a.FullDate  as ThisDay
  , a.DayOfWeek as ThisDayName
  , a.Amount    as ThisDaySales
  , b.FullDate  as PreviousPeriodDay
  , b.DayOfWeek as PreviousDayName
  , b.Amount    as PreviousPeriodDaySales
  , a.Amount - b.Amount as Difference
from q_00 as a
join q_00 as b on b.DayNumberInEpoch = a.DayNumberInEpoch - 7
order by a.FullDate desc ;
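
For the second part of the question (the change in the change), one possible approach is to stack a second CTE on top of the first: q_00 gives weekly totals, a new q_01 gives the week-to-week difference, and joining q_01 to itself gives the difference of differences. The sketch below is untested against a real warehouse. It runs on an in-memory SQLite database via Python's sqlite3 module, and the name q_01, the output column names, the cut-down tables and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: minimal stand-ins for dimDate and factSale with made-up rows.
conn.executescript("""
    CREATE TABLE dimDate (
        DateKey integer PRIMARY KEY,
        CalendarYear integer,
        WeekNumberInEpoch integer
    );
    CREATE TABLE factSale (DateKey integer, SalesAmount real);
    INSERT INTO dimDate VALUES
        (1, 2010, 100), (2, 2010, 100), (3, 2010, 101), (4, 2010, 102), (5, 2010, 103);
    INSERT INTO factSale VALUES
        (1, 10.0), (2, 5.0), (3, 20.0), (3, 4.0), (4, 30.0), (5, 50.0);
""")

sql = """
    with
    q_00 as (              -- weekly totals
        select
            WeekNumberInEpoch
          , sum(SalesAmount) as Amount
        from factSale as f
        join dimDate  as d on d.DateKey = f.DateKey
        where CalendarYear = 2010
        group by WeekNumberInEpoch
    ),
    q_01 as (              -- week-to-previous-week difference
        select
            a.WeekNumberInEpoch
          , a.Amount - b.Amount as Difference
        from q_00 as a
        join q_00 as b on b.WeekNumberInEpoch = a.WeekNumberInEpoch - 1
    )
    select
        c.WeekNumberInEpoch
      , c.Difference as ThisWeekChange
      , p.Difference as LastWeekChange
      , c.Difference - p.Difference as ChangeInChange
    from q_01 as c
    join q_01 as p on p.WeekNumberInEpoch = c.WeekNumberInEpoch - 1
    order by c.WeekNumberInEpoch desc
"""

rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

With sample weekly totals of 15, 24, 30 and 50, the weekly changes are 9, 6 and 20, so the change in change is -3 for week 102 and 14 for week 103. The same stacking should carry over to the sliding-by-day query by joining on DayNumberInEpoch - 7 instead.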
Damir Sudarevic
@damir: +1 for the beautiful ERD. I don't know how you did it. I am guessing that I may not have stated my problem correctly. However, the ERD you provided is almost an exact fit of my db, and I notice that the only tables used in the query were the fact table and the dimDate table; that's why they were all I included in my question. The query you provided is definitely on the right path: at least, it shows how to fetch data from the fact table based on the date dimension table. The next (and final) step is how to query the database for, say, weekly or monthly changes in the sales amount.
morpheous
@damir: Lastly, it's the week-on-week change that I want. The query snippet you kindly provided shows the cumulative total (i.e. the sum) over the period. What I want to calculate (for example) is the change between sales today and sales N days ago, where N is a positive integer (N = 7 in this case). I hope that clarifies what I'm trying to do. mtia
morpheous
@damir: this seems to be exactly what I want to be doing (judging by your comments). I just need to read (and reread) the SQL to totally understand it; for example, I'm not sure why q_00 is using a sum. Once I have tested it on my db and it works, I will accept this as the final answer. If not, I will come back with some questions. Thanks
morpheous
@morpheous because one row in factSales is one chocolate, one item on a receipt -- there are many of these for a single day.
Damir Sudarevic
@damir: oh, I see. OK, in that case I probably should have specified that I was only interested in the sales of a specific product (specified in the query). For example, the query to return week-on-week sales for chocolate bars. So it would seem that we don't need the sum() aggregate. Hope that clarifies further what I'm trying to do. Thanks for your help and feedback.
morpheous
+1 From me for explaining the aggregation concepts needed for the data mining he needs to do! Chapeau!
lexu