tags:

views:

68

answers:

1

Here's an obfuscated version of something I've been trying to do at work. Say I have been given this month's data for customers in my shop - how much they've spent split by the food type:

CUSTOMER    FOOD_TYPE      FOOD_TYPE_VALUE
1           SWEET          52.6
1           SAVOURY        31.0
1           DAIRY          45.8
1           DRINKS         12.1
2           SWEET          15.1
2           SAVOURY        44.1
2           DRINKS         23.4
3           SWEET          95.7
3           SAVOURY        20.0
3           DAIRY          10.8
3           DRINKS         57.1

It has been decided that Customer 3 is our ideal customer as he fits the demographic profile, and we want to track month by month, how everybody's distribution of shopping preferences differs from his.

I can get the percentage allocation for each food type for each customer with:

SELECT
  c1.customer,
  c1.food_type,
  100 * c1.food_type_value / sum(c2.food_type_value)
FROM
  mytable c1 INNER JOIN mytable c2
  ON c1.customer = c2.customer
group by c1.customer, c1.food_type, c1.food_type_value

But I am having trouble constructing a query that will give me a further column with the matching percentage values for my ideal customer. ie:

CUSTOMER    FOOD_TYPE      FOOD_TYPE_PERC  IDEAL_PERC
1           SWEET          37              52
1           SAVOURY        22              11
1           DAIRY          32              6
1           DRINKS         9               31

Any tips on how I can achieve this without too much mess?

+2  A: 

Join it on the subset of the customer table that contains your ideal customer:

SELECT
  c1.customer,
  c1.food_type,
  100 * c1.food_type_value / sum(c2.food_type_value),
  c3.FOOD_TYPE_VALUE / sum(c2.food_type_value) as IDEAL_PERC
FROM
  mytable c1 INNER JOIN mytable c2
  ON c1.customer = c2.customer
  INNER JOIN (
    SELECT FOOD_TYPE, FOOD_TYPE_VALUE
    FROM mytable
    WHERE customer = 3) c3
  ON c2.FOOD_TYPE = c3.FOOD_TYPE
group by c1.customer, c1.food_type, c1.food_type_value, c3.FOOD_TYPE_VALUE

Your comment to my answer suggests you need to divide by the FOOD_TYPE_VALUE of the ideal customer, so do this:

SELECT
  c1.customer,
  c1.food_type,
  100 * c1.food_type_value / sum(c2.food_type_value),
  c3.IDEAL_PERC
FROM
  mytable c1 INNER JOIN mytable c2
  ON c1.customer = c2.customer
  INNER JOIN (
    SELECT s1.FOOD_TYPE,
           100 * s1.FOOD_TYPE_VALUE / sum(s2.FOOD_TYPE_VALUE) IDEAL_PERC
    FROM mytable s1
    INNER JOIN mytable s2
    on s1.customer = s2.customer
    WHERE s1.customer = 3
    GROUP BY s1.FOOD_TYPE, s1.FOOD_TYPE_VALUE) c3
  ON c2.FOOD_TYPE = c3.FOOD_TYPE
group by c1.customer, c1.food_type, c1.food_type_value, c3.IDEAL_PERC
Welbog
thanks - this looks approximately right, but we want to divide c3.FOOD_TYPE_VALUE by the sum of the values in c3, so we need another join, right?
nearly_lunchtime
Yeah, you would need to join again, but you can keep it in the subquery instead of the main query to keep the number of records small.
Welbog
If your ideal customer is static, you might want to dedicate a table to the ideal customer and keep the ideal_perc as a static value instead of recalculating it every time.
Welbog
Nice, works on translation to the real query. Another way I've now worked out is using a common table expression (ie with statement) but that's not permitted in the querying tool I have, so thanks for the answer, and yes I may store the percentages statically soon.
nearly_lunchtime
No problem. Hopefully the query isn't too inefficient. :)
Welbog
I'll blame the query plan(ner)!
nearly_lunchtime
I don't understand your use of AVG - there's nothing even to average in this case, is there?
nearly_lunchtime
Well, what you're calculating is a weighted average. It's not a regular average, so you can't use the regular average syntax, but it is similar. It just seems like there ought to be a facility in SQL to do this kind of thing without querying the table twice. Nevermind me, I'm just rambling.
Welbog
Yeah it does seem a common enough thing to do. Oh well. Appreciate the effort.
nearly_lunchtime