views:

701

answers:

4

I am looking for a way to derive a weighted average from two rows of data with the same number of columns, where the average is as follows (borrowing Excel notation):

(A1*B1)+(A2*B2)+...+(An*Bn)/SUM(A1:An)

The first part reflects the same functionality as Excel's SUMPRODUCT() function.

My catch is that I need to dynamically specify which row gets averaged with weights, and which row the weights come from, and a date range.

EDIT: This is easier than I thought, because Excel was making me think I required some kind of pivot. My solution so far is thus:

select sum(baseSeries.Actual * weightSeries.Actual) / sum(weightSeries.Actual)
from (
 select RecordDate , Actual 
 from CalcProductionRecords 
 where KPI = 'Weighty'
) baseSeries inner join (  
 select RecordDate , Actual 
 from CalcProductionRecords 
 where KPI = 'Tons Milled' 
) weightSeries on baseSeries.RecordDate = weightSeries.RecordDate
+4  A: 
SELECT  SUM(A * B) / SUM(A)
FROM    mytable
Quassnoi
You are assuming the values come from two different columns. They actually come from the same column in different sets of records.
ProfK
Could you please post some sample data then?
Quassnoi
A: 

Sounds do-able, we just need an example of what your raw data table(s) look like that contain the numbers and values to use as the weight.

Ron

Ron Savage
A: 

Quassnoi's answer shows how to do the SumProduct, and using a WHERE clause would allow you to restrict by a Date field...

SELECT
   SUM([tbl].data * [tbl].weight) / SUM([tbl].weight)
FROM
   [tbl]
WHERE
   [tbl].date >= '2009 Jan 01'
   AND [tbl].date < '2010 Jan 01'

The more complex part is where you want to "dynamically specify" the what field is [data] and what field is [weight]. The short answer is that realistically you'd have to make use of Dynamic SQL. Something along the lines of:
- Create a string template
- Replace all instances of [tbl].data with the appropriate data field
- Replace all instances of [tbl].weight with the appropriate weight field
- Execute the string

Dynamic SQL, however, carries it's own overhead. Is the queries are relatively infrequent , or the execution time of the query itself is relatively long, this may not matter. If they are common and short, however, you may notice that using dynamic sql introduces a noticable overhead. (Not to mention being careful of SQL injection attacks, etc.)

EDIT:

In your lastest example you highlight three fields:

  • RecordDate
  • KPI
  • Actual

When the [KPI] is "Weight Y", then [Actual] the Weighting Factor to use.
When the [KPI] is "Tons Milled", then [Actual] is the Data you want to aggregate.


Some questions I have are:

  • Are there any other fields?
  • Is there only ever ONE actual per date per KPI?

The reason I ask being that you want to ensure the JOIN you do is only ever 1:1. (You don't want 5 Actuals joining with 5 Weights, giving 25 resultsing records)

Regardless, a slight simplification of your query is certainly possible...

SELECT
   SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
   CalcProductionRecords AS [baseSeries]
INNER JOIN
   CalcProductionRecords AS [weightSeries]
      ON [weightSeries].RecordDate = [baseSeries].RecordDate
--    AND [weightSeries].someOtherID = [baseSeries].someOtherID
WHERE
   [baseSeries].KPI = 'Tons Milled'
   AND [weightSeries].KPI = 'Weighty'

The commented out line only needed if you need additional predicates to ensure a 1:1 relationship between your data and the weights.


If you can't guarnatee just One value per date, and don't have any other fields to join on, you can modify your sub_query based version slightly...

SELECT
   SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
(
    SELECT
        RecordDate,
        SUM(Actual)
    FROM
        CalcProductionRecords
    WHERE
        KPI = 'Tons Milled'
    GROUP BY
        RecordDate
)
   AS [baseSeries]
INNER JOIN
(
    SELECT
        RecordDate,
        AVG(Actual)
    FROM
        CalcProductionRecords
    WHERE
        KPI = 'Weighty'
    GROUP BY
        RecordDate
)
   AS [weightSeries]
      ON [weightSeries].RecordDate = [baseSeries].RecordDate

This assumes the AVG of the weight is valid if there are multiple weights for the same day.

Dems
@Dems, seems I was seeing things as too complicated, because the dynamically supplied values are field values, not names, as I have amended above.
ProfK
+1  A: 

If I have understand the problem then try this

SET DATEFORMAT dmy
    declare @tbl table(A int, B int,recorddate datetime,KPI varchar(50))
    insert into @tbl 
        select 1,10 ,'21/01/2009', 'Weighty'union all 
        select 2,20,'10/01/2009', 'Tons Milled' union all
        select 3,30 ,'03/02/2009', 'xyz'union all 
        select 4,40 ,'10/01/2009', 'Weighty'union all
        select 5,50 ,'05/01/2009', 'Tons Milled'union all 
        select 6,60,'04/01/2009', 'abc' union all
        select 7,70 ,'05/01/2009', 'Weighty'union all 
        select 8,80,'09/01/2009', 'xyz' union all
        select 9,90 ,'05/01/2009', 'kws' union all 
        select 10,100,'05/01/2009', 'Tons Milled'

    select SUM(t1.A*t2.A)/SUM(t2.A)Result  from  
             (select RecordDate,A,B,KPI from @tbl)t1 
        inner join(select RecordDate,A,B,KPI from @tbl t)t2
        on t1.RecordDate = t2.RecordDate
        and t1.KPI = t2.KPI
priyanka.sarkar