Are there any Linear Regression Function in SQL Server 2005/2008, similar to the the Linear Regression functions in Oracle ?
+3
A:
To the best of my knowledge, there is none. Writing one is pretty straightforward, though. The following gives you the constant alpha and slope beta for y = alpha + beta * x + epsilon:
-- test data
WITH some_table(SomeID, x, y) AS
( SELECT 1, 1.0, 2.0
UNION ALL
SELECT 1, 2.0, 2
UNION ALL
SELECT 1, 3, 4),
-- linear regression query
/*WITH*/ average_estimate AS
( SELECT SomeID
,avg(x) as xbar
,avg(y) as ybar
FROM some_table
GROUP BY SomeID
),
beta_estimate as
( SELECT pa.SomeID
,CASE COALESCE(SUM((x-xbar)*(x-xbar)),0)
WHEN 0 THEN NULL
ELSE SUM((x-xbar)*(y-ybar)) / SUM((x-xbar)*(x-xbar))
END AS Beta
FROM some_table pd
INNER JOIN average_estimate pa
ON pa.SomeID = pd.SomeID
GROUP BY pa.SomeID
)
SELECT pb.SomeID
,ybar - xbar * pb.Beta AS Alpha
,pb.Beta
FROM beta_estimate pb
INNER JOIN average_estimate pa
ON pa.SomeID = pb.SomeID
stephan
2010-03-29 09:58:00
Thanks!! had to use this to solve my problem. Problem, in a broader perspective, was to get a trend line in SSRS (2005) report. This was the only way.
rao
2010-03-29 11:38:24
@pavanrao: you are welcome. Added estimate for constant alpha to the query
stephan
2010-03-29 11:43:10
A:
This is an alternate method, including the equations behind it. This guy explains the derivation of the equations: http://knol.google.com/k/linus-nilsson/linear-regression-in-t-sql. His T-SQL suggestion uses cursors though. This site then has simpler T-SQL select statement: http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=21948. Here's a prettified version that I used:
table
-----
X (numeric)
Y (numeric)
/* y = mx + b
m = (nSxy - SxSy) / (nSxx - SxSx)
b = Ay - (Ax * m)
N.B. S = Sum, A = Avg */
@n = Count(*) FROM [dbo].[table]
Select
((@n * Sum(X*Y)) - (Sum(X) * Sum(Y)))/
((@n * Sum(X*X)) - (Sum(X) * Sum(X))) AS M,
Avg(Y) - Avg(X) *
((@n * Sum(X*Y)) - (Sum(X) * Sum(Y)))/
((@n * Sum(X*X)) - (Sum(X) * Sum(X))) AS B,
FROM [dbo].[table]
icc97
2010-07-19 16:51:44