I'm trying to calculate the coefficient of determination (R^2) in Python, but I'm getting a negative value in certain cases. Is this a sign that there's an error in my calculation? I thought R^2 should be bounded between 0 and 1.

Here's my Python code for the calculation, adapted straight from the Wikipedia article:

>>> yi_list = [1, 1, 63, 63, 5, 5, 124, 124]
>>> fi_list = [1.7438055421354988, 2.3153069186947639, 1002.7093097555808, 63.097699219524706, 6.2635465467410842, 7.2275532522971364, 17.55393551900103, 40.8570]
>>> y_mean = sum(yi_list)/float(len(yi_list))
>>> ss_tot = sum((yi-y_mean)**2 for yi in yi_list)
>>> ss_err = sum((yi-fi)**2 for yi,fi in zip(yi_list,fi_list))
>>> r2 = 1 - (ss_err/ss_tot)
>>> r2
-43.802085810924964
+1  A: 

Looking at the article, I think this is expected behaviour given the input data. In the introduction it says:

Important cases where the computational definition of R2 can yield negative values, depending on the definition used, arise where the predictions which are being compared to the corresponding outcome have not derived from a model-fitting procedure using those data.

I can't see anything in the formulae that would mean it would always be in the range 0-1.
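As a rough illustration of why nothing pins the value to 0-1 (only a sketch; `r_squared` is a helper name made up here, not something from the article): predicting the mean for every point gives 0, a perfect fit gives 1, and any set of predictions that does worse than the mean goes negative.

```python
def r_squared(observed, predicted):
    """R^2 computed as 1 - SS_err / SS_tot, the same definition as in the question."""
    mean = sum(observed) / float(len(observed))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_err = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1 - ss_err / ss_tot

# Data copied from the question.
yi_list = [1, 1, 63, 63, 5, 5, 124, 124]
fi_list = [1.7438055421354988, 2.3153069186947639, 1002.7093097555808,
           63.097699219524706, 6.2635465467410842, 7.2275532522971364,
           17.55393551900103, 40.8570]
y_mean = sum(yi_list) / float(len(yi_list))

r2_perfect = r_squared(yi_list, yi_list)                   # predictions equal the data
r2_mean = r_squared(yi_list, [y_mean] * len(yi_list))      # always predict the mean
r2_question = r_squared(yi_list, fi_list)                  # predictions from the question

print(r2_perfect, r2_mean, r2_question)
```

The last value is the negative number from the question: those predictions have a larger squared error than simply predicting the mean would.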

neil
+2  A: 

Your implementation of the calculation as shown in the Wikipedia article looks OK to me.

According to the Wikipedia article:

Values of R2 outside the range 0 to 1 can occur where it is used to measure the agreement between observed and modelled values and where the "modelled" values are not obtained by linear regression and depending on which formulation of R2 is used.

Looking at your data, the observed/modelled pair of 63 and 1002.7093097555808 is probably the main source of the large error.
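One way to check this is to look at each point's contribution to the error sum separately. This is only a sketch using the lists from the question; the names `squared_errors`, `worst_index` and `worst_share` are made up for the example.

```python
# Data copied from the question.
yi_list = [1, 1, 63, 63, 5, 5, 124, 124]
fi_list = [1.7438055421354988, 2.3153069186947639, 1002.7093097555808,
           63.097699219524706, 6.2635465467410842, 7.2275532522971364,
           17.55393551900103, 40.8570]

# Squared residual of each observed/modelled pair.
squared_errors = [(yi - fi) ** 2 for yi, fi in zip(yi_list, fi_list)]
ss_err = sum(squared_errors)

# Index of the pair that contributes most, and its share of the total.
worst_index = max(range(len(squared_errors)), key=lambda i: squared_errors[i])
worst_share = squared_errors[worst_index] / ss_err

for i, (yi, fi, err) in enumerate(zip(yi_list, fi_list, squared_errors)):
    print(i, yi, fi, err)
print("largest contribution at index", worst_index, "share of ss_err:", worst_share)
```

The largest contribution comes from index 2 (counting from 0), which is the 63 / 1002.7 pair, and it accounts for most of `ss_err`.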

Dave Webb
Right, I just noticed that caveat. My data was generated with a polynomial expression, so I guess it makes sense.
Chris S
+1  A: 

No, there is no error in the formula. Your values are not correlated whatsoever (look at y3 and f3: 63 and 1002).

Just to show you that R2 is not bounded by 0 and 1, imagine that one of the f values is near infinite. SSerr will be near infinite too, so R2 will be near minus infinity.
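A small sketch of that argument, assuming the same definition of R2 as in the question (the helper `r_squared` and the test values in the loop are made up for illustration): keep every prediction perfect except one, and make that one larger and larger.

```python
def r_squared(observed, predicted):
    """R^2 computed as 1 - SS_err / SS_tot."""
    mean = sum(observed) / float(len(observed))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_err = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1 - ss_err / ss_tot

yi_list = [1, 1, 63, 63, 5, 5, 124, 124]

r2_values = []
for big in [1e2, 1e4, 1e6]:
    fi_list = list(yi_list)   # start from perfect predictions
    fi_list[2] = big          # replace one prediction (true value 63) with a huge one
    r2_values.append(r_squared(yi_list, fi_list))
    print(big, r2_values[-1])
```

R2 keeps dropping as the single bad prediction grows, and nothing in the formula stops it.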

Are you perhaps confusing the X and Y values?

(Sorry for the "near infinite" bit, but I don't know how to say it better in English.)

mb14