I am looking for a function that takes two lists as input and returns the Pearson correlation coefficient and the significance (p-value) of the correlation. I am using Python.
Thank you very much.
Ariel
I don't know anything about statistics, but this looks like a page you'll like: Statistics for Python
You can have a look at SciPy: http://www.scipy.org/doc/api_docs/SciPy.stats.stats.html
from scipy.stats import pearsonr   # public import path; scipy.stats.stats is deprecated
help(pearsonr)                     # help() is a builtin, no pydoc import needed
>>>
Help on function pearsonr in module scipy.stats.stats:
pearsonr(x, y)
Calculates a Pearson correlation coefficient and the p-value for testing
non-correlation.
The Pearson correlation coefficient measures the linear relationship
between two datasets. Strictly speaking, Pearson's correlation requires
that each dataset be normally distributed. Like other correlation
coefficients, this one varies between -1 and +1 with 0 implying no
correlation. Correlations of -1 or +1 imply an exact linear
relationship. Positive correlations imply that as x increases, so does
y. Negative correlations imply that as x increases, y decreases.
The p-value roughly indicates the probability of an uncorrelated system
producing datasets that have a Pearson correlation at least as extreme
as the one computed from these datasets. The p-values are not entirely
reliable but are probably reasonable for datasets larger than 500 or so.
Parameters
----------
x : 1D array
y : 1D array the same length as x
Returns
-------
(Pearson's correlation coefficient,
2-tailed p-value)
References
----------
http://www.statsoft.com/textbook/glosp.html#Pearson%20Correlation
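For the original question (two plain lists in, coefficient and significance out), a minimal usage sketch follows. It assumes a reasonably recent SciPy, where pearsonr accepts Python lists directly and its result unpacks like the (r, p) tuple described in the help text above. The data values are made up for illustration.

```python
from scipy.stats import pearsonr

# Two plain Python lists of equal length (illustrative data only).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

# pearsonr returns the correlation coefficient and the two-tailed p-value;
# the result can be unpacked like a tuple.
r, p = pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.3f}")  # prints: r = 0.800, p = 0.104
```

With only five points, even r = 0.8 is not significant at the usual 0.05 level, which is a good reminder to look at the p-value and not just the coefficient.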
I'd recommend SciPy as mentioned in the other answers. But if you want stand-alone code, see How to compute correlation accurately.
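If SciPy is not an option, here is a minimal stand-alone sketch using only the standard library. It is not the code from the linked article; it is one possible approach, assuming a one-pass Welford-style update of the means and co-moments (to avoid the cancellation of the naive sum-of-squares formula) and a two-tailed p-value from the Student t distribution with n - 2 degrees of freedom, evaluated through the regularized incomplete beta function via the continued fraction popularized by Numerical Recipes. The names pearson, _betainc_reg and _betacf are hypothetical helpers invented for this example.

```python
import math


def _betacf(a, b, x, max_iter=200, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # Prefactor x^a (1-x)^b / B(a, b), computed in log space.
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    # Use the continued fraction where it converges quickly; otherwise
    # apply the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def pearson(x, y):
    """Return (r, two-tailed p-value) for two equal-length sequences."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 3:
        raise ValueError("need at least 3 points for a p-value")
    # One pass with Welford-style updates of the means and (co-)moments.
    mean_x = mean_y = 0.0
    sxx = syy = sxy = 0.0
    for k, (xi, yi) in enumerate(zip(x, y), start=1):
        dx = xi - mean_x            # deviation from the previous mean of x
        dy = yi - mean_y            # deviation from the previous mean of y
        mean_x += dx / k
        mean_y += dy / k
        sxx += dx * (xi - mean_x)
        syy += dy * (yi - mean_y)
        sxy += dx * (yi - mean_y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    r = sxy / math.sqrt(sxx * syy)
    r = max(-1.0, min(1.0, r))      # guard against rounding just outside [-1, 1]
    if abs(r) == 1.0:
        return r, 0.0
    # t^2 = r^2 * df / (1 - r^2); two-tailed P(|T| > t) = I_{df/(df+t^2)}(df/2, 1/2)
    df = n - 2
    t2 = r * r * df / (1.0 - r * r)
    p = _betainc_reg(df / 2.0, 0.5, df / (df + t2))
    return r, p
```

Usage is the same as with SciPy: r, p = pearson([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]) should agree with scipy.stats.pearsonr to within floating-point rounding.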
Just for completeness, you can call R's statistical functions from Python using the rpy package. That is probably overkill if all you want is the Pearson statistic, but if you then want to go on and do lots of statistical work that you can't find in the Python packages mentioned in the other answers, rpy might be the way to go.
www.r-project.org
rpy.sourceforge.net