I am looking for a function that takes two lists as input and returns the Pearson correlation along with its significance. I am using Python.

Thank you very much.

Ariel

A: 

I don't know anything about statistics, but this looks like a page you'll like: Statistics for Python

Ned Batchelder
+6  A: 

You can have a look at SciPy: http://www.scipy.org/doc/api_docs/SciPy.stats.stats.html

from pydoc import help
from scipy.stats import pearsonr
help(pearsonr)

>>>
Help on function pearsonr in module scipy.stats.stats:

pearsonr(x, y)
 Calculates a Pearson correlation coefficient and the p-value for testing
 non-correlation.

 The Pearson correlation coefficient measures the linear relationship
 between two datasets. Strictly speaking, Pearson's correlation requires
 that each dataset be normally distributed. Like other correlation
 coefficients, this one varies between -1 and +1 with 0 implying no
 correlation. Correlations of -1 or +1 imply an exact linear
 relationship. Positive correlations imply that as x increases, so does
 y. Negative correlations imply that as x increases, y decreases.

 The p-value roughly indicates the probability of an uncorrelated system
 producing datasets that have a Pearson correlation at least as extreme
 as the one computed from these datasets. The p-values are not entirely
 reliable but are probably reasonable for datasets larger than 500 or so.

 Parameters
 ----------
 x : 1D array
 y : 1D array the same length as x

 Returns
 -------
 (Pearson's correlation coefficient,
  2-tailed p-value)

 References
 ----------
 http://www.statsoft.com/textbook/glosp.html#Pearson%20Correlation
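
For the original question (two lists in, correlation and significance out), a minimal usage sketch might look like the following. The data values are made up for illustration, and the import path `scipy.stats` is assumed to be available in your SciPy version.

```python
from scipy.stats import pearsonr

# Two equal-length lists; the values are illustrative only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# pearsonr returns the correlation coefficient and the 2-tailed p-value.
r, p_value = pearsonr(x, y)

print("r = %.4f" % r)
print("p = %.4g" % p_value)
```

Plain Python lists are accepted here; as the docstring above notes, the p-value should be read with caution for small datasets like this one.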
Sacha
A: 

I'd recommend SciPy as mentioned in the other answers. But if you want stand-alone code, see How to compute correlation accurately.
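
As a rough idea of what stand-alone code could look like, here is a sketch that uses only the standard library. It is a two-pass computation (means first, then deviations), which is not necessarily the method the linked article describes, and the function name `pearson_r` is hypothetical. It returns only the coefficient, not the significance.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("inputs must have the same length")
    if n < 2:
        raise ValueError("need at least two data points")

    # First pass: the means. math.fsum limits floating-point rounding error.
    mean_x = math.fsum(xs) / n
    mean_y = math.fsum(ys) / n

    # Second pass: sums of products of deviations from the means.
    sxy = math.fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = math.fsum((x - mean_x) ** 2 for x in xs)
    syy = math.fsum((y - mean_y) ** 2 for y in ys)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")

    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

To get a significance value from this, the usual route is the statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, which still needs a Student's t distribution function such as the one SciPy provides.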

John D. Cook
A: 

Just for completeness, you can call R's statistical functions from Python using the rpy Python package. Probably overkill if all you want is the Pearson stat, but if you then want to go on and do lots of stats things that you can't find in the Python packages in other answers here, rpy might be the way to go.

www.r-project.org

rpy.sourceforge.net

Spacedman