I've been doing some research on statistical significance, and I've learned a lot but seem to have hit a wall when it comes to calculating P values.

I feel like I'm about 95% of the way there; it's just that everything I read on calculating P values references a table rather than offering a programmatic solution.

It seems that Excel's TDIST function does what I want (I already have the T statistic, which I can pass to TDIST along with N - 2 as degrees of freedom, where N is my sample size), but I am unclear on how this function works. Mathematically speaking, I believe it finds the area under a normal distribution curve beyond the specified value; but what might the code look like?

Any clear, readable implementation of this kind of function would be fine: C, Java, Python, pseudocode, whatever.

+1  A: 

See the Wikipedia article on Student's t-distribution for the formula of the cumulative distribution function.

A code example can be found, e.g., in the file src/nmath/pt.c in the nmath library for R.
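
For readers who want something self-contained, here is a minimal Python sketch of one common way to evaluate that CDF: the tail area of Student's t can be written with the regularized incomplete beta function, P(T > t) = 0.5 * I_x(df/2, 1/2) with x = df / (df + t^2) for t >= 0. This is not a transcription of pt.c; the function names (`tdist`, `regularized_incomplete_beta`, `_beta_continued_fraction`) and the iteration limits are assumptions made for illustration, and the continued fraction is the textbook modified Lentz scheme.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300  # guards against division by zero
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("continued fraction did not converge")


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # Prefactor x^a (1-x)^b / B(a, b), computed in log space.
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where it converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def tdist(t, df, tails):
    """Tail probability in the spirit of Excel's TDIST(t, df, tails); t >= 0."""
    if t < 0 or df <= 0 or tails not in (1, 2):
        raise ValueError("need t >= 0, df > 0 and tails in (1, 2)")
    x = df / (df + t * t)
    two_tailed = regularized_incomplete_beta(df / 2.0, 0.5, x)
    return two_tailed if tails == 2 else 0.5 * two_tailed
```

With a t statistic and N - 2 degrees of freedom as in the question, a call would look like `tdist(t_stat, n - 2, 2)` for a two-tailed P value. For very large degrees of freedom the fixed iteration cap may need raising.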

Dirk Eddelbuettel
+2  A: 

If you can use GNU code, I suggest using the GNU Scientific Library (GSL).

TDIST(x,df,tc) in Excel translates to

return gsl_cdf_tdist_Q(x, df) * tc;

See http://www.gnu.org/software/gsl/manual/html_node/The-t_002ddistribution.html for the manual.

See http://gsl.sourcearchive.com/documentation/1.9/cdf_2tdist_8c-source.html for the implementation (it uses a series expansion).
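
To make the "upper tail times number of tails" relationship concrete without installing GSL, here is a rough Python sketch that gets the upper-tail area by numerically integrating the t density with Simpson's rule. This is deliberately not the method GSL uses; the names `t_pdf` and `tdist_by_integration` and the step count are assumptions chosen for illustration, and the subtraction from 0.5 loses precision for very small tail areas.

```python
import math


def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    # Normalizing constant computed in log space to avoid overflow.
    log_norm = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2.0 * math.log1p(t * t / df))


def tdist_by_integration(x, df, tails, steps=2000):
    """Approximate TDIST(x, df, tails) for x >= 0 (steps must be even)."""
    if x < 0 or df <= 0 or tails not in (1, 2) or steps % 2:
        raise ValueError("need x >= 0, df > 0, tails in (1, 2), even steps")
    if x == 0:
        return 0.5 * tails
    h = x / steps
    # Composite Simpson's rule for the area between 0 and x.
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_x = total * h / 3.0
    # The density is symmetric, so the area above 0 is exactly 0.5.
    upper_tail = 0.5 - area_0_to_x
    return upper_tail * tails
```

Here `upper_tail` plays the role of `gsl_cdf_tdist_Q(x, df)`, and multiplying by `tails` mirrors the `* tc` in the one-liner above.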

KennyTM