ansaurus

Question

How can I compute the probability at a point given a normal distribution in Perl?

Answer 1

+4 A:

dnorm(0, mean=4, sd=10) does not give you thr probability of such a point occurring. To quote Wikipedia on probability desnity function

In probability theory, a probability density function (pdf)—often referred to as a probability distribution function[1]—or density, of a random variable is a function that describes the density of probability at each point in the sample space. The probability of a random variable falling within a given set is given by the integral of its density over the set.

and the probability you mention is

R> pnorm(0, 4, 10)
[1] 0.3446

or a 34.46% chance of getting a value equal to or smaller than 0 from a N(4, 10) distribution.

As for your Perl question: If you know how to do it in R, but need it from Perl, maybe you need to write a Perl extension based on R's libRmath (provided in Debian by the package r-mathlib) to get those functions to Perl? This does not require the R interpreter.

Otherwise, you could try the GNU GSL or the Cephes libraries for access to these special functions.

Dirk Eddelbuettel 2009-09-04 03:11:53

There's already a module on CPAN that can use R. It's a mess, but I could get it to work in the past: http://search.cpan.org/~gmpassos/Statistics-R-0.02/

tsee 2009-09-04 07:30:31

the distribution function (like pnorm) in Statistics::Distributions is uprob. 1-uprob((0-4)/10) should give you ~ 0.34 (I don't have it installed to confirm this.) I doesn't have the density function, though.

Eduardo Leoni 2009-09-04 13:46:36

Answer 2

+2 A:

Why not something along these lines (I am writing in R, but it could be done in perl with Statistics::Distribution):

dn <- function(x=0 # value
               ,mean=0 # mean 
               ,sd=1 # sd
               ,sc=10000 ## scale the precision
               ) {
  res <- (pnorm(x+1/sc, mean=mean, sd=sd)-pnorm(x, mean=mean, sd=sd))*sc
  res
}
> dn(0,4,10,10000)
0.03682709
> dn(2.02,2,.24)
1.656498

[edit:1] I should mention that this approximation can get pretty horrible at the far tails. it might or might not matter depending on your application.

[edit:2] @foolishbrat Turned the code into a function. The result should always be positive. Perhaps you are forgetting that in the perl module you mention the function returns the upper probability 1-F, and R returns F?

[edit: 3] fixed a copy and paste error.

Eduardo Leoni 2009-09-04 03:19:36

@EL: Thanks. How would you adjust your approach, when the final result is negative. For example, x=2.02, mean=2, sd=0.24. Your approach would give -2.880624e-05.

neversaint 2009-09-04 09:39:35

@EL: In your last example. My machine gives different result: dn(2.02,2,.24); [1] 1.656469. I am using R version 2.9.2.

neversaint 2009-09-07 02:10:32

@foolishbrat: it was my mistake. ~1.65 is correct. (and agrees with the answer from dnorm.) sorry for the confusion.

Eduardo Leoni 2009-09-07 03:32:02

@EL: What do people do usually when dnorm is greater than 1 like the example?

neversaint 2009-09-07 06:36:02

@foolishbrat: Again, I think you are confusing probability (which is bounded between 0 and 1) with probability density (which is not). Like others pointed out, you probably want the cumulative distribution function; but there is no way for us to know it since you haven't told us what you want to do. You should also consult a intro to statistics book.

Eduardo Leoni 2009-09-07 15:59:52

Answer 3

A:

Here's how you can do the same thing you're doing with R in Perl using the Math::SymbolicX::Statistics::Distributions module from CPAN:

use strict; use warnings;

use Math::SymbolicX::Statistics::Distributions qw/normal_distribution/;

my $norm = normal_distribution(qw/mean sd/);
print $norm->value(mean => 4, sd => 10, x => 0), "\n";

# curry it with the parameter values
$norm->implement(mean => 4, sd => 10);
print $norm->value(x => 0),"\n"; # prints the same as above

The normal_distribution() function from that module is a generator for functions. $norm will be a Math::Symbolic (::Operator) object that you can modify. For example with implement, which, in the above example, replaces the two parameter variables with constants.

Note, however as Dirk pointed out, that you probably want the cumulative function of the normal distribution. Or more generally the integral in a certain range.

Unfortunately, Math::Symbolic can't do integration symbolically. Therefore, you'd have to resort to numerical integration with the likes of Math::Integral::Romberg. (Alternatively, search CPAN for an implementation of the error function.) This may be slow, but it's still easy to do. Add this to the above snippet:

use Math::Integral::Romberg 'integral';

my ($int_sub) = $norm->to_sub(); # compile to a faster Perl sub
print $int_sub->(0),"\n";  # same number as above

print "p=" . integral($int_sub, -100., 0) . "\n";
# -100 is an arbitrary, small number

This should give you the ~0.344578258389676 from Dirk's answer.

tsee 2009-09-04 08:04:26

Answer 4

+1 A:

As others have pointed out, you probably want the cumulative distribution function. This can be obtained via the error function (shifted by the mean and scaled by the standard deviation of your normal distribution), which exists in the standard math library and is made accessible in Perl by Math::Libm.

Jouni K. Seppänen 2009-09-05 11:13:00

Answer 5

+2 A:

If you really want the density function, why not use it directly.

1/sqrt(2*$pi*$sd**2)exp(-($x-$mean)*2/(2*$sd**2))

With $pi = 3.141593; $x = 2.02; $mean = 2; $sd = .24;

it gives 1.65649768474891 about the same as dnorm in R.

Eonwe 2010-10-24 20:04:43

ansaurus

tags:

views:

answers:

How can I compute the probability at a point given a normal distribution in Perl?

related questions