views:

717

answers:

5

Is there a package in Perl that allows you to compute the height of probability distribution at each given point. For example this can be done in R this way:

> dnorm(0, mean=4,sd=10)
> 0.03682701

Namely the probability of point x=0 falls into a normal distribution, with mean=4 and sd=10, is 0.0368. I looked at Statistics::Distribution but it doesn't give that very function to do it.

+4  A: 

dnorm(0, mean=4, sd=10) does not give you thr probability of such a point occurring. To quote Wikipedia on probability desnity function

In probability theory, a probability density function (pdf)—often referred to as a probability distribution function[1]—or density, of a random variable is a function that describes the density of probability at each point in the sample space. The probability of a random variable falling within a given set is given by the integral of its density over the set.

and the probability you mention is

R> pnorm(0, 4, 10)
[1] 0.3446

or a 34.46% chance of getting a value equal to or smaller than 0 from a N(4, 10) distribution.

As for your Perl question: If you know how to do it in R, but need it from Perl, maybe you need to write a Perl extension based on R's libRmath (provided in Debian by the package r-mathlib) to get those functions to Perl? This does not require the R interpreter.

Otherwise, you could try the GNU GSL or the Cephes libraries for access to these special functions.

Dirk Eddelbuettel
There's already a module on CPAN that can use R. It's a mess, but I could get it to work in the past: http://search.cpan.org/~gmpassos/Statistics-R-0.02/
tsee
the distribution function (like pnorm) in Statistics::Distributions is uprob. 1-uprob((0-4)/10) should give you ~ 0.34 (I don't have it installed to confirm this.) I doesn't have the density function, though.
Eduardo Leoni
+2  A: 

Why not something along these lines (I am writing in R, but it could be done in perl with Statistics::Distribution):

dn <- function(x=0 # value
               ,mean=0 # mean 
               ,sd=1 # sd
               ,sc=10000 ## scale the precision
               ) {
  res <- (pnorm(x+1/sc, mean=mean, sd=sd)-pnorm(x, mean=mean, sd=sd))*sc
  res
}
> dn(0,4,10,10000)
0.03682709
> dn(2.02,2,.24)
1.656498

[edit:1] I should mention that this approximation can get pretty horrible at the far tails. it might or might not matter depending on your application.

[edit:2] @foolishbrat Turned the code into a function. The result should always be positive. Perhaps you are forgetting that in the perl module you mention the function returns the upper probability 1-F, and R returns F?

[edit: 3] fixed a copy and paste error.

Eduardo Leoni
@EL: Thanks. How would you adjust your approach, when the final result is negative. For example, x=2.02, mean=2, sd=0.24. Your approach would give -2.880624e-05.
neversaint
@EL: In your last example. My machine gives different result: dn(2.02,2,.24); [1] 1.656469. I am using R version 2.9.2.
neversaint
@foolishbrat: it was my mistake. ~1.65 is correct. (and agrees with the answer from dnorm.) sorry for the confusion.
Eduardo Leoni
@EL: What do people do usually when dnorm is greater than 1 like the example?
neversaint
@foolishbrat: Again, I think you are confusing probability (which is bounded between 0 and 1) with probability density (which is not). Like others pointed out, you probably want the cumulative distribution function; but there is no way for us to know it since you haven't told us what you want to do. You should also consult a intro to statistics book.
Eduardo Leoni
A: 

Here's how you can do the same thing you're doing with R in Perl using the Math::SymbolicX::Statistics::Distributions module from CPAN:

use strict; use warnings;

use Math::SymbolicX::Statistics::Distributions qw/normal_distribution/;

my $norm = normal_distribution(qw/mean sd/);
print $norm->value(mean => 4, sd => 10, x => 0), "\n";

# curry it with the parameter values
$norm->implement(mean => 4, sd => 10);
print $norm->value(x => 0),"\n"; # prints the same as above

The normal_distribution() function from that module is a generator for functions. $norm will be a Math::Symbolic (::Operator) object that you can modify. For example with implement, which, in the above example, replaces the two parameter variables with constants.

Note, however as Dirk pointed out, that you probably want the cumulative function of the normal distribution. Or more generally the integral in a certain range.

Unfortunately, Math::Symbolic can't do integration symbolically. Therefore, you'd have to resort to numerical integration with the likes of Math::Integral::Romberg. (Alternatively, search CPAN for an implementation of the error function.) This may be slow, but it's still easy to do. Add this to the above snippet:

use Math::Integral::Romberg 'integral';

my ($int_sub) = $norm->to_sub(); # compile to a faster Perl sub
print $int_sub->(0),"\n";  # same number as above

print "p=" . integral($int_sub, -100., 0) . "\n";
# -100 is an arbitrary, small number

This should give you the ~0.344578258389676 from Dirk's answer.

tsee
+1  A: 

As others have pointed out, you probably want the cumulative distribution function. This can be obtained via the error function (shifted by the mean and scaled by the standard deviation of your normal distribution), which exists in the standard math library and is made accessible in Perl by Math::Libm.

Jouni K. Seppänen
+2  A: 

If you really want the density function, why not use it directly.

1/sqrt(2*$pi*$sd**2)exp(-($x-$mean)*2/(2*$sd**2))

With $pi = 3.141593; $x = 2.02; $mean = 2; $sd = .24;

it gives 1.65649768474891 about the same as dnorm in R.

Eonwe