views:

372

answers:

1

Most functions for generating lognormally distributed random numbers take the mean and standard deviation of the associated normal distribution as parameters.

My problem is that I only know the mean and the coefficient of variation of the lognormal distribution. It is reasonably straight forward to derive the parameters I need for the standard functions from what I have:

If mu and sigma are the mean and standard deviation of the associated normal distribution, we know that

coeffOfVar^2 = variance / mean^2
             = (exp(sigma^2) - 1) * exp(2*mu + sigma^2) / exp(mu + sigma^2/2)^2
             = exp(sigma^2) - 1

We can rearrange this to

sigma = sqrt(log(coeffOfVar^2 + 1))

We also know that

mean = exp(mu + sigma^2/2)

This rearranges to

mu = log(mean) - sigma^2/2

Here's my R implementation

rlnorm0 <- function(mean, coeffOfVar, n = 1e6)
{
   sigma <- sqrt(log(coeffOfVar^2 + 1))
   mu <- log(mean) - sigma^2 / 2
   rlnorm(n, mu, sigma)
}

It works okay for small coefficients of variation

r1 <- rlnorm0(2, 0.5)
mean(r1)                 # 2.000095
sd(r1) / mean(r1)        # 0.4998437

But not for larger values

r2 <- rlnorm0(2, 50)
mean(r2)                 # 2.048509
sd(r2) / mean(r2)        # 68.55871

To check that it wasn't an R-specific issue, I reimplemented it in MATLAB. (Uses stats toolbox.)

function y = lognrnd0(mean, coeffOfVar, sizeOut)
if nargin < 3 || isempty(sizeOut)
   sizeOut = [1e6 1];
end
sigma = sqrt(log(coeffOfVar.^2 + 1));
mu = log(mean) - sigma.^2 ./ 2;
y = lognrnd(mu, sigma, sizeOut);
end

r1 = lognrnd0(2, 0.5);
mean(r1)                  % 2.0013
std(r1) ./ mean(r1)       % 0.5008

r2 = lognrnd0(2, 50);
mean(r2)                  % 1.9611
std(r2) ./ mean(r2)       % 22.61

Same problem. The question is, why is this happening? Is it just that the standard deviation is not robust when the variation is that wide? Or have a screwed up somewhere?

+7  A: 

The results are not surprising. For distributions with large kurtosis, expected variance of the sample variance is roughly mu4/N, where mu4 is the 4th moment of the distribution. For a lognormal, mu4 exponentially depends on the parameter sigma^2, meaning that for large enough values of sigma, your sample variance will be all over the place relative to the true variance. This is precisely what you have observed. In your example, mu4/N ~ (coeffOfVar^8)/N ~ 50^8/1e6 ~ 4e7.

For derivation of the expected variance of sample var. see http://mathworld.wolfram.com/SampleVarianceDistribution.html. Below is some code to illustrate the ideas in a more exact manner. Note the large value of both the variance of the sample variance and its theoretical expected value, even for coeffOfVar = 5.

exp.var.of.samp.var <- function(n,mu2,mu4){
  (n-1)*((n-1)*mu4-(n-3)*mu2^2)/n^3
}
mu2.lnorm <- function(mu,sigma){
  (exp(sigma^2)-1)*exp(2*mu+sigma^2)
}
mu4.lnorm <- function(mu,sigma){
  mu2.lnorm(mu,sigma)^2*(exp(4*sigma^2)+2*exp(3*sigma^2)+3*exp(2*sigma^2)-3)
}
exp.var.lnorm.var <- function(n,mu,sigma){
  exp.var.of.samp.var(n,mu2.lnorm(mu,sigma),mu4.lnorm(mu,sigma))
}
exp.var.norm.var <- function(n,mu,sigma){
  exp.var.of.samp.var(n,sigma^2,3*sigma^4)
}    

coeffOfVar <- 5 
mean <- 2
sigma <- sqrt(log(coeffOfVar^2 + 1)) # gives sigma=1.805020
mu <- log(mean) - sigma^2 / 2 # mu=-0.935901
n <- 1e4
m <- 1e4
## Get variance of sample variance for lognormal distribution:
var.trial <- replicate(m,var(rlnorm(n, mu, sigma)))
cat("samp. variance (mean of",m,"trials):",mean(var.trial),"\n")
cat("theor. variance:",mu2.lnorm(mu,sigma),"\n")
cat("variance of the sample var:",var(var.trial),"\n")
cat("expected variance of the sample var:",exp.var.lnorm.var(n,mu,sigma),"\n")
> samp. variance (mean of 10000 trials): 105.7192 
> theor. variance: 100 
> variance of the sample var: 350997.7 
> expected variance of the sample var: 494053.2 
## Do this with normal distribution:
var.trial <- replicate(m,var(rnorm(n, mu, sigma)))
cat("samp. variance (mean of",m,"trials):",mean(var.trial),"\n")
cat("theor. variance:",sigma^2,"\n")
cat("variance of the sample var:",var(var.trial),"\n")
cat("expected variance of the sample var:",exp.var.norm.var(n,mu,sigma),"\n")
> samp. variance (mean of 10000 trials): 3.257944 
> theor. variance: 3.258097 
> variance of the sample var: 0.002166131 
> expected variance of the sample var: 0.002122826 
Leo Alekseyev
@Leo A: Thanks for this; I've learnt something new here. I think your lognormal numbers look odd because you've redefined `sigma` to be half what it should be (after the definitions of `n`, `m`).
Richie Cotton
Correct me if I'm wrong but I think some points need double-checking. mu4.lnorm should be exp(4*sigma^2)+2*exp(3*sigma^2)+3*exp(2*sigma^2)-6 according to http://mathworld.wolfram.com/LogNormalDistribution.html. Also sigma shouldn't be multiplied by 0.5 and should be calculated before mu. You can find the formula for exp.var.of.samp.var here (7.4 on page 241) http://www.scribd.com/doc/24280430/Statistics-and-Data-With-R
gd047
@gd047: good catch on mu4 (it was wrong -- I was mixing up variance of the normal and the lognormal -- but I don't think your version is correct either :) ). The numbers look better now.
Leo Alekseyev
@Leo: If you are talking about the -6, it's just because it uses the excess kurtosis = m4/s^4 - 3
gd047
The formula you gave in the comment was actually m4/s^4 whereas m4 is needed, I believe.
Leo Alekseyev
@Leo: Hmm, according to the book the central moment is needed and not the standardized, so you are right :)
gd047