My grain sizes are defined as D=[1.19,1.00,0.84,0.71,0.59,0.50,0.42]. The problem is described below in steps.

  1. Grain sizes should follow lognormal distribution.
  2. The mean of the grain sizes is fixed as 0.84 and the standard deviation should be as low as possible but not zero.
  3. 90% of the grains (by weight %) fall in the size range of 1.19 to 0.59, and the remaining 10% fall in the size range of 0.50 to 0.42.
  4. Now I want to find the probabilities (weight percentage) of the grains falling in each grain size.
  5. It is allowable to split this grain size distribution into further small sizes but it must always be in the range of 1.19 and 0.42, i.e. 'D' can be continuous but 0.42 < D < 1.19.

I need this quickly. I tried on my own but cannot get the correct result: I am getting negative probabilities (weight percentages). Thanks to anyone who helps.

I didn't incorporate point 3, as I learned about that condition later. Here are the simple steps I tried:

%%
D=[1.19,1.00,0.84,0.71,0.59,0.50,0.42];
s=0.30; % std dev of the lognormal distribution
m=0.84; % mean of the lognormal distribution
mu=log(m^2/sqrt(s^2+m^2)); % mean of the associated normal dist.
sigma=sqrt(log((s^2/m^2)+1)); % std dev of the associated normal dist.
[r,c]=size(D);
for i=1:c
    D_normal(i)=mu+(sigma.*randn(1));
    w(i)=(D_normal(i)-mu)/sigma; % the probability or the wt. percentage of the grain sizes
end
grain_size=exp(D_normal);
%%

I would like to rephrase my problem in the following simple way:

I want to develop a lognormal distribution over the range [0.42,1.19], a few of whose elements are given as D=[1.19,1.00,0.84,0.71,0.59,0.50,0.42]. The mean should be 0.84 and the standard deviation as small as possible. It is also given that 90% of the cdf (i.e. 90% of the grains) lies between 0.59 and 1.19.

Once I know all the elements of this lognormal distribution, with the given conditions incorporated, I can find its pdf, which is what I require.

A: 

It seems that you are looking to generate truncated lognormal random numbers. If my assumption is correct, you can use either rejection sampling or inverse transform sampling to generate the necessary samples. Caveat: rejection sampling is very inefficient if the bounds leave little probability mass between them, because most draws are then rejected.

Rejection Sampling

If x ~ LogNormal(mu,sigma) I(lb < x < ub )

Then generate, x ~ LogNormal(mu,sigma) and accept the draw if lb < x < ub.
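
As a rough illustration of the accept/reject step, here is a minimal MATLAB sketch. The values of mu and sigma are assumptions picked only for the example (they are log-scale parameters, not the mean and standard deviation of the grain sizes), and n is a hypothetical sample count.

```matlab
% Rejection sampling sketch for a lognormal truncated to (lb, ub).
% mu and sigma are log-scale parameters; these values are illustrative assumptions.
mu = log(0.8); sigma = 0.25;
lb = 0.42; ub = 1.19;
n = 1000;                                   % number of samples wanted (hypothetical)

x = zeros(1, n);
filled = 0;
while filled < n
    draw = exp(mu + sigma .* randn(1, n));  % untruncated lognormal draws
    keep = draw(draw > lb & draw < ub);     % accept only draws inside the bounds
    take = min(numel(keep), n - filled);    % do not overfill the output
    x(filled+1:filled+take) = keep(1:take);
    filled = filled + take;
end
```

Drawing in batches of n rather than one value at a time keeps the loop short; the number of passes grows as the accepted fraction shrinks.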

Inverse Transform Sampling

If x ~ LogNormal(mu,sigma) I(lb < x < ub ) then

CDF(x) = ( phi((log(x) - mu)/sigma) - phi((log(lb) - mu)/sigma) ) / ( phi((log(ub) - mu)/sigma) - phi((log(lb) - mu)/sigma) )

Generate u ~ Uniform(0,1).

Set CDF(x) = u and invert for x.

In other words,

x = exp( mu + sigma * phi_inverse( phi((log(lb) - mu)/sigma) + u * ( phi((log(ub) - mu)/sigma) - phi((log(lb) - mu)/sigma) ) ) )
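
A minimal MATLAB sketch of this inversion follows, assuming illustrative log-scale parameters mu and sigma. To avoid depending on the Statistics Toolbox, phi and phi_inv are hypothetical helper handles built from the base functions erfc and erfcinv. The uniform draw u is mapped onto the interval between phi at the lower bound and phi at the upper bound before inverting, so every sample lands inside the bounds.

```matlab
% Inverse transform sampling sketch for a lognormal truncated to (lb, ub).
mu = log(0.8); sigma = 0.25;                 % illustrative log-scale parameters (assumed)
lb = 0.42; ub = 1.19;
n = 1000;                                    % number of samples (hypothetical)

phi     = @(z) 0.5 .* erfc(-z ./ sqrt(2));   % standard normal CDF
phi_inv = @(p) -sqrt(2) .* erfcinv(2 .* p);  % standard normal quantile function

Flb = phi((log(lb) - mu) ./ sigma);          % untruncated CDF at the lower bound
Fub = phi((log(ub) - mu) ./ sigma);          % untruncated CDF at the upper bound

u = rand(1, n);                              % u ~ Uniform(0,1)
x = exp(mu + sigma .* phi_inv(Flb + u .* (Fub - Flb)));
```

Unlike rejection sampling, this uses exactly one uniform draw per sample no matter how narrow the bounds are.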

Anon
My lognormal distribution is truncated. The grain sizes given here are picked from the whole sample available. Although grain sizes follow a lognormal distribution, I am not sure whether these truncated values from the entire lot follow the same distribution. I don't want my random numbers to be truncated. They should be between 0 and 1, representing my weight percentages.
Harpreet
You should consider using standard terminology to avoid confusion. When you say weight percentages, do you mean the probability that a grain's size falls between two values, or do you mean the pdf associated with a particular grain size? You say that your distribution is truncated, but then in the second-to-last sentence you say you do not want your random numbers to be truncated. Those are contradictory statements.
Anon
I mean the pdf associated with a particular grain size. I want all my random numbers (pdf) to be generated treating this given distribution as complete. More values WITHIN the distribution (not outside the given range) could be added, but I don't know how to do that.
Harpreet
In that case, you should do what Jonas suggested. If you want your pdf to be that of a truncated lognormal, compute the pdf as suggested by Jonas and then divide the value by ( phi((log(ub) - mu)/sigma) - phi((log(lb) - mu)/sigma) ).
Anon
I'm sorry, but I don't know what 'phi', 'ub' and 'lb' are. Can you tell me what they are?
Harpreet
phi is the standard normal CDF (mean 0, std dev 1). lb and ub are the lower and upper bounds for your random variable.
Anon
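
The normalization described in these comments can be sketched in MATLAB as follows. The values of mu and sigma are illustrative assumptions (log-scale parameters), and phi is a hypothetical helper handle for the standard normal CDF built from the base function erfc. The sketch evaluates the truncated pdf at each grain size and also the probability mass of each interval between consecutive sizes, which is non-negative and sums to 1 by construction.

```matlab
% Truncated lognormal density and interval probabilities (sketch).
mu = log(0.8); sigma = 0.25;                       % illustrative log-scale parameters (assumed)
lb = 0.42; ub = 1.19;                              % truncation bounds
D = [1.19, 1.00, 0.84, 0.71, 0.59, 0.50, 0.42];    % grain sizes from the question

phi = @(z) 0.5 .* erfc(-z ./ sqrt(2));             % standard normal CDF

% Probability mass of the untruncated lognormal between lb and ub
Z = phi((log(ub) - mu) ./ sigma) - phi((log(lb) - mu) ./ sigma);

% Untruncated lognormal pdf at each grain size, divided by Z
pdf_trunc = exp(-(log(D) - mu).^2 ./ (2 .* sigma.^2)) ...
            ./ (D .* sigma .* sqrt(2 .* pi)) ./ Z;

% Truncated CDF at the sorted sizes, then the mass of each interval
edges = sort(D);
cdf_trunc = (phi((log(edges) - mu) ./ sigma) - phi((log(lb) - mu) ./ sigma)) ./ Z;
bin_prob = diff(cdf_trunc);                        % one value per interval between sizes
```

Note that pdf_trunc is a density (it can exceed 1), whereas bin_prob holds actual probabilities; weight percentages correspond to 100*bin_prob.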
A: 

If you have the statistics toolbox and you want to draw random values from the lognormal distribution, you can simply call LOGNRND. If you want to know the density of the lognormal distribution with a given mean and sigma at a specific value, you use LOGNPDF.

Since you're calculating weights, you may be looking for the density. These would be, in your example:

weights = lognpdf([1.19,1.00,0.84,0.71,0.59,0.50,0.42],0.84,0.3)

weights =
     0.095039     0.026385     0.005212   0.00079218   6.9197e-05   5.6697e-06   2.9244e-07

EDIT

If you want to know what percentage of grains falls into the range of 0.59 to 1.19, you use LOGNCDF:

100*diff(logncdf([0.59,1.19],0.84,0.3))
ans =
       1.3202

That's not a lot. If you plot the distribution, you'll notice that the lognormal distribution with your values peaks a bit above 2:

x = 0:0.01:10;
figure
plot(x,lognpdf(x,0.84,0.3))
Jonas
Thank you for your response. I don't want to pick random values from the grain sizes, i.e. from the lognormal distribution. I want the probability of their occurrence given their mean and standard deviation, and also given the range of the distribution. Further, I know that 90% of the grains (by weight) fall between 1.19 and 0.59, while the rest fall between 0.59 and 0.42.
Harpreet
@Harpreet: Have you looked at my edit? Have you plotted the distribution? The lognormal distribution peaks at `exp(0.84)`, not at 0.84, and thus only 1.3% of the values fall into the range where you'd expect 90%. Also, what do you mean by the probability of the occurrences? If it's the value of the probability density function, i.e. the probability of drawing a specific value from a distribution, I have calculated that for you already as `weights`.
Jonas
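
The mismatch arises because the second and third arguments of LOGNPDF and LOGNCDF are the mean and standard deviation of log(x), not of x itself. A minimal sketch of the conversion, using the same formulas as the question's code, might look like this; phi is a hypothetical helper handle for the standard normal CDF built from the base function erfc, so no toolbox is assumed.

```matlab
% Convert a desired mean/std dev of the grain sizes to log-scale parameters (sketch).
m = 0.84; s = 0.30;                           % mean and std dev of the lognormal variable
sigma = sqrt(log(1 + s^2 / m^2));             % std dev of log(x)
mu    = log(m) - sigma^2 / 2;                 % mean of log(x)

phi = @(z) 0.5 .* erfc(-z ./ sqrt(2));        % standard normal CDF

% Percentage of the untruncated distribution between 0.59 and 1.19
pct = 100 * (phi((log(1.19) - mu) / sigma) - phi((log(0.59) - mu) / sigma));

% Sanity checks: these recover m and s from mu and sigma
mean_check = exp(mu + sigma^2 / 2);
std_check  = sqrt((exp(sigma^2) - 1) * exp(2 * mu + sigma^2));
```

With the Statistics Toolbox available, the converted values would be passed as `lognpdf(x, mu, sigma)` and `logncdf(x, mu, sigma)` in place of 0.84 and 0.3.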
Jonas, I did look at everything you said. I meant the pdf (when I said probability of occurrences). I am actually not able to see any sign of a lognormal distribution in the given data. It's more like zigzag noise in shape. How does the peak occur at exp(0.84)? Shouldn't it be log(0.84) instead? To avoid getting further entangled in miscommunication, my question is: I want to develop a lognormal distribution with range [0.30,1.19], a few of whose elements are given in 'D'. The mean should be 0.84 and the standard deviation as small as possible. It is also given that 90% of the cdf lies between 0.59 and 1.19.
Harpreet
