I would like to generate a random sequence of 3000 points that follows the normal distribution with mean c and standard deviation d. However, I would like all 3000 points to lie in the range [a,b].

Can you tell me how to do it in R?

If I would like to plot this sequence with the generated 3000 points on the Y-axis, how should I generate the corresponding points for the X-axis?

+2  A: 

Generating a random sequence of numbers from any probability distribution is very easy in R. To do this for the normal distribution specifically:

c = 1
d = 2
x <- rnorm(3000, c, d)

Clipping the values in x so that they're only within a given range is kind of a strange thing to want to do with a sample from the normal distribution. Maybe what you really want to do is sample a uniform distribution.

a = 0
b = 3
x2 <- runif(3000, a, b)

As for how to plot the distribution, I'm not sure I follow your question. You can plot a density estimate for the sample with this code:

plot(density(x))

But, if you want to plot this data as a scatter plot of some sort, you actually need to generate a second sample of numbers.

JoFrhwld
A: 
a = -2; b = 3
plot(dnorm, xlim = c(a, b))
John
This does not do what the OP asked for. He wants 3000 points following the normal distribution, so `rnorm` is the way to go.
nico
It's a poorly formed question... I took a guess. If it was really just about plotting the random values on the y-axis, as part of it suggests, what the heck is the x-axis limit request? It makes no sense. My guess is that the random values are requested to generate a density function, as @JoFrhwld and @Joris Meys did. This is the right way to generate a density function.
John
A: 

If I would like to plot this sequence with the generated 3000 points on the Y-axis, how should I generate the corresponding points for the X-axis?

If you just generate your points as JoFrhwld suggested, with

y <- rnorm(3000, 1, 2)

then

plot(y)

will automatically plot them, using the vector indices (1 to 3000) as the x-axis.
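To make the implicit x-axis explicit, here is a minimal sketch. The variable name `x` and the fixed seed are illustrative assumptions, not part of the original answer.

```r
set.seed(42)                 # assumption: fixed seed, only for reproducibility
y <- rnorm(3000, 1, 2)       # the 3000 generated points
x <- seq_along(y)            # 1, 2, ..., 3000: the values plot(y) uses on the x-axis

plot(x, y)                   # same picture as plot(y), apart from the axis labels
```

So no separate sample is needed for the x-axis unless you want something other than the index there.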

nico
A: 

First, you cannot choose C and D freely if you want those points to be between A and B, and normally distributed. So I assume you fix A and B, and let C and D be dependent on those. If not, then you should rethink your question, as any translation will change the mean, and any rescaling the standard deviation.

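A <- 2
B <- 10

y <- rnorm(3000)

# rescale linearly so that min(y) maps to A and max(y) maps to B
# (shifting first and then multiplying by B/max(y) would move the minimum away from A)
y <- A + (y - min(y)) * (B - A) / (max(y) - min(y))

# plot options
hist(y) # gives the histogram
plot(density(y)) # gives the density curve, but this one is the empirical one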

To get your mean and sd, just use the respective functions, mean(y) and sd(y).

Joris Meys
+1  A: 

You can do this using standard R functions like this:

c <- 1
d <- 2

a <- -2
b <- 3.5

ll <- pnorm(a, c, d)
ul <- pnorm(b, c, d)

x <- qnorm( runif(3000, ll, ul), c, d )
hist(x)
range(x)
mean(x)
sd(x)
plot(x, type='l')

The pnorm function is used to find the limits to use for the uniform distribution. Data are then generated from that uniform distribution and transformed back to the normal scale with qnorm.
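The same inverse-CDF idea can be wrapped in a small function. This is only a sketch: the name `rtnorm_sketch` and its arguments are hypothetical, not an existing R function.

```r
# Hypothetical helper (the name is an assumption): draw n values from a
# normal(mean, sd) distribution restricted to [lower, upper].
rtnorm_sketch <- function(n, mean = 0, sd = 1, lower = -Inf, upper = Inf) {
  stopifnot(lower < upper, sd > 0)
  p_lower <- pnorm(lower, mean, sd)   # CDF value at the lower bound
  p_upper <- pnorm(upper, mean, sd)   # CDF value at the upper bound
  u <- runif(n, p_lower, p_upper)     # uniform draws between the two CDF values
  qnorm(u, mean, sd)                  # map back through the quantile function
}

set.seed(1)                           # assumption: fixed seed for reproducibility
x_tn <- rtnorm_sketch(3000, mean = 1, sd = 2, lower = -2, upper = 3.5)
range(x_tn)                           # should fall inside [-2, 3.5]
```

If the bounds sit far out in one tail, pnorm values get very close to 0 or 1 and the transformation probably loses precision, so treat this as suitable for moderate truncation points only.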

This is even simpler using the distr package:

library(distr)

N <- Norm(c,d)
N2 <- Truncate(N, lower=a, upper=b)

plot(N2)
x <- r(N2)(3000)
hist(x)
range(x)
mean(x)
sd(x)
plot(x, type='l')

Note that in both cases the mean is not c and the sd is not d. If you want the mean and sd of the resulting truncated data to be c and d, then the parent distribution (before truncating) needs different values: a higher sd, and a mean that depends on the truncation points. Finding those values would be a good homework problem for a math/stat theory course. If that is what you really need, then add a comment or edit the question to say so specifically.
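To see how far the truncated mean and sd drift from the parent values, the standard closed-form moments of a truncated normal can be evaluated directly. This is a sketch under the assumption of a two-sided truncation with finite bounds; the function name `trunc_norm_moments` is hypothetical.

```r
# Hypothetical helper (the name is an assumption): theoretical mean and sd of a
# normal(mu, sigma) distribution truncated to the finite interval [a, b].
trunc_norm_moments <- function(mu, sigma, a, b) {
  alpha <- (a - mu) / sigma                  # standardized lower bound
  beta  <- (b - mu) / sigma                  # standardized upper bound
  z     <- pnorm(beta) - pnorm(alpha)        # probability mass kept by truncation
  shift <- (dnorm(alpha) - dnorm(beta)) / z  # mean shift in units of sigma
  m <- mu + sigma * shift
  v <- sigma^2 * (1 + (alpha * dnorm(alpha) - beta * dnorm(beta)) / z - shift^2)
  c(mean = m, sd = sqrt(v))
}

# Same parameters as the example above: parent mean 1, parent sd 2, range [-2, 3.5]
trunc_norm_moments(mu = 1, sigma = 2, a = -2, b = 3.5)
```

The result should be close to the mean(x) and sd(x) of the simulated data, and the sd comes out smaller than the parent sd. Solving for parent values that give a target mean and sd after truncation would mean inverting these formulas numerically, which is not attempted here.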

If you want to generate the data from the untruncated normal, but only plot the data within the range [a,b] then just use the ylim argument to plot:

plot( rnorm(3000, c, d), ylim=c(a,b) )
Greg Snow