ansaurus

Question

R: Using the apply function on a data frame. Help me get my vector, Victor.

Answer 1

+3 A:

You want:

normalize <- apply(hist_data, 2, function(x) pnorm(x, mean=mean(x), sd=sd(x)))

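The problem is that you're passing the individual column to pnorm, but the entire hist_data to both mean and sd. Inside the anonymous function, x is the current column, so mean(x) and sd(x) give that column's own mean and standard deviation.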
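A minimal sketch of the point above, using a toy hist_data (the values and the column names a and b are made up for illustration and are not the asker's data). It checks that the apply call gives the same numbers as computing one column by hand:

```r
# Toy stand-in for hist_data (hypothetical values, not the asker's data).
set.seed(42)
hist_data <- data.frame(a = rnorm(10, mean = 5, sd = 2),
                        b = rnorm(10, mean = -1, sd = 0.5))

# MARGIN = 2 hands each column to the function as x, so mean(x) and sd(x)
# are computed per column rather than over the whole data frame.
normalize <- apply(hist_data, 2,
                   function(x) pnorm(x, mean = mean(x), sd = sd(x)))

# Same calculation for column "a", done by hand for comparison.
by_hand <- pnorm(hist_data$a, mean = mean(hist_data$a), sd = sd(hist_data$a))
all.equal(unname(normalize[, "a"]), by_hand)
```

Note that apply coerces the data frame to a matrix first, so the result is a numeric matrix with one column per input column, not a data frame.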

As I mentioned on Twitter, I'm no stats guy, so I can't answer anything about what you're actually trying to do :)

geoffjentry 2009-08-07 19:11:45

I think there is an extra comma in your example: there needn't be a comma after function(x). This is exactly what I wanted to do, and it's an example of how much more compact vector code is than looping code. Thanks so much for helping me with this, and for following #rstats on Twitter!

JD Long 2009-08-07 19:39:58

Oops, yeah. I typed that in by hand, didn't c+p it. This is my exact line: normalize <- apply(hist_data, 2, function(x) pnorm(x, mean=mean(x), sd=sd(x)))

geoffjentry 2009-08-07 19:42:20

Answer 2

A:

I'm just curious what your goal is. Using the pnorm function, you get, for each data point, the percentile it would correspond to in a normal distribution with the specified mean and sd. For example, if your data are -2, -1, 0, 1, 2, which have mean 0 and sd 1.58, the results of your function would be 0.10, 0.26, 0.50, 0.74, 0.90, rounded to 2 digits. This means that your data would correspond to the 10th, 26th, 50th, 74th and 90th percentiles of the normal distribution with mean 0 and sd 1.58, if the data were truly from that distribution. I'm not sure why this is useful, so I hope to be enlightened.
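The arithmetic in this example can be reproduced directly; the variable names x and p are only for illustration:

```r
x <- c(-2, -1, 0, 1, 2)

# Sample standard deviation: sqrt(10 / 4), which rounds to 1.58.
round(sd(x), 2)

# Percentile of each point under a normal with the data's own mean and sd.
p <- round(pnorm(x, mean = mean(x), sd = sd(x)), 2)
p   # 0.10 0.26 0.50 0.74 0.90
```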

Abhijit 2009-09-04 03:40:09

Well, it has been a month since I asked the question and I don't recall _exactly_ what I was doing, but here's the general idea: I was building a Monte Carlo model of correlated non-normal distributions. In my real application the distributions were not normal: they were either Johnson or non-parametric (probably kernels), but I had a p function like pnorm or pjohnson. After taking the percentiles, I would use the correlation matrix and fit a copula to them (they are now uniform between 0 and 1). I would then simulate correlated deviates and map those deviates back to 'real' values.
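A rough base-R sketch of that workflow, under several assumptions that are not in the original description: the copula is taken to be Gaussian, the percentiles come from empirical ranks rather than pjohnson or a kernel fit, the back-mapping uses empirical quantiles, and the data are made up. All names (hist_data, u, rho, u_sim, sim) are illustrative only.

```r
set.seed(1)

# Hypothetical stand-in data: two correlated, non-normal columns.
n <- 500
z <- matrix(rnorm(2 * n), ncol = 2) %*% chol(matrix(c(1, 0.7, 0.7, 1), 2))
hist_data <- data.frame(a = exp(z[, 1]), b = exp(0.5 * z[, 2]) + 3)

# 1. Percentiles of each column. Empirical ranks are used here as a
#    non-parametric substitute for a fitted p function.
u <- apply(hist_data, 2, function(x) rank(x) / (length(x) + 1))

# 2. "Fit" a Gaussian copula (an assumption): correlation of the normal scores.
rho <- cor(qnorm(u))

# 3. Simulate correlated deviates, uniform on (0, 1), from that copula.
n_sim <- 1000
u_sim <- pnorm(matrix(rnorm(n_sim * ncol(u)), ncol = ncol(u)) %*% chol(rho))

# 4. Map the deviates back to 'real' values via each column's quantiles.
sim <- sapply(seq_len(ncol(hist_data)), function(j)
  quantile(hist_data[[j]], probs = u_sim[, j], names = FALSE))
colnames(sim) <- names(hist_data)

# Rank correlation of the simulated values, for comparison with the input.
cor(sim, method = "spearman")
cor(hist_data, method = "spearman")
```

With a parametric fit, step 1 would call the fitted p function and step 4 the matching q function instead of rank and quantile.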

JD Long 2009-09-04 06:48:59
