Consider the following code:

require(Hmisc)
num.boots <- 10
data <- rchisq(500, df = 5) #generate fake data

#create bins: keep the upper limit of each of the 10 intervals produced by cut()
binx <- cut(data, breaks = 10)
binx <- levels(binx)
binx <- sub("^.*\\,", "", binx)
binx <- as.numeric(substr(binx, 1, nchar(binx) - 1))

#pre-allocate a matrix to be filled with samples
output <- matrix(NA, nrow = num.boots, ncol = length(binx)) 

#do random sampling from the vector and calculate the proportion
# of values falling into each bin
for (i in 1:num.boots) {
    walk.pair.sample <- sample(data, size = length(data), replace = TRUE)
    data.cut <- cut2(x = walk.pair.sample, cuts = binx)
    data.cut <- table(data.cut)/sum(table(data.cut))
    output[i, ] <- data.cut
}

#do some plotting: one line per bootstrap replicate, one x position per bin
plot(c(1, ncol(output)), c(0, max(output)), type = "n", xlab = "", ylab = "")

for (i in 1:nrow(output)) {
    lines(1:ncol(output), output[i, ])
}

#mean values by columns
output.mean <- apply(output, 2, mean)
lines(output.mean, col="red", lwd = 3)
legend(x = 8, y = 0.25, legend = "mean", col = "red", lty = "solid", lwd = 3)

I was wondering whether I can supply boot::boot() with a statistic function whose output is a vector of length n > 1. Is that possible at all?

Here are my feeble attempts, but I must be doing something wrong.

require(boot)
bootstrapDistances <- function(data, binx) {
    data.cut <- cut2(x = data, cuts = binx)
    data.cut <- table(data.cut)/sum(table(data.cut))
    return(data.cut)
}

> x <- boot(data = data, statistic = bootstrapDistances, R = 100)
Error in cut.default(x, k2) : 'breaks' are not unique

I don't really understand why Hmisc::cut2() fails inside the boot() call but works when I call it in a for() loop (see code above). Is the logic of my bootstrapDistances() function feasible with boot()? Any pointers would be much appreciated.
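A quick way to see what boot() hands to the statistic is a throwaway statistic that only reports the length of its second argument. This is a minimal sketch; second_arg_length is a hypothetical name. With the default stype = "i", boot() fills the second argument positionally with the resample indices, so in bootstrapDistances() the binx parameter receives indices rather than bin limits. Because resampling is with replacement, those indices contain duplicates, which would plausibly explain the "'breaks' are not unique" message from cut().

```r
library(boot)

set.seed(1)
x <- rchisq(50, df = 5)

# Hypothetical statistic written like bootstrapDistances(): its second formal
# is *meant* to carry extra information, but boot() always passes the
# resample indices in that position.
second_arg_length <- function(data, second) {
  length(second)
}

b <- boot(data = x, statistic = second_arg_length, R = 5)
b$t0      # 50: the call on the original data receives seq_len(50)
b$t[, 1]  # every replicate also receives 50 indices
```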

.:EDIT:.

Aniko suggested I modify my function to accept an index argument. When reading the documentation for boot(), it wasn't clear to me how this works, which explains why my function wasn't working. Here's the new function Aniko suggested:

bootstrapDistances2 <- function(data, idx, binx) { 
  data.cut <- cut2(x = data[idx], cuts = binx) 
  data.cut <- table(data.cut)/sum(table(data.cut)) 
  return(data.cut) 
} 

However, I managed to produce an error and I'm still working out how to remove it.

> x <- boot(data = data, statistic = bootstrapDistances2, R = 100, binx = binx)
Error in t.star[r, ] <- statistic(data, i[r, ], ...) : 
  number of items to replace is not a multiple of replacement length

After I restarted my R session (also tried another version, 2.10.1), it seems to be working fine.
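The "number of items to replace" message appears when a replicate of the statistic has a different length from its value on the original data: boot() allocates one column per element of that original value and then fills the result matrix row by row. A minimal sketch of a statistic whose length cannot vary, assuming base cut() with a fixed set of 11 break points (both outer edges included) in place of Hmisc::cut2(). Because the factor levels come from the breaks rather than from the resample, table() always returns one count per bin, even for empty bins. The names bin_props and breaks are hypothetical.

```r
library(boot)

set.seed(42)
data <- rchisq(500, df = 5)

# Hypothetical helper: fix all 11 break points once, from the original data.
# Padding the outer edges slightly keeps every resampled value inside a bin.
breaks <- seq(min(data) - 1e-8, max(data) + 1e-8, length.out = 11)

# Statistic with the (data, indices, ...) signature boot() expects.
# cut() builds its levels from `breaks`, so table() always has 10 cells.
bin_props <- function(data, idx, breaks) {
  counts <- table(cut(data[idx], breaks = breaks))
  as.vector(counts) / length(idx)
}

b <- boot(data = data, statistic = bin_props, R = 200, breaks = breaks)
dim(b$t)  # one row per replicate, one column per bin
```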

A:

From the help file for the boot function:

In all other cases statistic must take at least two arguments. The first argument passed will always be the original data. The second will be a vector of indices, frequencies or weights which define the bootstrap sample.

So you need to add a second parameter to your bootstrapDistances function that will tell it which elements of the data are selected:

bootstrapDistances2 <- function(data, idx, binx) { 
  data.cut <- cut2(x = data[idx], cuts = binx) 
  data.cut <- table(data.cut)/sum(table(data.cut)) 
  return(data.cut) 
} 
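The help text quoted above also mentions frequencies. As an alternative sketch, assuming boot()'s stype = "f" option, the second argument becomes a vector of frequencies (how many times each original observation appears in the resample), so the binning can be done once on the original data and each replicate only sums frequencies per bin. The names bin_props_freq and bins are hypothetical.

```r
library(boot)

set.seed(7)
data <- rchisq(500, df = 5)

# Assign each original observation to a bin once; the bins never change.
bins <- cut(data, breaks = 10)

# With stype = "f", freq[j] is the number of copies of observation j.
bin_props_freq <- function(data, freq, bins) {
  # tapply() returns NA for levels with no observations, so replace those
  # with 0 to keep the output length fixed at nlevels(bins).
  props <- tapply(freq, bins, sum) / sum(freq)
  props[is.na(props)] <- 0
  as.vector(props)
}

b <- boot(data = data, statistic = bin_props_freq, R = 200,
          stype = "f", bins = bins)
dim(b$t)  # 200 replicates x 10 bins
```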

And the results:

x <- boot(data = data, statistic = bootstrapDistances2, R = 100, binx=binx)
x

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = data, statistic = bootstrapDistances2, R = 100, binx = binx)


Bootstrap Statistics :
     original   bias    std. error
t1*     0.208  0.00134 0.017342783
t2*     0.322  0.00062 0.021700803
t3*     0.190 -0.00034 0.018873433
t4*     0.136 -0.00116 0.016206197
t5*     0.078 -0.00120 0.011413265
t6*     0.036  0.00070 0.008510837
t7*     0.016  0.00074 0.005816417
t8*     0.006  0.00024 0.003654581
t9*     0.000  0.00000 0.000000000
t10*    0.008 -0.00094 0.003368961
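The summary above is computed from two components of the returned object: x$t0 (the statistic on the original data, the "original" column) and x$t (an R x k matrix with one row per replicate). For the ordinary bootstrap, the bias column is the column means of x$t minus x$t0 and the std. error column is the column standard deviations, so colMeans(x$t) gives the per-bin means that the question plots as a red line. A minimal sketch with a hypothetical three-element statistic (quartiles) to show the relationship:

```r
library(boot)

set.seed(3)
data <- rchisq(500, df = 5)

# Hypothetical vector statistic: three sample quantiles of the resample.
quartiles <- function(d, i) quantile(d[i], probs = c(0.25, 0.5, 0.75))

x <- boot(data = data, statistic = quartiles, R = 100)

# Recompute the printed columns by hand from x$t0 and x$t.
bias <- colMeans(x$t) - x$t0
std.error <- apply(x$t, 2, sd)
cbind(original = x$t0, bias = bias, std.error = std.error)
```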
Aniko
Kudos for trying, but I get this error when running x <- boot(data = data, statistic = bootstrapDistances2, R = 100, binx = binx): Error in t.star[r, ] <- statistic(data, i[r, ], ...) : number of items to replace is not a multiple of replacement length
Roman Luštrik
After restarting my R session, things worked fine. Ugh? Thank you for your help.
Roman Luštrik
A: 

Good answer, Aniko.

Also, the help page for "boot" states that the bootstrap statistic function may return a vector, not merely a scalar.
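When the statistic returns a vector, boot.ci() works on one element at a time, selected with its index argument. A minimal sketch, assuming percentile intervals and a hypothetical two-element statistic (mean and median):

```r
library(boot)

set.seed(11)
data <- rchisq(500, df = 5)

# Hypothetical statistic returning a length-2 vector.
mean_median <- function(d, i) c(mean = mean(d[i]), median = median(d[i]))

b <- boot(data = data, statistic = mean_median, R = 999)

# `index` picks which component of the vector statistic to summarize.
ci_mean   <- boot.ci(b, type = "perc", index = 1)
ci_median <- boot.ci(b, type = "perc", index = 2)
ci_mean$percent[4:5]    # lower and upper limits for the mean
ci_median$percent[4:5]  # lower and upper limits for the median
```

For the binned proportions in this thread, index = 1:10 would be looped over in the same way, one boot.ci() call per bin.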

pteetor
Now I see it - it says "statistic(s)"!
Roman Luštrik