Consider the following code:
require(Hmisc)
num.boots <- 10
data <- rchisq(500, df = 5) #generate fake data
#create bins
binx <- cut(data, breaks = 10)
binx <- levels(binx)
binx <- sub("^.*\\,", "", binx)
binx <- as.numeric(substr(binx, 1, nchar(binx) - 1))
#pre-allocate a matrix to be filled with samples
output <- matrix(NA, nrow = num.boots, ncol = length(binx))
#resample the vector with replacement and calculate the proportion
# of values falling into each bin
for (i in 1:num.boots) {
  walk.pair.sample <- sample(data, size = length(data), replace = TRUE)
  data.cut <- cut2(x = walk.pair.sample, cuts = binx)
  data.cut <- table(data.cut)/sum(table(data.cut))
  output[i, ] <- data.cut
}
#do some plotting
#the x axis runs over bins (columns of output), not over bootstrap samples (rows)
plot(1:ncol(output), seq(0, max(output), length.out = ncol(output)), type = "n", xlab = "", ylab = "")
for (i in 1:nrow(output)) {
  lines(1:ncol(output), output[i, ])
}
#mean values by columns
output.mean <- apply(output, 2, mean)
lines(output.mean, col="red", lwd = 3)
legend(x = 8, y = 0.25, legend = "mean", col = "red", lty = "solid", lwd = 3)
I was wondering whether I can supply boot::boot() with a statistic function that returns a vector of length n > 1. Is it at all possible?
Here are my feeble attempts, but I must be doing something wrong.
require(boot)
bootstrapDistances <- function(data, binx) {
  data.cut <- cut2(x = data, cuts = binx)
  data.cut <- table(data.cut)/sum(table(data.cut))
  return(data.cut)
}
> x <- boot(data = data, statistic = bootstrapDistances, R = 100)
Error in cut.default(x, k2) : 'breaks' are not unique
I don't really understand why Hmisc::cut2() isn't working properly in the boot() call, but works when I call it in a for() loop (see code above). Is the logic of my bootstrapDistances() function feasible with boot()? Any pointers much appreciated.
.:EDIT:.
Aniko suggested I modify my function to include an index argument. From reading the documentation for boot(), it wasn't clear to me how this works, which may explain why my function wasn't working. Here's the new function Aniko suggested:
bootstrapDistances2 <- function(data, idx, binx) {
  data.cut <- cut2(x = data[idx], cuts = binx)
  data.cut <- table(data.cut)/sum(table(data.cut))
  return(data.cut)
}
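As far as I understand it now, boot() calls the statistic as statistic(data, idx, ...), where idx is a vector of resampled indices, and any extra named arguments given to boot() are passed along. In my first attempt the second argument, binx, would therefore have received those indices, which is presumably where the 'breaks' error came from. Here is a minimal sketch of the mechanism with a vector-valued statistic. The function name quartiles and its probs argument are made up for illustration, and quantile() stands in for the binning step:

```r
library(boot)  # recommended package that ships with R

set.seed(1)                # arbitrary seed, only for reproducibility
x <- rchisq(500, df = 5)   # fake data, as above

# Hypothetical statistic: the second argument receives the indices of
# one bootstrap resample, so data[idx] is the resampled vector.
quartiles <- function(data, idx, probs) {
  unname(quantile(data[idx], probs = probs))  # numeric vector, one value per prob
}

# probs is not an argument of boot(); it is forwarded to the statistic via ...
b <- boot(data = x, statistic = quartiles, R = 100, probs = c(0.25, 0.5, 0.75))

length(b$t0)  # statistic on the original data: one value per quantile
dim(b$t)      # one row per replicate, one column per element of the statistic
```

So a statistic returning a vector of length n > 1 appears to be fine, as long as it returns the same length on every call.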
However, I managed to produce an error and I'm still working out how to remove it.
> x <- boot(data = data, statistic = bootstrapDistances2, R = 100, binx = binx)
Error in t.star[r, ] <- statistic(data, i[r, ], ...) :
number of items to replace is not a multiple of replacement length
After I restarted my R session (I also tried another version, 2.10.1), it seems to work fine.
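One possible explanation for the "replacement length" error, which I have not verified, is that the statistic returned a different number of bins for some resample than for the original data, since boot() stores every replicate in a matrix with a fixed number of columns. A sketch that sidesteps this by using base cut() with fixed breaks in place of Hmisc::cut2() is shown below. The name binProportions and the choice of ten equal-width bins are assumptions made for illustration:

```r
library(boot)  # recommended package that ships with R

set.seed(1)                # arbitrary seed, only for reproducibility
x <- rchisq(500, df = 5)   # fake data, as above

# Fixed break points computed once from the full data set (assumption:
# ten equal-width bins covering the whole observed range).
brks <- seq(floor(min(x)), ceiling(max(x)), length.out = 11)

# Hypothetical statistic: cut() with fixed breaks keeps all ten factor
# levels, so an empty bin is reported as 0 rather than being dropped.
binProportions <- function(data, idx, breaks) {
  bins <- cut(data[idx], breaks = breaks, include.lowest = TRUE)
  as.vector(table(bins)) / length(idx)
}

b <- boot(data = x, statistic = binProportions, R = 100, breaks = brks)

dim(b$t)            # replicates in rows, bins in columns
range(rowSums(b$t)) # proportions in each replicate should sum to 1
```

Because every call returns exactly length(brks) - 1 values, the result always fits the matrix that boot() allocates from the statistic on the original data.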