I often want to do essentially the following:

mat <- matrix(0,nrow=10,ncol=1)
lapply(1:10, function(i) { mat[i,] <- rnorm(1,mean=i)})

I would expect mat to contain 10 random numbers, but it still contains only 0s. (I am not worried about the rnorm part; clearly there is a right way to do that. I am worried about affecting mat from within an anonymous function passed to lapply.) Can I not modify the matrix mat from inside lapply? Why not? Is there an R scoping rule that prevents this?

A: 

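Instead of altering mat, each call of your anonymous function modifies its own local copy of mat, and lapply just returns a list of the values those calls return (here, the random numbers). You need to assign that result to mat and turn it back into a matrix, e.g. with as.matrix() (call unlist() first, or you get a matrix of mode "list").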
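A minimal sketch of what happens, reusing the question's code (set.seed() is added only to make the run reproducible): each call assigns into its own local copy of mat, and lapply() collects the values those assignments return.

```r
set.seed(1)  # only for reproducibility
mat <- matrix(0, nrow = 10, ncol = 1)
res <- lapply(1:10, function(i) { mat[i, ] <- rnorm(1, mean = i) })

# Every call modified a local copy of mat, so the outer mat is untouched
all(mat == 0)  # TRUE

# res is a list of the 10 values returned by the assignments
mat <- matrix(unlist(res), ncol = 1)
```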

Fojtasek
A:

One of the main advantages of higher-order functions like lapply() or sapply() is that you don't have to initialize your "container" (matrix in this case).

As Fojtasek suggests:

as.matrix(unlist(lapply(1:10,function(i) rnorm(1,mean=i))))  # unlist() gives a numeric matrix rather than a list-matrix

Alternatively:

do.call(rbind,lapply(1:10,function(i) rnorm(1,mean=i)))

Or, simply as a numeric vector:

sapply(1:10,function(i) rnorm(1,mean=i))
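Note that as.matrix() applied directly to the list returned by lapply() gives a matrix of mode "list", not a numeric one. A quick base-R check of the variants (vapply() is a swapped-in alternative to sapply() that also verifies each result is a single number):

```r
set.seed(1)
lst <- lapply(1:10, function(i) rnorm(1, mean = i))

m_list <- as.matrix(lst)          # 10x1 matrix whose cells are list elements
m_num  <- as.matrix(unlist(lst))  # 10x1 numeric matrix
m_rb   <- do.call(rbind, lst)     # 10x1 numeric matrix
v      <- vapply(1:10, function(i) rnorm(1, mean = i), numeric(1))  # numeric vector

typeof(m_list)  # "list"
typeof(m_num)   # "double"
typeof(m_rb)    # "double"
```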

If you really want to modify a variable outside the scope of your anonymous function (the random number generator in this instance), use <<-:

> mat <- matrix(0,nrow=10,ncol=1)
> invisible(lapply(1:10, function(i) { mat[i,] <<- rnorm(1,mean=i)}))
> mat
           [,1]
 [1,] 1.6780866
 [2,] 0.8591515
 [3,] 2.2693493
 [4,] 2.6093988
 [5,] 6.6216346
 [6,] 5.3469690
 [7,] 7.3558518
 [8,] 8.3354715
 [9,] 9.5993111
[10,] 7.7545249
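One reason to be wary of <<-: if the name is not found in any enclosing environment, it silently creates a global variable. A small sketch (bump and counter are made-up names):

```r
bump <- function() counter <<- 1  # hypothetical function; no 'counter' exists yet

exists("counter")  # FALSE
bump()
exists("counter")  # TRUE: <<- created 'counter' in the global environment
```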

See this post about <<-. But in this particular example, a for loop makes more sense:

mat <- matrix(0,nrow=10,ncol=1)
for( i in 1:10 ) mat[i,] <- rnorm(1,mean=i)

with the minor cost of creating an indexing variable, i, in the global workspace.
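If leaving i in the global workspace bothers you, one option (a sketch, not the only way) is to run the loop inside local(), which evaluates it in a temporary environment and returns the last value:

```r
mat <- matrix(0, nrow = 10, ncol = 1)
mat <- local({
  # mat[i, ] <- ... creates a copy of mat inside this temporary environment
  for (i in 1:10) mat[i, ] <- rnorm(1, mean = i)
  mat  # returned and assigned to the outer mat
})
```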

Stephen
A:

I discussed this issue in this related question: "Is R’s apply family more than syntactic sugar". If you look at the function signatures of for and apply, you will notice one critical difference: a for loop evaluates an expression, while apply calls a function.
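A minimal illustration of that difference (x and y are arbitrary names): the body of a for loop runs in the environment where the loop is written, whereas each call of the function passed to lapply() gets its own fresh environment.

```r
x <- 0
for (i in 1:3) x <- x + i  # expression evaluated in the current environment
x  # 6

y <- 0
invisible(lapply(1:3, function(i) y <- y + i))  # each call assigns a local y
y  # still 0
```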

If you want to alter things outside the scope of an apply function, you need to use <<- or assign(), or, more to the point, use a for loop instead. Either way, be careful when modifying objects from inside a function, because it can cause unexpected behavior.
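For completeness, a sketch of the assign() route. Because assign() takes a variable name rather than an expression like mat[i, ], the function has to fetch the matrix with get(), change it, and write it back; env is an assumed name for the environment that holds mat (the global environment when run at top level).

```r
mat <- matrix(0, nrow = 10, ncol = 1)
env <- environment()  # environment where mat lives

invisible(lapply(1:10, function(i) {
  m <- get("mat", envir = env)   # copy of the outer matrix
  m[i, ] <- rnorm(1, mean = i)   # modify the copy
  assign("mat", m, envir = env)  # overwrite the outer mat
}))
```

This copies the whole matrix on every iteration, which is one more reason the for loop is the simpler choice here.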

In my opinion, one of the primary reasons to use an apply function is precisely that it doesn't alter anything outside of it. This is a core concept in functional programming, where functions avoid side effects. It is also why the apply family of functions lends itself to parallel processing (similar functions exist in parallel packages such as snow).
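As a sketch of the parallel point, using the parallel package that ships with R (it builds on snow): the anonymous function depends only on its argument, so the calls can be farmed out to worker processes and the results gathered afterwards. Two local workers is an arbitrary choice.

```r
library(parallel)

cl <- makeCluster(2)                # two local worker processes (arbitrary)
clusterSetRNGStream(cl, iseed = 1)  # separate reproducible RNG streams for the workers
res <- parLapply(cl, 1:10, function(i) rnorm(1, mean = i))
stopCluster(cl)

mat <- matrix(unlist(res), ncol = 1)  # gather results on the master
```

A side effect such as mat[i, ] <<- ... would happen inside a worker process and would never reach the master's mat.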

Lastly, the right way to run your code example is to pass the parameters to your function explicitly and assign the output back:

mat <- matrix(0,nrow=10,ncol=1)
mat <- matrix(unlist(lapply(1:10, function(i, mat) { mat[i,] <- rnorm(1,mean=i)}, mat=mat)), ncol=1)  # unlist() so the result is a numeric matrix

It is always best to be explicit about a parameter when possible (hence the mat=mat) rather than relying on R to find it in an enclosing scope.
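A small sketch of why being explicit helps (scale_implicit and scale_explicit are made-up names): a function that finds mat by lexical scoping is tied to whatever mat happens to be in scope, while one that takes mat as an argument works on whichever matrix you pass.

```r
mat   <- matrix(1:10, ncol = 1)
other <- matrix(101:110, ncol = 1)

scale_implicit <- function(i) mat[i, ] * 2       # looks mat up in the enclosing scope
scale_explicit <- function(i, mat) mat[i, ] * 2  # mat must be supplied

sapply(1:3, scale_implicit)               # 2 4 6 (always the outer mat)
sapply(1:3, scale_explicit, mat = other)  # 202 204 206
```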

Shane
Ah, thank you for the clarification. I really appreciate the help.
stevejb
No problem. Please accept the answer when you have a chance, so that others can know that it answered your question.
Shane