Hello,

I am currently using apTreeshape to simulate phylogenetic trees under the Yule-Harding model. What I want to do is randomly generate between 20 and 25 different numbers for each of three groupings (small, medium and large trees), and then generate about 40 trees for every random number chosen within a grouping.

I know how I would do this in Python or Matlab, but in R things seem to behave a bit differently.

My thought was to generate a vector of random numbers (one vector for each size grouping) and then use it to build a second vector containing each random number repeated the required number of times.

Here is what I have:

sm_leaves<-c(sample(3:50,25,replace=F));
s_leafy<-numeric();

for (i in 1:length(sm_leaves)) { 
    for (j in 1:10) {
        s_leafy[j+i-1]=sm_leaves[i];
    }
}

This is giving me output like:

> s_leafy
[1]  5 38  6 22 29 20 19 46  9 18 39 50 34 11 43  7  8 32 10 42 14 37
[23] 23 13 28 28 28 28 28 28 28 28 28 28

But what I want is something more like:

> s_leafy
[1]  5  5  5  5  5  5  5  5  5  5 38 38 38 38 38 38 38 38 38 ... 28 28 28 28 28 28 28 28 28 28

My reason for doing this is merely so that I can append this vector to a data frame along with all of the randomly generated trees - I need 2000 of them, so doing this by hand ain't quite practical.

All I have really been able to deduce from my previous attempts to solve this problem is that, generally speaking, while loops should be used instead of for loops. Many people have also talked about using expand.grid, but I don't think it is particularly useful in this case.

Thanks for reading, I hope my problem isn't too trivial (although I wouldn't be surprised if it were).

+4  A: 

Apologies if I don't quite understand the question, but what about:

sm_leaves <- sample(3:50, 25, replace=FALSE)
s_leafy <- rep(sm_leaves, each=10)
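
Since the goal is a column to sit next to the generated trees in a data frame, the same rep() call can build that column for all three groupings at once. This is only a rough sketch: the 3:50 range comes from the question, while the medium and large ranges, the names size_ranges, n_sizes, trees_per_size and design, and the counts of 25 sizes and 40 trees per size are assumptions made for illustration.

```r
set.seed(1)                      # only so the sketch is reproducible

n_sizes <- 25                    # random leaf counts per grouping (assumed)
trees_per_size <- 40             # trees per leaf count (assumed)

# small is the range from the question; medium and large are made-up ranges
size_ranges <- list(small = 3:50, medium = 51:200, large = 201:500)

# one data frame per grouping, stacked with rbind: one row per tree to simulate
design <- do.call(rbind, lapply(names(size_ranges), function(g) {
    leaves <- sample(size_ranges[[g]], n_sizes, replace = FALSE)
    data.frame(group = g,
               leaves = rep(leaves, each = trees_per_size),
               stringsAsFactors = FALSE)
}))

head(design)
```

Each row of design then corresponds to one tree, so the simulated trees can be attached in the same order.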
danpelota
+1  A: 

You want rep() with the each=10 option:

> set.seed(42)   
> sm_leaves <- sample(3:50,25,replace=F) 
> s_leafy <- rep(sm_leaves, each=3)        ## here each=3 to keep the sample output small
> s_leafy   
 [1] 46 46 46 47 47 47 16 16 16 40 40 40 31 31 31 25 25 
[18] 25 33 33 33  8  8  8 29 29 29 30 30 30 20 20 20 42  
[35] 42 42 36 36 36 11 11 11 18 18 18 34 34 34 35 35 35 
[52]  6  6  6 17 17 17 19 19 19 28 28 28 44 44 44 41 41 
[69] 41 26 26 26  4  4  4   
>  
Dirk Eddelbuettel
+4  A: 

Using 'rep' is clearly the answer for how to do this quickly in R, but why doesn't your code work? A little investigation reveals why.

First, take out the randomness and give yourself a simple, reproducible example. Set sm_leaves to c(3,4,5) and see what happens. You get:

3 4 5 5 5 5 5 5 5 5 5 5

which still looks wrong. You expected ten 3s, ten 4s and ten 5s, right? Hmmm. Add a print statement to your loop to see which positions the values are being written to:

> for (i in 1:length(sm_leaves)) { 
   for (j in 1:10) {
    print(j+i-1)
    s_leafy[j+i-1]=sm_leaves[i];
   }
 }

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
...etc....

Oops. Your vector index is wrong. j+i-1 is jumping back after every inner loop and overwriting the earlier values. You want:

s_leafy[j + (i - 1)*10]=sm_leaves[i];

So maybe this is just a simple case of you missing the *10 in the expression!
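
For completeness, here is a sketch of the whole loop with the corrected index, checked against rep() on the small c(3,4,5) example. The name reps and the preallocation with numeric() are my own additions, not part of the original code.

```r
sm_leaves <- c(3, 4, 5)          # small fixed input, as in the example above
reps <- 10                       # repeats per value (name chosen for this sketch)

# preallocate the result instead of growing it inside the loop
s_leafy <- numeric(length(sm_leaves) * reps)

for (i in seq_along(sm_leaves)) {
    for (j in seq_len(reps)) {
        # block i occupies positions (i - 1) * reps + 1 up to i * reps
        s_leafy[j + (i - 1) * reps] <- sm_leaves[i]
    }
}

# the loop result should match the one-liner from the other answers
identical(s_leafy, rep(sm_leaves, each = reps))
```

With the multiplier in place, each pass of the outer loop writes to its own block of ten positions, so nothing gets overwritten.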

Note however that a lot of simple vector manipulation is best done using R's functions such as rep, and seq, and "[", as explained in the other answers here.
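
As a small illustration of that last point, the sketch below builds the same repeated vector two ways, once with rep directly and once by indexing with "[" using a repeated index built from seq_along, and also shows how each= differs from times=. The variable names idx, by_rep, by_index and cycled are made up for this example.

```r
sm_leaves <- c(3, 4, 5)

# rep with each=: every element is repeated in place
by_rep <- rep(sm_leaves, each = 10)

# the same thing via "[": build the index 1 1 ... 2 2 ... 3 3 ... and subset with it
idx <- rep(seq_along(sm_leaves), each = 10)
by_index <- sm_leaves[idx]

identical(by_rep, by_index)

# times= repeats the whole vector instead: 3 4 5 3 4 5 ...
cycled <- rep(sm_leaves, times = 10)
```

The indexing form is handy when you want to repeat the rows of a data frame rather than a plain vector.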

Spacedman