ansaurus

Question

How to partition a set of values (vector) in R

Answer 1

+7 A:

Instead of sampling the values, you could sample their positions.

positions <- sample(length(mydata), size=400)  # ucfagls' suggestion
firstset <- mydata[positions]
secondset <- mydata[-positions]

EDIT: ucfagls' suggestion (passing `length(mydata)` directly as the first argument) is more efficient, especially for larger vectors, since it avoids building the vector of candidate positions in R before sampling.

Joshua Ulrich 2010-10-12 03:07:01

Very cool idea. Thanks!

Daniel Standage 2010-10-12 03:09:16

The first line can be simplified to `positions <- sample(length(mydata), size=400)` so you don't need to generate the vector from which to sample. The first argument is allowed to be a positive integer. Or even to `positions <- sample(mydata, size=400)`.

Gavin Simpson 2010-10-12 06:38:58

Surely `positions <- sample(mydata, size=400)` will return actual values from mydata and not positions? You won't be able to get the other 600 from those. You got it right the first time!

Spacedman 2010-10-12 06:55:00
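
The difference between sampling positions and sampling values can be checked directly. This is only a sketch: the seed and the 1000-element `mydata` built with `rnorm` are assumptions made for illustration, not part of the original question.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
mydata <- rnorm(1000)             # hypothetical stand-in for the asker's vector

# Sampling positions: integers in 1..length(mydata), usable as an index
positions <- sample(length(mydata), size = 400)
firstset  <- mydata[positions]
secondset <- mydata[-positions]   # negative indexing drops the sampled positions

length(firstset)                  # 400
length(secondset)                 # 600

# Sampling values: these are elements of mydata, not indices,
# so they cannot be used to select the complement
values <- sample(mydata, size = 400)
is.integer(positions)             # TRUE
is.integer(values)                # FALSE
```

Because `positions` holds indices, `mydata[-positions]` recovers exactly the elements that were not drawn, which is what the values-based call cannot do.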

Answer 2

+4 A:

If mydata is truly a vector, one option would be:

split(mydata, sample(c(rep("group1", 600), rep("group2", 400))))

Greg 2010-10-12 03:07:18

I did not know the first argument of 'sample' could be a vector. Thanks!

Daniel Standage 2010-10-12 03:14:06

Additionally, this will store both subsets of the original data in one object (list), keeping the global workspace from getting cluttered.

Greg 2010-10-12 18:04:55
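
A short sketch of what the `split` approach returns, assuming a hypothetical 1000-element `mydata` and an arbitrary seed. It builds the labels with `rep(..., times = ...)`, which is equivalent to the two `rep` calls in the answer, and the `labels` and `parts` names are introduced here only for illustration.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
mydata <- rnorm(1000)             # hypothetical stand-in for the asker's vector

# Shuffle 600 "group1" and 400 "group2" labels, then split by label
labels <- sample(rep(c("group1", "group2"), times = c(600, 400)))
parts  <- split(mydata, labels)

names(parts)                      # "group1" "group2"
lengths(parts)                    # group1: 600, group2: 400

# Both subsets live in one list and can be pulled out by name
firstset  <- parts$group2
secondset <- parts$group1
```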

Answer 3

+3 A:

Just shuffle mydata, then take the first 400 elements and the last 600.

mydata <- sample(mydata)
firstset <- mydata[1:400]
secondset <- mydata[401:1000]

John 2010-10-12 03:58:20
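
A possible variant of the shuffle approach that avoids hard-coding the total length and leaves `mydata` unmodified. The seed, the simulated data, and the names `n_first` and `shuffled` are assumptions for illustration only.

```r
set.seed(7)                            # arbitrary seed, only for reproducibility
mydata <- rnorm(1000)                  # hypothetical stand-in for the asker's vector

n_first  <- 400                        # size of the first subset, as in the answer
shuffled <- sample(mydata)             # random permutation of all values

firstset  <- head(shuffled, n_first)   # first 400 shuffled values
secondset <- tail(shuffled, -n_first)  # everything after the first 400
```

With a negative `n`, `tail` returns all but the first `n_first` elements, so the second subset adapts to whatever length `mydata` has.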
