ansaurus

Question

Answer 1

A:

The key is that you can subset a data frame using vector indices. Here is a quick example to get you thinking about exactly what you want to do:

> A <- data.frame(matrix(1:16, nrow=4, ncol=4))
> A
  X1 X2 X3 X4
1  1  5  9 13
2  2  6 10 14
3  3  7 11 15
4  4  8 12 16
> left.half <- c(1, 2)
> right.half <- c(3, 4)
> A.lh <- A[ , left.half]
> A.rh <- A[ , right.half]
> A.lh
  X1 X2
1  1  5
2  2  6
3  3  7
4  4  8
> A.rh
  X3 X4
1  9 13
2 10 14
3 11 15
4 12 16
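The same bracket indexing also accepts column names, negative indices (to drop columns), and row indices in the first position. Here is a small sketch that reuses the 4 x 4 `A` from above. The new object names are only illustrative.

```r
# Rebuild the 4 x 4 example data frame from above
A <- data.frame(matrix(1:16, nrow = 4, ncol = 4))

# Select columns by name instead of by position
A.lh.named <- A[, c("X1", "X2")]

# Negative indices drop columns: keep everything except columns 1 and 2
A.rh.neg <- A[, -c(1, 2)]

# Row indices go before the comma: top two rows and bottom two rows
A.top <- A[1:2, ]
A.bottom <- A[3:4, ]

print(A.rh.neg)
```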

A quick search at rseek.org found this, which is a great tutorial:

http://www.r-bloggers.com/select-operations-on-r-data-frames/

Also, Grant Farnsworth has a great guide to econometrics in R that I find helpful. You can also find that with rseek.org's search engine.

richardh 2010-07-21 18:21:13

Answer 2

+2 A:

If you want to split a dataframe according to values of some variable, I'd suggest using daply() from the plyr package.

library(plyr)
x <- daply(df, .(splitting_variable), function(x) return(x))

Now, x is an array of dataframes. To access one of the dataframes, you can index it with the name of the level of the splitting variable.

x$Level1
#or
x[["Level1"]]

Before splitting your data into many data frames, though, I'd make sure there isn't a cleverer way to work with it.
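For comparison, base R's `split()` also splits a data frame by the values of a variable without loading plyr. It returns a named list of data frames, so the same `$` and `[[` access shown above works. The sketch below uses a small hypothetical data frame in which `group` stands in for `splitting_variable`.

```r
# Hypothetical data: 'group' plays the role of splitting_variable
df <- data.frame(group = c("Level1", "Level2", "Level1", "Level2"),
                 value = 1:4)

# split() returns a named list with one data frame per level of 'group'
pieces <- split(df, df$group)

pieces$Level1        # rows where group == "Level1"
pieces[["Level2"]]   # rows where group == "Level2"
```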

JoFrhwld 2010-07-21 18:28:11

Please state up front which package a non-base function comes from. Presumably you mean daply from package plyr?

mdsumner 2010-07-21 20:12:05

I loaded plyr in my code snippet, so I thought it was clear, but I'll edit the answer prose for clarity.

JoFrhwld 2010-07-21 20:18:52

Don't you mean `dlply`?

hadley 2010-07-21 20:33:46

I suggested `dlply` first, but it didn't automatically name the entries by the grouping variable. I don't know what I did at first, but apparently `daply` doesn't work unless a function is specified. I edited the answer so it works.

JoFrhwld 2010-07-21 21:03:26

Answer 3

+1 A:

subset() is also useful:

subset(DATAFRAME, COLUMNNAME == "")
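As a concrete version of the template above, here is a sketch on a small hypothetical data frame. The condition is evaluated inside the data frame, so you can use column names directly. The optional `select` argument also lets you keep only some of the columns.

```r
# Hypothetical data frame for illustration
x <- data.frame(num = 1:6, grp = c("a", "b", "a", "b", "a", "b"))

# Rows where grp equals "a"
only.a <- subset(x, grp == "a")

# Same rows, keeping only the 'num' column
only.a.num <- subset(x, grp == "a", select = num)
```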

If you are working with survey data, the "survey" package may be pertinent:

http://faculty.washington.edu/tlumley/survey/

apeescape 2010-07-21 18:37:32

Answer 4

+3 A:

You may also want to cut the data frame into an arbitrary number of smaller data frames. Here, we randomly assign each row to one of two data frames.

x = data.frame(num = 1:26, let = letters, LET = LETTERS)
set.seed(10)
split(x, sample(rep(1:2, 13)))

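gives the output below. This output comes from an R version older than 3.6.0, which changed the default algorithm behind sample(), so current R versions will likely produce a different random split with the same seed: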

$`1`
   num let LET
3    3   c   C
6    6   f   F
10  10   j   J
12  12   l   L
14  14   n   N
15  15   o   O
17  17   q   Q
18  18   r   R
20  20   t   T
21  21   u   U
22  22   v   V
23  23   w   W
26  26   z   Z

$`2`
   num let LET
1    1   a   A
2    2   b   B
4    4   d   D
5    5   e   E
7    7   g   G
8    8   h   H
9    9   i   I
11  11   k   K
13  13   m   M
16  16   p   P
19  19   s   S
24  24   x   X
25  25   y   Y
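The same idea generalizes to any number of pieces. The sketch below assumes you want `k` random pieces whose sizes differ by at most one row. `rep_len()` recycles the labels 1..k up to the number of rows, and `sample()` shuffles which row gets which label. The choice `k <- 3` is only an example.

```r
x <- data.frame(num = 1:26, let = letters, LET = LETTERS)

k <- 3  # number of pieces (illustrative choice)
set.seed(10)

# Recycle labels 1..k over all rows, then shuffle the assignment
groups <- sample(rep_len(seq_len(k), nrow(x)))
pieces <- split(x, groups)

sapply(pieces, nrow)  # 9 9 8: 26 rows recycled over 3 labels
```

The piece sizes depend only on `nrow(x)` and `k`. The seed controls which rows land in which piece.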

Greg 2010-07-21 18:47:24

Greg, your solution works! Thanks.

Leo5188 2010-07-21 19:11:27

No problem. I'm glad it did.

Greg 2010-07-21 19:47:25

Answer 5

A:

The answer you want depends very much on how and why you want to break up the data frame.

For example, if you want to leave out some variables, you can create new data frames from specific columns of the original data frame. The subscripts in brackets after the data frame refer to row and column numbers. Check out Spoetry for a complete description.

newdf <- mydf[,1:3]

Or, you can choose specific rows.

newdf <- mydf[1:3,]

And these subscripts can also be logical tests, such as choosing rows that contain a particular value, or factors with a desired value.
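Here is a sketch of such logical tests on a small hypothetical `mydf`. The column names `id`, `score`, and `site` are invented for illustration. The logical vector goes in the row position, and column selection can be combined with it after the comma.

```r
# Hypothetical data frame for illustration
mydf <- data.frame(id = 1:5,
                   score = c(10, 25, 7, 40, 18),
                   site = factor(c("A", "B", "A", "C", "B")))

# Logical test on a numeric column: rows with score above 15
high <- mydf[mydf$score > 15, ]

# Logical test on a factor: rows whose site is A or C
ac <- mydf[mydf$site %in% c("A", "C"), ]

# Row condition and column selection together
high.ids <- mydf[mydf$score > 15, c("id", "score")]
```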

What do you want to do with the chunks left over? Do you need to perform the same operation on each chunk of the data frame? Then you'll want to make sure the subsets of the data frame end up in a convenient object, such as a list, that lets you run the same command on each chunk.
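A minimal sketch of that list-based workflow, assuming a hypothetical `mydf` with a grouping column `site`. `split()` puts each chunk into one element of a named list, and `sapply()` applies the same function to every chunk. Here the function is the mean of a numeric column.

```r
# Hypothetical data frame for illustration
mydf <- data.frame(site = c("A", "B", "A", "B", "C"),
                   score = c(10, 25, 7, 40, 18))

# One list element (a data frame) per site
chunks <- split(mydf, mydf$site)

# Apply the same operation to every chunk
mean.scores <- sapply(chunks, function(d) mean(d$score))
mean.scores  # A = 8.5, B = 32.5, C = 18
```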

Ben M 2010-07-22 18:21:30

Answer 6

A:

I just posted a kind of RFC that might help you: http://stackoverflow.com/questions/3318333/split-a-vector-into-chunks-in-r

x = data.frame(num = 1:26, let = letters, LET = LETTERS)
## number of chunks
n <- 2
dfchunk <- split(x, factor(sort(rank(row.names(x))%%n)))
dfchunk
$`0`
   num let LET
1    1   a   A
2    2   b   B
3    3   c   C
4    4   d   D
5    5   e   E
6    6   f   F
7    7   g   G
8    8   h   H
9    9   i   I
10  10   j   J
11  11   k   K
12  12   l   L
13  13   m   M

$`1`
   num let LET
14  14   n   N
15  15   o   O
16  16   p   P
17  17   q   Q
18  18   r   R
19  19   s   S
20  20   t   T
21  21   u   U
22  22   v   V
23  23   w   W
24  24   x   X
25  25   y   Y
26  26   z   Z
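An alternative way to get `n` contiguous chunks, sketched here with base R's `cut()` rather than the rank-based expression above, is to divide the row positions into `n` equal-width intervals. With `labels = FALSE`, `cut()` returns the interval number (1..n) for each row. The chunks are therefore named "1", "2", ... instead of "0", "1", ..., and their sizes are roughly equal.

```r
x <- data.frame(num = 1:26, let = letters, LET = LETTERS)
n <- 2  # number of chunks

# Interval number (1..n) for each row position
chunk.id <- cut(seq_len(nrow(x)), breaks = n, labels = FALSE)
dfchunk <- split(x, chunk.id)
```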

Cheers, Sebastian

Sebastian 2010-07-23 13:09:18


R language: how to split a data frame
