ansaurus

Question

How to rearrange data in a data frame using R (combine similar repeating columns)

Answer 1

+1 A:

Try:

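# Start and end column of each block of 6
x1 <- seq(from = 1, to = ncol(df) - 1, by = 6)
x2 <- seq(from = 6, to = ncol(df), by = 6)

# Placeholder row of zeros that fixes the 6 column names
dfnew <- data.frame("V1" = 0, "V2" = 0, "V3" = 0, "V4" = 0, "V5" = 0, "V6" = 0)

for (x in 1:(ncol(df) / 6)) {
  tmpdf <- df[x1[x]:x2[x]]            # x-th block of 6 columns
  colnames(tmpdf) <- colnames(dfnew)  # names must match for rbind
  dfnew <- rbind(dfnew, tmpdf)
}
dfnew <- dfnew[-1, ]                  # drop the placeholder row of zeros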
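
The same stacking can also be sketched without growing `dfnew` inside a loop. This is only an illustration, assuming the number of columns is an exact multiple of 6; `stack_blocks` and `toy` are made-up names that do not appear in the question.

```r
# Hypothetical helper: stack consecutive blocks of k columns on top of each other.
stack_blocks <- function(df, k = 6) {
  stopifnot(ncol(df) %% k == 0)                 # assumes complete blocks only
  starts <- seq(from = 1, to = ncol(df), by = k)
  blocks <- lapply(starts, function(s) {
    block <- df[s:(s + k - 1)]                  # one block of k columns
    names(block) <- paste0("V", seq_len(k))     # common names so rbind can match
    block
  })
  out <- do.call(rbind, blocks)
  rownames(out) <- NULL                         # reset row names to 1..N
  out
}

# Toy data: 2 rows, 12 columns, i.e. two blocks of 6
toy <- as.data.frame(matrix(1:24, nrow = 2))
stacked <- stack_blocks(toy)
dim(stacked)   # 4 rows, 6 columns
```

Because no placeholder row is created, nothing has to be dropped afterwards.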

Brandon Bertelsen 2010-10-05 16:16:56

Yes, this works great. I had already been trying to solve this all day. Great help, thanks.

Sebastian 2010-10-05 16:32:04

Answer 2

A:

Here's a simple loop to do the work for you:

First, dummy data

> set.seed(123)
> DF <- data.frame(matrix(rnorm(5*6*6), ncol = 36))
> names(DF) <- paste(rep(1:6, each = 6), "V", rep(1:6, times = 6), sep = "")
> names(DF)
 [1] "1V1" "1V2" "1V3" "1V4" "1V5" "1V6" "2V1" "2V2" "2V3" "2V4" "2V5" "2V6"
[13] "3V1" "3V2" "3V3" "3V4" "3V5" "3V6" "4V1" "4V2" "4V3" "4V4" "4V5" "4V6"
[25] "5V1" "5V2" "5V3" "5V4" "5V5" "5V6" "6V1" "6V2" "6V3" "6V4" "6V5" "6V6"

Now set up the loop so that at each step i we take columns i, i+6, i+(2*6), ... of the data frame, stack them into a single vector, and store that vector as column i of the new data frame DF2:

> n <- 6 ## number of groups of 6
> DF2 <- data.frame(matrix(NA, ncol = 6, nrow = 6 * nrow(DF)))
> for(i in seq_len(n)) {
+     DF2[[i]] <- unlist(DF[, seq(i, n*6, by = 6)])
+ }
> names(DF2) <- paste("V", seq_len(n), sep = "")
> head(DF2)
           V1         V2         V3         V4         V5         V6
1 -0.56047565  1.7150650  1.2240818  1.7869131 -1.0678237 -1.6866933
2 -0.23017749  0.4609162  0.3598138  0.4978505 -0.2179749  0.8377870
3  1.55870831 -1.2650612  0.4007715 -1.9666172 -1.0260044  0.1533731
4  0.07050839 -0.6868529  0.1106827  0.7013559 -0.7288912 -1.1381369
5  0.12928774 -0.4456620 -0.5558411 -0.4727914 -0.6250393  1.2538149
6  0.42646422  0.6886403 -0.6947070 -1.1231086  0.2533185  1.5164706

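This presumes that there are only ever 6 variables. Note that n is used here both for the number of sets and for the number of variables (both are 6 in the example), so the two would need separate values if your data has a different number of sets.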
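
A minimal sketch of how the loop above could keep the two counts apart, assuming the number of variables per set is fixed and the number of sets is derived from the data. The names `small`, `k` and `res` are illustrative only.

```r
# Small stand-in for DF: 2 rows, 3 sets of 6 variables (18 columns)
small <- as.data.frame(matrix(seq_len(2 * 18), nrow = 2))

k <- 6                      # variables per set (assumed fixed)
n <- ncol(small) / k        # number of sets, derived from the data
stopifnot(n == round(n))    # guard against incomplete sets

res <- data.frame(matrix(NA, ncol = k, nrow = n * nrow(small)))
for (i in seq_len(k)) {
  # columns i, i + k, i + 2k, ... all hold variable i
  res[[i]] <- unlist(small[, seq(i, n * k, by = k)], use.names = FALSE)
}
names(res) <- paste0("V", seq_len(k))
res$V1   # columns 1, 7 and 13 of `small`, stacked in that order
```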

Gavin Simpson 2010-10-05 16:35:01

Not sure if I am right, but I assume it must be ncol(DF) instead of nrow(DF) in `DF2 <- data.frame(matrix(NA, ncol = 6, nrow = 6 * ncol(DF)))`. With the provided sample matrix it works, but with my data it does not, since there is also non-numeric content.

Sebastian 2010-10-06 13:11:03

@SebM: No, it needs to be `6 * nrow(DF)`. In your example, if I understood it properly, then if you have 5 rows in the original structure, and `n == 6` is the number of 'sets' of data (or number of variables) then you have 6 * 5 rows in the data structure you want. I suspect it is failing because of the non-numeric content, but as you didn't mention this in your post nor give us example data it is a bit difficult second guessing your needs.

Gavin Simpson 2010-10-06 13:25:48

@SebM: If the non-numeric stuff is not needed (i.e. not part of the V1, V2 etc) then why not exclude it first? `oldDF <- DF` followed by `DF <- DF[, -cols]` where `cols` contains the indices of the non-numeric columns. Then run through the loop.
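
One way the `cols` index could be built is sketched below, under the assumption that "non-numeric" means every column for which `is.numeric()` is FALSE. The toy `DF` here is invented for the illustration.

```r
# Toy frame with one non-numeric column mixed in
DF <- data.frame(id = c("a", "b"), x = 1:2, y = c(0.5, 1.5))

oldDF <- DF                                            # keep the full data for later analysis
cols  <- which(!vapply(DF, is.numeric, logical(1)))    # indices of non-numeric columns
if (length(cols) > 0) {
  # guard: with an empty `cols`, DF[, -cols] would select no columns at all
  DF <- DF[, -cols, drop = FALSE]
}
names(DF)
```

The non-numeric columns remain available in `oldDF` if they are needed after the stacking loop has run.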

Gavin Simpson 2010-10-06 13:26:35

Yes, you are right, now I get it. Actually, I have many more lines ;) but it's easy to read that out of the data frame. I think it's a good solution too. Unfortunately, I need the non-numeric data for further analysis as well. @UCFAGLS Could you try to explain to me how lists work? Obviously the data frame is split up into several list elements DF[[i]]. Why are these elements list elements and not vector-type data? It seems to me that you cannot work with lists in the same way as with indexed vectors (see the unlist command: what is the order of the unlisted elements in DF?)

Sebastian 2010-10-07 08:49:22

@SebM: Do you want me to explain how my solution works, or the more general query? Note that `DF[[i]]` is a vector but `DF[i]` is a list of one component, if `DF` is a data frame. The order of elements in the `unlist`-ed result is as if we took the selected columns in turn and concatenated them, from first to last. Try this to see what I mean: `unlist(data.frame(matrix(1:9, ncol = 3)))`. So my solution creates the stacked columns from DF that represent a single variable and inserts them into `DF2` as a vector (as `[[` extracts/replaces the vector in `DF2`, not a list containing a single vector).
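
The suggested `unlist()` call can be inspected like this. It is a small sketch, and the variable names `m` and `v` are only for illustration.

```r
m <- data.frame(matrix(1:9, ncol = 3))   # columns X1, X2, X3
v <- unlist(m)
unname(v)    # 1 2 3 4 5 6 7 8 9: all of X1 first, then X2, then X3
names(v)     # "X11" "X12" "X13" "X21" ...: column name followed by row position
```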

Gavin Simpson 2010-10-07 11:56:05

OK, yes, thanks. I think I now have an idea of how R treats DF[[]] and DF[] for data frames. That makes things clearer. So the brackets [] of a data frame will always result in a list.

Sebastian 2010-10-07 15:36:43

@Sebastian: well, `DF[1]` returns a data frame with a single component (column). `class(DF[1])` shows this. As data frames are special types of list, `DF[1]` could be thought of as a list also (try `typeof(DF[1])`). This is being a bit picky though. So a simple answer to your Q would be yes. For a list or a data frame, `[` will return a list or a data frame respectively.

Gavin Simpson 2010-10-08 08:30:35

Maybe I am going in circles. So far I thought a data frame contains a list of variables, which allows different data types (nominal, categorical or numeric) to be accessed under one name, e.g. DF (what is called a data frame, or data structure). So for me a list is a subcomponent of a data frame, and I am not quite sure how a data frame can be a list. I can only imagine a list that contains several data frames, or, let's say, one element of a list being a data frame (or a vector or a matrix...). So in my example DF is a data frame, and with [] I am accessing the variables of DF in list form.

Sebastian 2010-10-11 16:27:07

A data frame is a special type of list, one in which the elements all have the same "size" (length if vectors, row size if matrices). The individual elements of a data frame can, just like a list, be of different types. A list is a generalization of the data frame. The components can contain anything (data frame components can only contain a few types of R objects), and can be of any length. So, a data frame **is** a list but a list can **not** necessarily become a data.frame.
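
A short sketch of this relationship, using a toy `DF` and a list `L` with components of unequal length (both invented for the illustration):

```r
DF <- data.frame(A = 1:3, B = c("x", "y", "z"))
is.list(DF)          # TRUE: a data frame is a list underneath
typeof(DF)           # "list"
class(DF)            # "data.frame"
lengths(DF)          # every component has the same length (3)

# A list whose components have lengths 3 and 2 cannot be converted as-is
L <- list(A = 1:3, B = 1:2)
ok <- tryCatch({ as.data.frame(L); TRUE }, error = function(e) FALSE)
ok                   # FALSE: as.data.frame() signals an error here
```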

Gavin Simpson 2010-10-11 16:36:05

On your last point, not really. Your DF is a data frame. When you use `[` it returns a data frame, e.g. `DF[1]` returns a data frame with a single component (column), whereas `DF[[1]]` will return directly the contents of the first component (column) of DF. To be clear, assume `DF` is generated by `DF <- data.frame(A = 1:10, B = 1:10)`; `DF[1]` will return a data.frame with the data for `A` in it, whilst `DF[[1]]` will return a numeric vector.

Gavin Simpson 2010-10-11 16:42:28

For a list `L` defined by `L <- list(A = "foo", B = 1:5)`, `L[1]` will return a list with a single component that **contains** a character vector of length 1. `L[2]` returns a list that **contains** a numeric vector of length 5. `L[[2]]` will return a numeric vector of length 5.
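
The difference between `[` and `[[` can be checked directly with `class()`. This is a small sketch reusing the `DF` and `L` definitions from the comments above; note that `1:10` and `1:5` are stored as integer vectors, which R still treats as numeric.

```r
DF <- data.frame(A = 1:10, B = 1:10)
class(DF[1])      # "data.frame": still a data frame, with one column
class(DF[[1]])    # "integer": the column's contents as a plain vector

L <- list(A = "foo", B = 1:5)
class(L[2])       # "list": a list holding one component
class(L[[2]])     # "integer": the component itself
length(L[[2]])    # 5
```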

Gavin Simpson 2010-10-11 16:44:32

OK. Thanks for the explanation. I got it. (finally) Thanks. Hope I can operate with it well. :) Thanks again.

Sebastian 2010-10-11 16:53:27
