I have a flat file in which a data structure of 6 columns is stored repeatedly, side by side. That means the file contains n times 6 columns.
I want to rearrange the data into a data.frame with only 6 columns, appending every further block of 6 columns from the file below the first 6 columns.
Row 1V1 1V2 1V3 1V4 1V5 1V6 2V1 2V2 2V3 2V4 2V5 2V6 3V1...
1
2

The result should look like this, with the data from 2V1-2V6 moved below the data from 1V1-1V6:
Row V1 V2 V3 V4 V5 V6
1-1
1-2
2-1
2-2

I looked up some code snippets and managed to load the data into a data frame containing all the columns. I then tried to split it into n data frames, each holding one repetition of the 6-column structure, and to combine those into a final data frame, but that does not work.

df<-read.table("test.txt",header = FALSE, sep = ";", skip = 2)
columnmax=as.integer(ncol(df)/6)
dfnew <- vector(mode="list",length=columnmax)
for ( i in 1:columnmax) {
 start<-((i-1)*6+1)
 end<-(i*6)
 dfnew[[i]]<-df[,start:end]
}
y <- do.call(rbind, dfnew)

RESULT: Error in match.names(clabs, names(xi)) : names do not match previous names

I used a list because I could not get the data frame split up any other way. It now seems to me that this is what makes rbind fail, because the column names are not identical. I have no idea how to change the column names, because in R terms the object is not a matrix but a list. I am sure there must be a much simpler way to do what I want, but I am just beginning with R and am not familiar with its many different data types.

EDIT: DATA
structure(list(V1 = NA, V2 = NA, V3 = NA, V4 = NA, V5 = NA, V6 = NA, V7 = NA, V8 = NA, V9 = NA, V10 = NA, V11 = NA, V12 = NA, V13 = structure(1L, .Label = "1,20101E+27", class = "factor"), V14 = structure(1L, .Label = "05.07.2010 14:50", class = "factor"), V15 = structure(1L, .Label = "ADMINISTRATOR", class = "factor"), V16 = 1L, V17 = NA, V18 = NA, V19 = structure(1L, .Label = "1,20101E+27", class = "factor"), V20 = structure(1L, .Label = "05.07.2010 14:50", class = "factor"), V21 = structure(1L, .Label = "ADMINISTRATOR", class = "factor"), V22 = 1L, V23 = NA, V24 = NA, V25 = structure(1L, .Label = "1,20101E+27", class = "factor"), V26 = structure(1L, .Label = "05.07.2010 14:50", class = "factor"), V27 = structure(1L, .Label = "ADMINISTRATOR", class = "factor"), V28 = 1L, V29 = NA, V30 = NA, V31 = structure(1L, .Label = "1,20101E+27", class = "factor"), V32 = structure(1L, .Label = "05.07.2010 14:50", class = "factor"), V33 = structure(1L, .Label = "ADMINISTRATOR", class = "factor"), V34 = 1L, V35 = NA, V36 = NA, V37 = NA, V38 = NA, V39 = NA, V40 = NA, V41 = NA, V42 = NA, V43 = NA, V44 = NA, V45 = NA, V46 = NA, V47 = NA, V48 = NA, V49 = NA, V50 = NA, V51 = NA, V52 = NA, V53 = NA, V54 = NA, V55 = NA, V56 = NA), .Names = c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12", "V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20", "V21", "V22", "V23", "V24", "V25", "V26", "V27", "V28", "V29", "V30", "V31", "V32", "V33", "V34", "V35", "V36", "V37", "V38", "V39", "V40", "V41", "V42", "V43", "V44", "V45", "V46", "V47", "V48", "V49", "V50", "V51", "V52", "V53", "V54", "V55", "V56" ), row.names = 1L, class = "data.frame")

+1  A: 

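Try:

# First and last column index of each block of 6
x1 <- seq(from=1, to=ncol(df)-1, by=6)
x2 <- seq(from=6, to=ncol(df), by=6)

# Placeholder row that fixes the target column names; dropped after the loop
dfnew <- data.frame("V1"=0,"V2"=0,"V3"=0,"V4"=0,"V5"=0,"V6"=0)

for(x in seq_len(ncol(df) %/% 6)) {
  tmpdf <- df[x1[x]:x2[x]]
  colnames(tmpdf) <- colnames(dfnew)  # rbind() requires matching column names
  dfnew <- rbind(dfnew,tmpdf)
}
dfnew <- dfnew[-1, ]  # remove the placeholder row of zeros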
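
The same stacking can also be written without a placeholder row, by renaming each block before a single rbind call. This is only a minimal sketch: `wide`, `blocks` and `long` are made-up names, and the example data (2 rows, 12 columns, two of them converted to character) is hypothetical, not the asker's file.

```r
# Hypothetical example data: 2 rows, 2 blocks of 6 columns
wide <- data.frame(matrix(1:24, nrow = 2))
wide[c(3, 9)] <- lapply(wide[c(3, 9)], as.character)  # one non-numeric column per block

nblocks <- ncol(wide) %/% 6     # number of complete blocks of 6 columns
target  <- paste0("V", 1:6)     # common column names for every block

# Cut out each block and give it the same names so that rbind() accepts it
blocks <- lapply(seq_len(nblocks), function(i) {
  block <- wide[, ((i - 1) * 6 + 1):(i * 6)]
  names(block) <- target
  block
})

long <- do.call(rbind, blocks)  # blocks are stacked in order: block 1, then block 2
dim(long)
```

Because every block carries the same names before `rbind` is called, the "names do not match previous names" error from the question should not occur.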
Brandon Bertelsen
Yes, this works great. I had already been trying for the entire day. Great help, thanks.
Sebastian
A: 

Here's a simple loop to do the work for you:

First, dummy data

> set.seed(123)
> DF <- data.frame(matrix(rnorm(5*6*6), ncol = 36))
> names(DF) <- paste(rep(1:6, each = 6), "V", rep(1:6, times = 6), sep = "")
> names(DF)
 [1] "1V1" "1V2" "1V3" "1V4" "1V5" "1V6" "2V1" "2V2" "2V3" "2V4" "2V5" "2V6"
[13] "3V1" "3V2" "3V3" "3V4" "3V5" "3V6" "4V1" "4V2" "4V3" "4V4" "4V5" "4V6"
[25] "5V1" "5V2" "5V3" "5V4" "5V5" "5V6" "6V1" "6V2" "6V3" "6V4" "6V5" "6V6"

Now set up the loop so that at each stage we take columns i, i+6, i+(2*6), ... of the data frame, stack them into a single vector, and store that vector as column i of the new data frame DF2

> n <- 6 ## number of groups of 6
> DF2 <- data.frame(matrix(NA, ncol = 6, nrow = 6 * nrow(DF)))
> for(i in seq_len(n)) {
+     DF2[[i]] <- unlist(DF[, seq(i, n*6, by = 6)])
+ }
> names(DF2) <- paste("V", seq_len(n), sep = "")
> head(DF2)
           V1         V2         V3         V4         V5         V6
1 -0.56047565  1.7150650  1.2240818  1.7869131 -1.0678237 -1.6866933
2 -0.23017749  0.4609162  0.3598138  0.4978505 -0.2179749  0.8377870
3  1.55870831 -1.2650612  0.4007715 -1.9666172 -1.0260044  0.1533731
4  0.07050839 -0.6868529  0.1106827  0.7013559 -0.7288912 -1.1381369
5  0.12928774 -0.4456620 -0.5558411 -0.4727914 -0.6250393  1.2538149
6  0.42646422  0.6886403 -0.6947070 -1.1231086  0.2533185  1.5164706

This presumes that there are only ever 6 variables, but n controls the number of sets of 6 you have.
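
In the loop above, `n` happens to equal both the number of sets and the number of variables. A minimal sketch that keeps the two apart, assuming the hypothetical names `nvar` and `nsets` and a made-up data set with 3 sets of 6 variables:

```r
set.seed(123)
nvar  <- 6   # variables per set (V1..V6)
nsets <- 3   # hypothetical number of sets; need not equal nvar
DF <- data.frame(matrix(rnorm(5 * nvar * nsets), ncol = nvar * nsets))

# Column i of the result stacks columns i, i + nvar, i + 2 * nvar, ... of DF
stacked <- lapply(seq_len(nvar), function(i) {
  unlist(DF[, seq(i, nvar * nsets, by = nvar)], use.names = FALSE)
})
names(stacked) <- paste0("V", seq_len(nvar))

DF2 <- as.data.frame(stacked)
dim(DF2)   # nsets * nrow(DF) rows, nvar columns
```

Note that `unlist` coerces everything it stacks to one common type, so this column-wise variant is best suited to data where each variable has the same type in every set.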

Gavin Simpson
Not sure if I am right, but I assume it must be ncol(DF) instead of nrow(DF) in `DF2 <- data.frame(matrix(NA, ncol = 6, nrow = 6 * ncol(DF)))`. With the provided sample matrix it works, but with my data it does not, since there is also non-numeric content.
Sebastian
@SebM: No, it needs to be `6 * nrow(DF)`. In your example, if I understood it properly, then if you have 5 rows in the original structure, and `n == 6` is the number of 'sets' of data (or number of variables) then you have 6 * 5 rows in the data structure you want. I suspect it is failing because of the non-numeric content, but as you didn't mention this in your post nor give us example data it is a bit difficult second guessing your needs.
Gavin Simpson
@SebM: If the non-numeric stuff is not needed (i.e. not part of the V1, V2 etc) then why not exclude it first? `oldDF <- DF` followed by `DF <- DF[, -cols]` where `cols` contains the indices of the non-numeric columns. Then run through the loop.
Gavin Simpson
Yes, you are right, now I got it. Actually, I have many more lines ;) but it is easy to read that from the data frame. I think it is a good solution too. Unfortunately, I need the non-numeric data for further analysis as well. @UCFAGLS Could you try to explain to me how lists work? Obviously the data frame is split up into several list elements DF[[i]]. Why are these elements list elements and not vector-type data? It seems to me that you cannot work with lists in the same way as with indexed vectors (see the unlist command: what is the order of the unlisted elements in DF?)
Sebastian
@SebM: Do you want me to explain how my solution works, or the more general query? Note that `DF[[i]]` is a vector but `DF[i]` is a list of one component, if `DF` is a data frame. The order of elements in the `unlist`-ed result is as if we took the selected columns in turn and concatenated them, from first to last. Try this to see what I mean: `unlist(data.frame(matrix(1:9, ncol = 3)))`. So my solution creates the stacked columns from `DF` that represent a single variable, and inserts them into `DF2` as a vector (as `[[` extracts/replaces the vector in `DF2`, not a list containing a single vector).
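
For reference, the one-liner suggested in this comment can be tried as follows. This is a small sketch in base R; the variable names `m` and `u` are made up for illustration.

```r
m <- data.frame(matrix(1:9, ncol = 3))   # columns X1, X2, X3
u <- unlist(m)

# Values come out column by column: all of X1, then X2, then X3
unname(u)   # 1 2 3 4 5 6 7 8 9
names(u)    # "X11" "X12" "X13" "X21" ... "X33"
```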
Gavin Simpson
OK, yes, thanks. I think I now have an idea of how R uses DF[[]] and DF[] on data frames. That makes things clearer. So the single brackets [] on a data frame will always result in a list.
Sebastian
@Sebastian; well, `DF[1]` returns a data frame with a single component (column). `class(DF[1])` shows this. As data frames are special types of list then `DF[1]` could be thought of as a list also (try `typeof(DF[1])`). This is being bit picky though. So a simple answer to your Q would be yes. For a list or a data frame, `[` will return a list or a data frame respectively.
Gavin Simpson
Maybe I am going in circles. So far I thought a data frame contains a list of variables, which allows different data types such as nominal, categorical or numeric to be accessed under one name, e.g. DF (what is called a data frame, or data structure). So for me a list is a subcomponent of a data frame, and I am not quite sure how a data frame can be a list. I can only imagine a list which contains several data frames, or let's say one element of a list may be a data frame (or a vector or a matrix...). So in my example DF is a data frame, and with [] I am accessing the variables of DF in list form.
Sebastian
A data frame is a special type of list, one in which the elements all have the same "size" (length if vectors, row size if matrices). The individual elements of a data frame can, just like a list, be of different types. A list is a generalization of the data frame. The components can contain anything (data frame components can only contain a few types of R objects), and can be of any length. So, a data frame **is** a list but a list can **not** necessarily become a data.frame.
Gavin Simpson
On your last point, not really. Your DF is a data frame. When you use `[` it returns a data frame, e.g. `DF[1]` returns a data frame with a single component (column), whereas `DF[[1]]` will return directly the contents of the first component (column) of DF. To be clear, assume `DF` is generated by `DF <- data.frame(A = 1:10, B = 1:10)`: `DF[1]` will return a data.frame with the data for `A` in it, whilst `DF[[1]]` will return a numeric vector.
Gavin Simpson
For a list `L` defined by `L <- list(A = "foo", B = 1:5)`, `L[1]` will return a list with a single component that **contains** a character vector of length 1. `L[2]` returns a list that **contains** a numeric vector of length 5. `L[[2]]` will return a numeric vector of length 5.
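
The difference between `[` and `[[` described in these comments can be checked directly. A short sketch using the same `DF` and `L` definitions as above; the results noted in the code comments are what base R reports for these objects.

```r
DF <- data.frame(A = 1:10, B = 1:10)
L  <- list(A = "foo", B = 1:5)

class(DF[1])     # "data.frame": `[` keeps the container
class(DF[[1]])   # "integer" (a numeric type): `[[` extracts the column itself
is.list(DF)      # TRUE: a data frame is a list underneath
typeof(DF[1])    # "list"

class(L[1])      # "list", containing a character vector of length 1
class(L[2])      # "list", containing the vector 1:5
class(L[[2]])    # "integer"
length(L[[2]])   # 5
```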
Gavin Simpson
OK. Thanks for the explanation. I got it. (finally) Thanks. Hope I can operate with it well. :) Thanks again.
Sebastian