ansaurus

Question

Merge several data.frames into one data.frame with a loop

Answer 1

+12 A:

You may want to look at the closely related question on Stack Overflow.

I would approach this in two steps: import all the data (with plyr), then merge it together:

filenames <- list.files(path=".../tempDataFolder/", full.names=TRUE)
library(plyr)
import.list <- llply(filenames, read.csv)

That will give you a list of all the files that you now need to merge together. There are many ways to do this, but here's one approach (with Reduce):

data <- Reduce(function(x, y) merge(x, y, all=TRUE, 
    by=c("COUNTRYNAME", "COUNTRYCODE", "Year")), import.list, accumulate=FALSE)
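
To see what the Reduce call does without touching any files, here is a minimal sketch on three toy data frames. The data frames (gdp, pop, area), their values, and the names keys and toy.list are made up for illustration; only the three key columns come from the question.

```r
# Key columns shared by every data frame (taken from the question)
keys <- c("COUNTRYNAME", "COUNTRYCODE", "Year")

# Toy inputs standing in for the imported files (hypothetical data)
gdp  <- data.frame(COUNTRYNAME = c("A", "B"), COUNTRYCODE = c("AA", "BB"),
                   Year = c(2000, 2000), gdp = c(1.5, 2.5))
pop  <- data.frame(COUNTRYNAME = c("A", "B"), COUNTRYCODE = c("AA", "BB"),
                   Year = c(2000, 2000), pop = c(10, 20))
area <- data.frame(COUNTRYNAME = "A", COUNTRYCODE = "AA",
                   Year = 2000, area = 5)
toy.list <- list(gdp, pop, area)

# Reduce merges the first two data frames, then merges that result with the
# third, and so on; all = TRUE keeps rows that are missing from some inputs
merged <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE), toy.list)
print(merged)
```

Country "B" has no row in area, so with all = TRUE it stays in the result with NA in the area column.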

Alternatively, you can do this with the reshape package if you aren't comfortable with Reduce:

library(reshape)
data <- merge_recurse(import.list)

Shane 2010-02-05 18:20:28

@shane: I like the approach using the `reshape` package. I'll have to take another look at `plyr` and `reshape`. Thanks! One small thing: in the first line of code, `full.names=TRUE` has to be added.

mropa 2010-02-05 19:50:19

Thanks; corrected that.

Shane 2010-02-05 20:13:39

Answer 2

+1 A:

If I'm not mistaken, a pretty simple change could eliminate the 3:length(FileNames) kludge:

FileNames <- list.files(path=".../tempDataFolder/", full.names=TRUE)
dataMerge <- data.frame()
for(f in FileNames){ 
  ReadInMerge <- read.csv(file=f, header=T, na.strings="NULL")
  dataMerge <- merge(dataMerge, ReadInMerge, 
               by=c("COUNTRYNAME", "COUNTRYCODE", "Year"), all=T)
}

Ken Williams 2010-02-05 18:56:25

@ken: Since `dataMerge` is an empty `data.frame`, the `merge()` function cannot find a common identifier in the first pass of the `for` loop. If I assign e.g. the first file to `dataMerge`, it kind of gets me back to my initial idea.

mropa 2010-02-05 19:53:08

Sorry, I should have tried it first. I was thinking of rbind(), in which an empty data frame is treated as if the required columns are present but empty.

Ken Williams 2010-02-08 16:04:43
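
For reference, one possible way to keep the plain loop while avoiding the empty-data.frame problem is to start from NULL and use the first file as the seed. This is only a sketch and was not posted in the thread: the temporary folder, the two toy CSV files, and the names keys and demoDir are assumptions made so that the example runs on its own.

```r
keys <- c("COUNTRYNAME", "COUNTRYCODE", "Year")

# Write two toy CSV files to a temporary folder (made-up data, illustration only)
demoDir <- file.path(tempdir(), "mergeLoopDemo")
dir.create(demoDir, showWarnings = FALSE)
write.csv(data.frame(COUNTRYNAME = "A", COUNTRYCODE = "AA",
                     Year = 2000, gdp = 1.5),
          file.path(demoDir, "gdp.csv"), row.names = FALSE)
write.csv(data.frame(COUNTRYNAME = c("A", "B"), COUNTRYCODE = c("AA", "BB"),
                     Year = c(2000, 2000), pop = c(10, 20)),
          file.path(demoDir, "pop.csv"), row.names = FALSE)

FileNames <- list.files(path = demoDir, pattern = "\\.csv$", full.names = TRUE)

dataMerge <- NULL  # NULL means "nothing merged yet"
for (f in FileNames) {
  ReadInMerge <- read.csv(file = f, header = TRUE, na.strings = "NULL")
  dataMerge <- if (is.null(dataMerge)) {
    ReadInMerge  # first file: nothing to merge with, so use it as the seed
  } else {
    merge(dataMerge, ReadInMerge, by = keys, all = TRUE)
  }
}
print(dataMerge)
```

The is.null() check replaces the separate handling of the first file, so the loop can run over all of FileNames.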
