I am using the plyr package in R to do the following:
- take each row of table A, identified by its values in column A and column B
- find the row in table B that has the same values in column A and column B
- copy column C from table B to table A
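The three steps above amount to a left join on two key columns. Here is a minimal sketch using base R's `merge()`. It assumes hypothetical tables `table_a` and `table_b` whose columns are literally named `A`, `B` and `C`:

```r
# Hypothetical example tables; column names A, B, C follow the description above
table_a <- data.frame(A = c(1, 1, 2, 3),
                      B = c("x", "y", "x", "z"),
                      stringsAsFactors = FALSE)
table_b <- data.frame(A = c(1, 2, 2),
                      B = c("x", "x", "y"),
                      C = c("foo", "bar", "baz"),
                      stringsAsFactors = FALSE)

# Left join: keep every row of table_a and copy C from the row of table_b
# with the same (A, B) pair; rows of table_a with no match get NA in C
result <- merge(table_a, table_b[, c("A", "B", "C")],
                by = c("A", "B"), all.x = TRUE, sort = FALSE)
print(result)
```

If `table_b` contains the same (A, B) pair more than once, `merge()` returns one output row per match, so the result can have more rows than table A.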
I have added a progress bar to show the progress, but after it reaches 100% the job still seems to be running: RGUI keeps occupying the CPU, and the job never finishes.
Table A has about 40,000 rows, and each combination of column A and column B is unique.
I suspect that the "combine" step of plyr's split-apply-combine workflow cannot handle these 40,000 rows, because the same approach works on another table with 4,000 rows.
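The split-apply-combine pattern can also be written by hand in base R, which makes the combine step explicit. This only illustrates the pattern on a hypothetical `table_a`; plyr's own implementation differs in detail. With unique (A, B) pairs every group is a single row, so the combine step has to bind one small data frame per input row, roughly 40,000 of them for table A:

```r
# Hypothetical example table standing in for table A
table_a <- data.frame(A = c(1, 1, 2, 3),
                      B = c("x", "y", "x", "z"),
                      stringsAsFactors = FALSE)

# Split: one piece per (A, B) pair; drop = TRUE skips empty combinations
pieces <- split(table_a, list(table_a$A, table_a$B), drop = TRUE)

# Apply: build a new small data frame for each piece
pieces <- lapply(pieces, function(df) data.frame(df, N = nrow(df)))

# Combine: bind every piece back into one data frame.
# With one piece per row, this binds as many data frames as there are rows.
combined <- do.call(rbind, pieces)
```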
Any suggestions for improving the efficiency? Thanks.
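One way to avoid splitting at all is to build a composite key for each row and look up every row of table A with a single vectorized `match()` call. This sketch uses the same hypothetical table names and assumes that the separator `"|"` never occurs in columns A or B. Unlike a join, it keeps only the first match when table B has duplicate keys:

```r
# Hypothetical example tables
table_a <- data.frame(A = c(1, 1, 2, 3),
                      B = c("x", "y", "x", "z"),
                      stringsAsFactors = FALSE)
table_b <- data.frame(A = c(1, 2, 2),
                      B = c("x", "x", "y"),
                      C = c("foo", "bar", "baz"),
                      stringsAsFactors = FALSE)

# One composite key per row; assumes "|" does not appear in A or B
key_a <- paste(table_a$A, table_a$B, sep = "|")
key_b <- paste(table_b$A, table_b$B, sep = "|")

# For every key of table_a, the position of its first occurrence in table_b
# (NA when absent), computed in one call instead of one scan per row
idx <- match(key_a, key_b)
table_a$C <- table_b$C[idx]
```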
UPDATE
Here is my code:

    library(plyr)

    for (loop.filename in seq_len(nrow(filename))) {
      # name of the data frame to process; as.character() guards against factors
      table.name <- as.character(filename[loop.filename, "table_name"])
      print("infection source merge")
      print(table.name)
      temp <- get(table.name)
      # one group per (HOSP_NO, REF_DATE) pair
      temp1 <- ddply(temp,
                     c("HOSP_NO", "REF_DATE"),
                     function(df) {
                       # look up this pair in abcde; every group scans all of abcde
                       temp.infection.source <- abcde[abcde[, "Case_Number"] == unique(df[, "HOSP_NO"]) &
                                                        abcde[, "Reference_Date"] == unique(df[, "REF_DATE"]),
                                                      "Case_Definition"]
                       if (length(temp.infection.source) == 0) {
                         temp.infection.source <- "NIL"        # no match in abcde
                       } else if (length(unique(temp.infection.source)) > 1) {
                         temp.infection.source <- "MULTIPLE"   # conflicting matches
                       } else {
                         temp.infection.source <- unique(temp.infection.source)
                       }
                       data.frame(df, INFECTION_SOURCE = temp.infection.source)
                     },
                     .progress = "text")
      # overwrite the original table with the merged result
      assign(table.name, temp1)
    }
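For the code above, here is a vectorized sketch that should reproduce the NIL / MULTIPLE logic for one table. It first collapses `abcde` to one row per (Case_Number, Reference_Date) with `aggregate()`, then left-joins the result with `merge()`. The small `temp` and `abcde` data frames are hypothetical stand-ins; inside the loop, `temp` would still come from `get()`:

```r
# Hypothetical stand-ins for one table from the loop and for abcde
temp <- data.frame(HOSP_NO  = c("H1", "H2", "H3"),
                   REF_DATE = as.Date(c("2012-01-01", "2012-01-02", "2012-01-03")),
                   stringsAsFactors = FALSE)
abcde <- data.frame(Case_Number     = c("H1", "H2", "H2"),
                    Reference_Date  = as.Date(c("2012-01-01", "2012-01-02", "2012-01-02")),
                    Case_Definition = c("A", "B", "C"),
                    stringsAsFactors = FALSE)

# Collapse abcde to one row per key: the single distinct Case_Definition,
# or "MULTIPLE" when the key has more than one distinct value
lookup <- aggregate(abcde["Case_Definition"],
                    by = abcde[c("Case_Number", "Reference_Date")],
                    FUN = function(x) {
                      u <- unique(as.character(x))
                      if (length(u) > 1) "MULTIPLE" else u
                    })
names(lookup)[3] <- "INFECTION_SOURCE"

# Left join onto temp; keys absent from abcde come back as NA and become "NIL"
result <- merge(temp, lookup,
                by.x = c("HOSP_NO", "REF_DATE"),
                by.y = c("Case_Number", "Reference_Date"),
                all.x = TRUE)
result$INFECTION_SOURCE[is.na(result$INFECTION_SOURCE)] <- "NIL"
```

Here `abcde` is processed once per table rather than scanned once per group, so the cost should no longer grow with the number of groups times the size of `abcde`. Note that `aggregate()` drops rows of `abcde` whose keys contain `NA`.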