I have two data frames that both have a column containing a factor like the following:
> head(test.data)
var0 var1 date store
1 109.5678 109.5678 1990-03-30 Store1
2 109.3009 108.4261 1990-06-30 Store1
3 108.8262 106.2517 1990-09-30 Store1
4 108.2443 108.6417 1990-12-30 Store1
5 109.5678 109.5678 1991-03-30 Store1
6 109.3009 108.4261 1991-06-30 Store1
> summary(test.data)
var0 var1 date store
Min. : -9.72 Min. : -2.297 Min. :1990-03-30 Store1 : 8
1st Qu.: 68.32 1st Qu.: 71.305 1st Qu.:1990-09-07 Store2 : 8
Median :102.19 Median :101.192 Median :1991-02-13 Store3 : 8
Mean :101.09 Mean :103.042 Mean :1991-02-13 Store4 : 8
3rd Qu.:151.22 3rd Qu.:151.940 3rd Qu.:1991-07-23 Store5 : 8
Max. :196.55 Max. :201.099 Max. :1991-12-30 Store6 : 8
(Other):48
>
> head(test.clusters)
store cluster
1 Store1 A
2 Store2 C
3 Store3 A
4 Store4 B
5 Store5 D
6 Store6 A
>
> summary(test.clusters)
store cluster
Store1 :1 A:5
Store2 :1 B:4
Store3 :1 C:2
Store4 :1 D:1
Store5 :1
Store6 :1
(Other):6
>
I want to add a column to test.data
containing each row's cluster
, based on their shared store
. I am currently doing this using a doubly nested loop:
new_col <- rep(test.clusters$cluster[1], length(test.data$store))
for (i in seq(test.data$store)){
for (j in seq(test.clusters$store)){
if (test.data$store[i] == test.clusters$store[j]){
new_col[i] <- test.clusters$cluster[j]
break
}
}
}
test.data$cluster <- new_col
This is very verbose, grossly inefficient and frankly ugly. What is the idiomatic method for doing this in R?