tags:

views:

75

answers:

2

I'm trying to figure out how to use merge() to update a database.

Here is an example. Take for example the data frame foo

foo <- data.frame(index=c('a', 'b', 'c', 'd'), value=c(100, 101, NA, NA))

Which has the following values

index value
1     a   100
2     b   101
3     c    NA
4     d    NA

And the data frame bar

bar <- data.frame(index=c('c', 'd'), value=c(200, 201))

Which has the following values:

 index value
1     c   200
2     d   201

When I run the following merge() function to update the values for c and d

merge(foo, bar, by='index', all=T)

It results in this output:

 index value.x value.y
1     a     100      NA
2     b     101      NA
3     c      NA     200
4     d      NA     201

I'd like the output of merge() to avoid the creation of, in this specific example, of value.x and value.y but only retain the original column of value Is there a simple way of doing this?

+2  A: 

Doesn't merge() always bind columns together? Does replace() work?

foo$value <- replace(foo$value, foo$index %in% bar$index, bar$value)

or match() so the order matters

foo$value[match(bar$index, foo$index)] <- bar$value
apeescape
One wrinkle with using `replace()` is that if the ordering in `bar` is not the same as in `foo`, it won't work properly. For example, if you try running the above example after `bar <- bar[c(2,1),]`, the end result does not come out correct.
andrewj
you're right how about match()? edited above
apeescape
Yes, `match()` does work for my example. In reality, it turns out that my actual use case is more complicated, where I would like to match across multiple columns and not just a simple vector. I don't think `match()` works when you would like to match across multiple columns of a dataframe.
andrewj
A: 

merge() only merges in new data. For instance, if you had a data set of average income for a few cities, and a separate data set of the populations of those cities, you would use merge() to merge in one set of data into the other.

Like apeescape said, replace() is probably what you want.

JoFrhwld