ansaurus

Question

R: get rid of rows with a duplicate attribute

Answer 1

+4 A:

subset(data,!duplicated(data$ID))

That should do the trick: it keeps the first row found for each ID and drops the rest.
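In case it helps, here is a minimal sketch of what `duplicated()` returns, on a made-up data frame (the `time` column and the values are assumptions, not the asker's real data). It also shows `fromLast = TRUE`, in case keeping the last row per ID is closer to what you want:

```r
# Toy data for illustration only; not the asker's actual data
data <- data.frame(ID = c(1, 2, 2, 3), time = 1:4)

# TRUE marks an ID value that has already appeared in an earlier row
dup <- duplicated(data$ID)

# Keep the first row seen for each ID (same result as the subset() call)
first_rows <- data[!dup, ]

# Keep the last row seen for each ID instead
last_rows <- data[!duplicated(data$ID, fromLast = TRUE), ]
```

Either way, which row survives depends only on row order, so sort the data first if the order matters.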

James 2010-05-03 16:11:36

thanks ever so much - I was about to despair...

CatholicEvangelist 2010-05-03 16:21:25

This will work if you don't have any heuristic in mind for how to select the other data. Seems like a very strange use case to me...

Shane 2010-05-03 16:27:10

Exactly what I just needed James, thank you.

Tal Galili 2010-06-13 15:16:06

Answer 2

+2 A:

If you want to keep one row for each ID, but there is different data on each row, then you need to decide on some logic to discard the additional rows. For instance:

df <- data.frame(ID=c(1, 2, 2, 3), time=1:4, OS="Linux")
df
  ID time    OS
1  1    1 Linux
2  2    2 Linux
3  2    3 Linux
4  3    4 Linux

Now I will keep the maximum time value and the last OS value:

library(plyr)
# For each ID, return a single row: that ID, the largest time, and the last OS
ddply(df, .(ID), function(x) data.frame(ID=x$ID[1], time=max(x$time), OS=tail(x$OS,1)))
  ID time    OS
1  1    1 Linux
2  2    3 Linux
3  3    4 Linux
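If you would rather not depend on plyr, the same split, summarise and recombine idea can be sketched in base R. This is only an illustration on the toy `df` above; the rule for which values to keep (max time, last OS) is the example rule from this answer, so swap in whatever logic fits your data:

```r
df <- data.frame(ID = c(1, 2, 2, 3), time = 1:4, OS = "Linux")

# Split into one data frame per ID, then reduce each piece to a single row
pieces <- lapply(split(df, df$ID), function(x) {
  data.frame(ID = x$ID[1], time = max(x$time), OS = tail(x$OS, 1))
})

# Stack the one-row pieces back into a single data frame
result <- do.call(rbind, pieces)
```

The anonymous function is called once per ID, and `x` holds only the rows for that ID.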

Shane 2010-05-03 16:13:05

thanks a lot for the detailed answer!!!

CatholicEvangelist 2010-05-03 16:21:50

Answer 3

A:

Hi Shane,

Could you possibly describe what the function is doing? I have a similar problem to CatholicEvangelist, but it is a bit more complex and I think understanding yours would be helpful.

Thanks, Lauren

Lauren 2010-06-17 15:41:24
