I have a data frame called DF in the following format (this is just a made-up, simplified sample):

eval.num, eval.count, fitness, fitness.mean, green.h.0, green.v.0, offset.0, random
1         1           1500     1500          100        120        40       232342
2         2           1000     1250          100        120        40       11843
3         3           1250     1250          100        120        40       981340234
4         4           1000     1187.5        100        120        40       4363453
5         1           2000     2000          200        100        40       345902
6         1           3000     3000          150        90         10       943
7         1           2000     2000          90         90         100      9304358
8         2           1800     1900          90         90         100      284333

However, the eval.count column is incorrect and I need to fix it. It should report the number of rows with the same values of green.h.0, green.v.0, and offset.0, looking only at the previous rows.

The example above shows the expected values, but assume the current ones are incorrect.

How can I add a new column (say "count") that counts all previous rows with the same values of the specified variables?

I have gotten help on a similar problem of just selecting all rows with the same values for specified columns, so I suppose I could just write a loop around that, but it seems inefficient to me.

+3  A: 

Ok, let's first do it in the easy case where you just have one column.

> data <- rep(sample(1000, 5),
              sample(5, 5))
> head(data)
[1] 435 435 435 278 278 278

Then you can just use rle to figure out the contiguous sequences:

> sequence(rle(data)$lengths)
[1] 1 2 3 1 2 3 4 5 1 2 3 4 1 2 1

Or altogether:

> head(cbind(data, sequence(rle(data)$lengths)))
     data
[1,]  435 1
[2,]  435 2
[3,]  435 3
[4,]  278 1
[5,]  278 2
[6,]  278 3

For your case with multiple columns, there are probably a bunch of ways of applying this solution. Easiest might be to just paste the columns you care about together to form a single vector.
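
For instance, a minimal sketch of that idea, assuming a data frame DF with the green.h.0, green.v.0 and offset.0 columns from the question:

# Sketch: paste the key columns into one vector, then reuse the rle trick above.
key <- paste(DF$green.h.0, DF$green.v.0, DF$offset.0)
DF$count <- sequence(rle(key)$lengths)

Note that, like rle on a single column, this counts contiguous runs: the counter restarts whenever the pasted key changes, even if the same combination appeared earlier in the data.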

Jonathan Chang
+1  A: 

Okay, I used the answer I got on another question and worked out a loop that I think will work. This is what I'm going to use:

cmpfun2 <- function(r) {
    # r is one row of DF; r[1] is its row number (eval.num),
    # and columns 27:51 hold the values that have to match
    count <- 0
    if (r[1] > 1) {
        for (row in 1:(r[1] - 1)) {
            # compare the key columns of this row against each earlier row
            if (all(r[27:51] == DF[row, 27:51, drop = FALSE])) {
                count <- count + 1
            }
        }
    }
    return(count)
}
brows <- apply(DF, 1, cmpfun2)
print(brows)

Please comment if I made a mistake and this won't work, but I think I've figured it out. Thanks!

Matt
Your solution looks OK, but if I were you I'd at least try to understand Jonathan's answer above. rle is often a quicker and more robust alternative to loops and index gymnastics.
Leo Alekseyev
A: 

I had the same problem; it gets especially annoying with large data frames. My latest solution is to create a subset with the unique observations and then delete those rows from the original data frame. It sounds stupid and overly complicated, but it's pretty fast and straightforward:

d <- unique(DF[, c("column1", "column2")])  # select rows with unique combinations of entries for the 2 columns
d1 <- as.numeric(rownames(d)) * (-1)        # make the rownames numeric and negative
d2 <- DF[d1, ]                              # data frame with the duplicated values: drop the unique rows from the original data frame

The new data frame d2 now has all observations with non-unique combinations of entries in the specified columns. I'm not the biggest coder and didn't get your loop, but I think it works fine, doesn't it?
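
For example, a quick sanity check on a toy data frame (the data and column names here are made up just to illustrate the idea):

toy <- data.frame(column1 = c(1, 1, 2, 2, 3),
                  column2 = c("a", "a", "b", "c", "a"))
d  <- unique(toy[, c("column1", "column2")])  # first occurrence of each combination (rows 1, 3, 4, 5)
d1 <- -as.numeric(rownames(d))                # negative row indices of those first occurrences
d2 <- toy[d1, ]                               # what remains are the repeated rows
d2
#   column1 column2
# 2       1       a

Note this relies on the data frame still having its default numeric rownames; otherwise as.numeric(rownames(d)) would not give valid row indices.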

SKr