I have a rectangular n x m matrix (n != m). What's the best way in MATLAB to find out whether it contains any duplicate rows? And what's the best way to find the indices of the duplicates?
A:
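Run through the rows of the matrix and, for each pair of rows, test whether
all(row1 == row2)
(row1 == row2 on its own compares element by element and returns a logical vector, so wrap it in all() or use isequal(row1, row2) to get a single true/false.)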
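A minimal MATLAB sketch of that pairwise loop, assuming a small made-up example matrix M. Each later row that matches an earlier one is flagged, so the first occurrence of each row is treated as the original. It does O(n^2) row comparisons, which is why the unique-based answers below are preferred for anything but tiny matrices.

```matlab
% Brute-force duplicate-row search (example data is hypothetical).
M = [1 2; 3 4; 1 2; 5 6; 3 4];
n = size(M, 1);
isDup = false(n, 1);              % isDup(k) is true if row k repeats an earlier row
for i = 1:n-1
    if isDup(i)
        continue;                 % row i already matched an earlier row; skip it
    end
    for j = i+1:n
        if ~isDup(j) && isequal(M(i,:), M(j,:))
            isDup(j) = true;      % row j duplicates row i (i is the "original")
        end
    end
end
hasDuplicates = any(isDup)
dupIdx = find(isDup).'            % indices of the duplicate rows, as a row vector
```

For this M the loop should flag rows 3 and 5 (copies of rows 1 and 2).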
John at CashCommons
2010-03-24 17:49:23
This works, but is definitely both slower and more verbose than the other basic option (i.e. using 'unique()').
bnaul
2010-03-24 18:02:05
A:
Use unique() with the 'rows' option to find the distinct rows. If the result has fewer rows than the original matrix, there are duplicates. unique() also returns the index of one occurrence of each distinct row; all the other row indices are your duplicates.
x = [
1 1
2 2
3 3
4 4
2 2
3 3
3 3
];
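[u,I,J] = unique(x, 'rows', 'first')    % u = distinct rows, I = index of the first copy of each
hasDuplicates = size(u,1) < size(x,1)   % fewer distinct rows than rows means duplicates exist
ixDupRows = setdiff(1:size(x,1), I)     % every row that is not a first occurrence
dupRowValues = x(ixDupRows,:)           % the duplicated row values themselves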
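The third output J, unused above, can also tell you which row each duplicate repeats. A minimal sketch using the same x; the name origRows is just illustrative. Since x = u(J,:) and u = x(I,:), I(J(k)) is the first row of x that has the same values as row k.

```matlab
x = [1 1; 2 2; 3 3; 4 4; 2 2; 3 3; 3 3];   % same data as the answer above
[u, I, J] = unique(x, 'rows', 'first');   % u = x(I,:) and x = u(J,:)
ixDupRows = setdiff(1:size(x,1), I)       % rows that are not a first occurrence
% For a duplicate row k, J(k) is its row in u, and I(J(k)) is the first row
% of x with the same values, i.e. the "original" that row k repeats.
origRows = I(J(ixDupRows));
origRows = origRows(:).'                  % reshape to a row vector for display
```

Here rows 5, 6 and 7 should be reported as copies of rows 2, 3 and 3 respectively.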
Andrew Janke
2010-03-24 17:56:59
A:
You can use the functions UNIQUE and SETDIFF to accomplish this:
>> mat = [1 2 3; 4 5 6; 7 8 9; 7 8 9; 1 2 3]; %# Sample matrix
>> [newmat,index] = unique(mat,'rows','first'); %# Finds indices of unique rows
>> repeatedIndex = setdiff(1:size(mat,1),index) %# Finds indices of repeats
repeatedIndex =
4 5
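With 'first', the earliest copy of each row counts as the original. The question doesn't say which copy should count as the original, and unique() also accepts 'last'. A minimal sketch on the same sample matrix, where the last copy is kept and the earlier copies become the repeats:

```matlab
mat = [1 2 3; 4 5 6; 7 8 9; 7 8 9; 1 2 3];      % same sample matrix as above
[newmat, index] = unique(mat, 'rows', 'last');  % index of the LAST copy of each distinct row
repeatedIndex = setdiff(1:size(mat,1), index)   % earlier copies are now reported as repeats
```

For this matrix repeatedIndex should be [1 3] instead of [4 5].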
gnovice
2010-03-24 17:57:48
A:
Say your matrix is M:
[S,idx1] = sortrows(M);
idx2 = find(all(diff(S,1) == 0,2));
out = unique(idx1([idx2;idx2+1]));
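out will contain the indices of every row that has at least one identical row elsewhere in M (including the first occurrence of each), or be empty if there are no duplicate rows.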
upperBound
2010-03-24 18:14:51
@upperBound: Well, technically the OP never *explicitly* said whether or not the duplicated rows abut one another. Although not as general as using UNIQUE, this solution runs *substantially* faster in the specific case of neighboring duplicates, so +1.
gnovice
2010-03-24 18:37:22
@upperBound: Well, your new answer is doing something that I don't think the OP wanted. It is returning the indices of *all* rows that are not unique. I think the OP just wanted indices of duplicates *not counting* the first one found. In other words, if rows 2, 4, and 5 are the same, then rows 4 and 5 are considered "duplicates", with row 2 being the "original" (or 2 and 4 could be counted as duplicates, with 5 as the original... there was no order specified by the OP).
gnovice
2010-03-24 19:38:12
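A minimal sketch of the adjustment gnovice describes, reusing upperBound's sortrows/diff steps: instead of taking both rows of every matching adjacent pair, it keeps only the later row of each pair in sorted order, so one row from each group of identical rows (the first in sorted order) is left out as the "original". Which original row that is depends on how sortrows orders equal rows, so the sketch makes no claim about it. The sample matrix is gnovice's from the earlier answer; dupIdx and sameAsPrev are illustrative names.

```matlab
M = [1 2 3; 4 5 6; 7 8 9; 7 8 9; 1 2 3];   % sample matrix from gnovice's answer
[S, idx1] = sortrows(M);                  % S = M(idx1,:); identical rows are now adjacent
sameAsPrev = all(diff(S, 1) == 0, 2);     % sameAsPrev(k) is true if sorted row k+1 equals row k
% Drop the first row of each run of identical rows and keep the rest.
dupIdx = sort(idx1([false; sameAsPrev])).'
```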