If I have a matrix A with n values in the range 65:90, how do I get the 10 most common values in A? I want the result to be a 10x2 matrix B with the 10 most common values in the first column and the number of times each one appears in the second column.

+1  A: 

This is easily solved using arrayfun()

A = [...]; % Your target matrix with values 65:90
labels = 65:90; % Possible values to look for
nTimesOccured = arrayfun(@(x) sum(A(:) == x), labels);
[sorted sortidx] = sort(nTimesOccured, 'descend');

B = [labels(sortidx(1:10))' sorted(1:10)'];
kigurai
+4  A: 
A = [65 82 65 90; 90 70 72 82]; % Your data
range = 65:90;
res = [range; histc(A(:)', range)]'; % res has values in first column, counts in second.

Now all you’ve got to do is sort the res array by the second column and take the first 10 rows.

sortedres = sortrows(res, -2); % sort by second column, descending
first10 = sortedres(1:10, :)
Debilski
You will want to use values(:) since A is a matrix and not necessarily a vector. Also, sortrows can easily be made to sort descending by using sortrows(res, -2) instead.
kigurai
You are right. I fixed it.
Debilski
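To illustrate the point about flattening with (:), here is a small sketch using the sample data from the answer above. The variable names perColumn and overall are made up for this illustration. As I understand it, histc works column by column on a matrix, so without flattening you get one set of counts per column of A rather than one overall set.

```matlab
A = [65 82 65 90; 90 70 72 82];     % sample data from the answer
range = 65:90;

perColumn = histc(A, range);        % counts computed separately for each column of A
overall   = histc(A(:)', range);    % counts over all elements of A

size(perColumn)                     % 26 x 4
size(overall)                       % 1 x 26
```

Only the flattened call gives a vector that can be stacked under range to build the two-column result.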
A: 

this can also be solved with accumarray

ncounts = accumarray(A(:),1);  %ncounts is now a max(A(:)) x 1 vector of counts (90 x 1 if the value 90 occurs in A)
[vals,sidx] = sort(ncounts,'descend');   %vals has the counts, sidx has the number
B = [sidx(1:10),vals(1:10)];

accumarray is not as fast as it should be, but it is often faster than other operations of its type. It took me a number of scans of its help page to understand what the hell it is doing. For your purposes, it is probably slower than the histc solution, but a little more straightforward.
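If the help page is hard to parse, a rough way to picture the call is as a loop that adds 1 into slot A(k) for every element of A. The sketch below uses made-up sample data and a hypothetical variable loopCounts to compare the two; it is only meant to show the idea, not how accumarray is implemented.

```matlab
A = [65 82 65 90; 90 70 72 82];       % illustrative data only

% accumarray: add the value 1 into position A(k) for every element k
ncounts = accumarray(A(:), 1);

% The same accumulation written as an explicit loop
loopCounts = zeros(max(A(:)), 1);
for k = 1:numel(A)
    loopCounts(A(k)) = loopCounts(A(k)) + 1;
end

isequal(ncounts, loopCounts)          % the two count vectors match
```

Note that the count vector is indexed from 1, so rows 1 to 64 are all zero here. If A contains fewer than 10 distinct values, the top 10 after sorting will include some of those zero-count rows.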

--edit: forgot the '1' in the accumarray call.

shabbychef
That is NOT the correct way to use **accumarray**! take a look at this video by *Doug Hull* which shows a typical usage of the function: http://blogs.mathworks.com/videos/2009/10/02/basics-using-accumarray/
Amro
Yes, I forgot the 1. However, this is the essence of accumarray. I think of it as the fast, well-defined way of doing output(idx) += vals. Your comments notwithstanding, this is the correct way to use accumarray.
shabbychef
+1  A: 

We can add a fourth option using tabulate from the Statistics Toolbox:

A = randi([65 90], [1000 1]);   %# thousand random integers in the range 65:90
t = sortrows(tabulate(A), -2);  %# compute sorted frequency table
B = t(1:10, 1:2);               %# take the top 10
Amro
A: 

Heck, here is another solution, using only simple built-in commands:

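[V, I] = unique(sort(A(:)), 'last');   % I(k) = index of the last occurrence of V(k) in the sorted list
M = sortrows([V, diff([0; I])], -2);
Top10 = M(1:10, :);

First line: sorts all values, and then finds the offset where the run of each unique value ends in the sorted list. The 'last' option makes this explicit, because newer MATLAB releases return the first occurrence by default, which would give wrong counts here. Second line: computes the offset differences per unique value, which are the counts, and sorts the results by count in descending order.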

BTW, I would only suggest this method if the range of possible numbers is really large, such as [0,1E8]. In that case, some of the other methods might get an out-of-memory error.
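A small worked example may make the sort/unique/diff idea easier to follow. The data and the intermediate names s and counts are made up for illustration, and 'last' is passed to unique explicitly on the assumption that the default index it returns differs between MATLAB releases.

```matlab
A = [65 82 65 90; 90 70 72 82];    % illustrative data only
s = sort(A(:));                    % [65 65 70 72 82 82 90 90]'
[V, I] = unique(s, 'last');        % V = [65 70 72 82 90]', I = [2 3 4 6 8]'
counts = diff([0; I]);             % [2 1 1 2 2]': length of each run in s
M = sortrows([V, counts], -2);     % rows sorted by count, descending
```

Memory use here scales with the number of elements in A rather than with the size of the value range, which is why this approach suits very wide ranges.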

catchmeifyoutry
