I'm trying to figure out how to remove an element of a matrix in MATLAB if it differs from any of the other elements by less than 0.01. I'm supposed to use all of the unique elements of the matrix as threshold values for a ROC curve that I'm creating, but I need a way to remove values that are within 0.01 of each other (since we assume they are basically equal in that case).

Any help would be greatly appreciated!

Thanks!

A: 

Let all the elements in your matrix form a graph G = (V,E) such that there is an edge between two vertices (u,v) if the difference between them is at most 0.01. Now construct an adjacency matrix for this graph and find the element with the largest degree. Add it to a list, remove it and its neighbors from your graph, and repeat until there aren't any elements left.

CODE:

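    %% Toy dataset
    M = [1 1.005 2; 2.005 2.009 3; 3.01 3.001 3.005];
    M = M(:);
    tol = 0.01;
    n = numel(M);
    % Adjacency matrix: A(i,j) is true when M(i) and M(j) are within tol
    % (every element is also its own neighbor)
    A = false(n, n);
    for i = 1:n
        A(i, abs(M - M(i)) <= tol) = true;
    end
    C = [];
    while any(A(:))
        [~, ind] = max(sum(A));   % vertex with the largest degree
        C(end+1) = M(ind);        % keep it as the representative
        nbrs = A(ind, :);         % the vertex itself plus its neighbors
        A(nbrs, :) = false;       % remove them from the graph by clearing
        A(:, nbrs) = false;       % both their rows and their columns
    end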

Building the adjacency matrix alone takes O(n^2) time and memory, where your matrix has n elements. Yeah, it's slow.

Jacob
+2  A: 

If you are simply trying to remove adjacent values within that tolerance from a vector, I would start with something like this:

roc = ...

tolerance = 0.1;
idx = [true, abs(diff(roc)) > tolerance]; % roc is assumed to be a row vector
rocReduced = roc(idx);

'rocReduced' is now a vector containing every value that is more than the tolerance away from the value just before it in the original vector.

This approach has two distinct limitations:

  1. The original 'roc' vector must be monotonic.
  2. No more than two items in a row may be within the tolerance of each other; otherwise everything after the first item of that run is removed, even values that are far from the first item.
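
The second limitation can be seen on a short chain of values. The sketch below uses made-up sample values (they are an assumption for illustration, not data from the question) in which each value is close to its predecessor while the ends of the chain are far apart:

```matlab
% Hypothetical, monotonic sample values chosen for illustration
roc = [0 0.04 0.08 0.12 0.5];
tolerance = 0.1;

% Keep the first value, then every value that is more than
% the tolerance away from the value just before it
idx = [true, abs(diff(roc)) > tolerance];
rocReduced = roc(idx);

disp(rocReduced)   % only 0 and 0.5 remain
```

Here 0.12 is dropped although it is more than the tolerance away from 0, the only nearby value that was kept.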

I suspect the above would not be sufficient. That said, I can't think of any simple operations that overcome those (and other) limitations while still using vectorized matrix operations.

If performance is not a huge issue, perhaps the following iterative algorithm would suit your application:

roc = ...
tolerance = 0.1;
mask = true(size(roc)); % Start with all points
last = 1; % Always taking first point
for i=2:length(roc) % for all remaining points,
  if(abs(roc(i)-roc(last))<tolerance) % If this point is within the tolerance of the last accepted point, remove it from the mask;
    mask(i) = false;
  else % Otherwise, keep it and mark the last kept
    last = i;
  end
end
rocReduced = roc(mask);

This handles several consecutive sub-tolerance intervals without discarding the whole run, and it also accepts non-monotonic sequences.
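
As a quick check, here is the same loop run on made-up sample values (an assumption for illustration only). It also shows what to expect on unsorted input: each value is compared only with the last kept value, not with every value kept so far.

```matlab
% Hypothetical, non-monotonic sample values chosen for illustration
roc = [0 0.04 0.08 0.12 0.5 0.03];
tolerance = 0.1;

mask = true(size(roc));   % start with all points
last = 1;                 % the first point is always kept
for i = 2:length(roc)
    if abs(roc(i) - roc(last)) < tolerance
        mask(i) = false;  % too close to the last kept point
    else
        last = i;         % far enough away: keep it and remember it
    end
end
rocReduced = roc(mask);

disp(rocReduced)   % 0, 0.12, 0.5 and 0.03 remain
```

The value 0.12 survives because it is compared with 0 rather than with 0.08. The trailing 0.03 also survives, although it is close to the 0 kept earlier, because it is only compared with 0.5. Sorting the values first would avoid that.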

MATLAB users sometimes shy away from iterative solutions (vs. vectorized matrix operations), but sometimes it's not worth the trouble of finding a more elegant solution when brute force performance meets your needs.

Adam
A: 

From your description, it's not very clear how you want to handle a chain of values (as pointed out in the comments already), e.g. 0.0 0.05 0.1 0.15 ..., or what you actually mean by removing the elements from the matrix: set them to zero, remove the entire column, remove the entire row?

For a vector, it could look like this (similar to Adam's solution):

roc = ...
tolerance = 0.1;

% sort it first to get the similar values in a row
[rocSorted, sortIdx] = sort(roc);

% find the differing values and get their indices
idx = [true; diff(rocSorted(:)) > tolerance];
sortIdxReduced = sortIdx(idx);

% select only the relevant parts from the original vector (revert sorting)
rocReduced = roc(sort(sortIdxReduced));

The code is untested, but it should work.
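
For the ROC thresholds in the question, one possible way to handle chains is to combine the sorting step with a comparison against the last kept value, as in Adam's loop. This is only a sketch: the matrix is the toy dataset from the first answer, and the names vals, keep and thresholds are made up for illustration.

```matlab
% Toy matrix taken from the first answer
M = [1 1.005 2; 2.005 2.009 3; 3.01 3.001 3.005];
tol = 0.01;

vals = sort(M(:));        % flatten the matrix and sort ascending
keep = true(size(vals));  % start with all values
last = 1;                 % the smallest value is always kept
for i = 2:numel(vals)
    if vals(i) - vals(last) < tol
        keep(i) = false;  % within tol of the last kept value
    else
        last = i;         % far enough away: new representative
    end
end
thresholds = vals(keep);

disp(thresholds')   % 1, 2 and 3 remain for this matrix
```

Because the values are sorted, comparing against the last kept value is enough to guarantee that no two kept thresholds are closer than tol. The result is a sorted list of values rather than a matrix of the original shape.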

groovingandi
