I have two arrays of data that I'm trying to amalgamate. One contains actual latencies from an experiment in the first column (e.g. 0.345, 0.455... never more than 3 decimal places), along with other data from that experiment. The other contains what is effectively a 'look up' list of latencies ranging from 0.001 to 0.500 in 0.001 increments, along with other pieces of data. Both data sets are X-by-Y doubles.

What I'm trying to do is something like...

for i = 1:length(actual_latency)
   row = find(predicted_data(:,1) == actual_latency(i));
   full_set(i,1:4) = [actual_latency(i) other_info(i) predicted_data(row,2) ...
                      predicted_data(row,3)];
end

...in order to find the row in predicted_data whose look-up latency matches the actual latency. I then use this to create an amalgamated data set, full_set.

I figured this would be really simple, but find keeps returning an empty matrix when looking for an actual latency that I know is in predicted_data(:,1) (I've double-checked this during debugging).

Moreover, if I replace find with a for loop that does the same job, I get the same problem. It doesn't appear to be systematic: different participant data sets make it fail in different places.

Furthermore, during debugging mode, if I use find to try and find a hard-coded value of actual_latency, it doesn't always work. Sometimes yes, sometimes no.

I'm really scratching my head over this, so if anyone has any ideas about what might be going on, I'd be really grateful.

+3  A: 

You are likely running into a problem with floating point comparisons when you do the following:

predicted_data(:,1) == actual_latency(i)

Even though your numbers appear to only have three decimal places of precision, they may still differ by very small amounts that are not being displayed, thus giving you an empty matrix since FIND can't get an exact match.
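To see those hidden digits for yourself, print more of them than MATLAB's default display shows. A minimal sketch using the 0.1*3 vs 0.3 example from Amro's comment; `a` and `b` are illustrative names only:

```matlab
% Two values that both display as 0.3000 in the default short format
a = 0.1*3;
b = 0.3;
disp(a == b)                 % displays 0 (false): they are different doubles
fprintf('%.20f\n', a, b);    % 20 digits expose the difference hidden by the display
fprintf('difference = %g\n', a - b);
```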

One feature of floating point numbers is that certain numbers can't be represented exactly, since they have no finite binary expansion (they can't be written as a finite sum of powers of 2). This occurs with the numbers 0.1 and 0.001. If you repeatedly add or multiply one of these numbers you can see some unexpected behavior. Amro pointed out one example in his comment: 0.3 is not exactly equal to 3*0.1. This can also be illustrated by creating your look-up list of latencies in two different ways. You can use the normal colon syntax:

vec1 = 0.001:0.001:0.5;

Or you can use LINSPACE:

vec2 = linspace(0.001,0.5,500);

You'd think these two vectors would be equal to one another, but think again:

>> isequal(vec1,vec2)

ans =

     0  %# FALSE!

This is because the two methods create the vectors by performing successive additions or multiplications of 0.001 in different ways, giving ever so slightly different values for some entries in the vector. You can take a look at this technical solution for more details.
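If you want to see how far apart the two vectors actually are, a quick check like this sketch measures the element-wise discrepancy and compares it with `eps(0.5)`, the spacing of doubles near the top of the range. No particular counts are claimed here; run it to see the numbers.

```matlab
% Build the look-up latencies both ways, exactly as above
vec1 = 0.001:0.001:0.5;
vec2 = linspace(0.001, 0.5, 500);

d = abs(vec1 - vec2);                            % element-wise discrepancy
fprintf('%d of %d entries differ\n', nnz(d), numel(d));
fprintf('largest difference: %g\n', max(d));
fprintf('eps(0.5):           %g\n', eps(0.5));   % spacing of doubles near 0.5
```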

When comparing floating point numbers, you should therefore do your comparisons using some tolerance. For example, this finds the indices of entries in the look-up list that are within 0.0001 of your actual latency:

tolerance = 0.0001;
for i = 1:length(actual_latency)
  row = find(abs(predicted_data(:,1) - actual_latency(i)) < tolerance);
  % ... rest of the loop body ...
end
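Putting the pieces together, here is a sketch of the complete loop from the question with the tolerance test in place. It assumes `predicted_info` in the question is the same matrix as `predicted_data`, and it builds small hypothetical sample data so it runs on its own. It also checks that exactly one row matched, since an empty or multi-row `row` would otherwise surface as a dimension-mismatch error at the assignment.

```matlab
% Hypothetical sample data: column 1 = look-up latency, columns 2-3 = predicted values
lookup = (1:500)' / 1000;
predicted_data = [lookup, 2*lookup, 3*lookup];
actual_latency = [0.345; 0.455; 0.1*3];   % 0.1*3 is not exactly 0.3
other_info     = [10; 20; 30];

tolerance = 0.0001;          % well below the 0.001 spacing of the look-up list
n = numel(actual_latency);
full_set = zeros(n, 4);      % preallocate the merged result

for i = 1:n
    row = find(abs(predicted_data(:,1) - actual_latency(i)) < tolerance);
    if numel(row) ~= 1
        error('Latency %g matched %d look-up rows', actual_latency(i), numel(row));
    end
    full_set(i,:) = [actual_latency(i), other_info(i), ...
                     predicted_data(row,2), predicted_data(row,3)];
end
disp(full_set)
```

Keeping the tolerance well under half the look-up spacing (0.0005) is what guarantees at most one match per latency.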

The topic of floating point comparison is also covered in this related question.

gnovice
An easy example I like to give about floating point comparison is: `(0.3 == 0.1*3)` [this will evaluate to false!]
Amro
That worked a treat; thanks very much!
Peter Etchells
+2  A: 

You may try to do the following:

row = find(abs(predicted_data(:,1) - actual_latency(i)) < eps);

EPS is the floating-point relative accuracy: the distance from 1.0 to the next larger double-precision number (2^-52, about 2.2e-16).
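Note that `eps` with no argument is the spacing at 1.0; the gap between neighbouring doubles scales with magnitude, which `eps(x)` reports. A small sketch (the values 0.5, 300 and 300.1 are chosen only for illustration) showing why a fixed `eps` tolerance can be too strict for larger numbers:

```matlab
% eps(x) is the gap from |x| to the next larger double, so it grows with |x|
fprintf('eps      = %g\n', eps);        % eps with no argument equals eps(1) = 2^-52
fprintf('eps(0.5) = %g\n', eps(0.5));   % 2^-53: doubles are denser below 1
fprintf('eps(300) = %g\n', eps(300));   % 2^-44: much coarser above 1

% For larger values a fixed eps tolerance rejects even adjacent doubles
x = 300.1;
y = x + eps(x);            % the very next double after x
disp(abs(x - y) < eps)     % displays 0 (false), because eps(x) > eps(1)
```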

yuk
+1 - I didn't know about eps
Doresoom
It might be better to do something like `abs(predicted_data(:,1) - actual_latency(i)) < 3*eps(predicted_data(:,1))`. If the values being compared are large, even neighbouring doubles may be several `eps(1)` intervals apart.
mtrw
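The scaled tolerance suggested in the comment above can be wrapped in a small anonymous function. `near` and the factor `k` are hypothetical names for illustration; `k = 3` matches the comment, and `eps(ref)` is evaluated element-wise over the look-up column. With a look-up list built as exact thousandths, a computed latency of `0.1*3` still finds its row:

```matlab
% Hypothetical helper: true where x is within k spacings of the reference
% values ref, using the per-element spacing eps(ref) as in the comment above
near = @(x, ref, k) abs(ref - x) < k * eps(ref);

lookup  = (1:500)' / 1000;   % look-up latencies 0.001 ... 0.500 as exact thousandths
latency = 0.1*3;             % a computed value that is not exactly 0.3
row = find(near(latency, lookup, 3))   % row 300, although lookup(300) ~= latency
```

The factor has to cover however much rounding error your data have accumulated; a tolerance tied to the 0.001 spacing, as in the first answer, is more forgiving.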
A: 

Have you tried using a tolerance rather than == ?

Doresoom
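Since the question states that latencies never have more than three decimal places, another option (not one suggested in the answers above) is to avoid floating-point comparison altogether: round both columns to integer milliseconds and match the integers exactly with `ismember`. This sketch assumes that three-decimal guarantee holds, uses hypothetical sample data, and replaces the loop with a vectorized lookup.

```matlab
% Hypothetical sample data: column 1 = look-up latency in seconds,
% columns 2-3 = predicted values for that latency
lookup = (1:500)' / 1000;
predicted_data = [lookup, 2*lookup, 3*lookup];
actual_latency = [0.345; 0.455; 0.1*3];   % 0.1*3 is not exactly 0.3
other_info     = [10; 20; 30];

% Convert seconds to whole milliseconds; round() absorbs the tiny
% representation errors (valid only if the data really have <= 3 decimals)
key_actual = round(actual_latency * 1000);
key_lookup = round(predicted_data(:,1) * 1000);

% Exact integer matching: row(i) is the index in key_lookup of key_actual(i)
[found, row] = ismember(key_actual, key_lookup);
if ~all(found)
    error('No look-up row for latency %g', actual_latency(find(~found, 1)));
end

full_set = [actual_latency, other_info, predicted_data(row, 2:3)];
disp(full_set)
```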