I have a data file m.txt that looks something like this (with a lot more points):

286.842995
3.444398
3.707202
338.227797
3.597597
283.740414
3.514729
3.512116
3.744235
3.365461
3.384880

Some of the values (like 338.227797) are very different from the values I generally expect (smaller numbers).

  • I am thinking of removing all the points that lie outside the 3-sigma range. How can I do that in MATLAB?

  • Also, the bigger problem is that this file has a separate file t.txt associated with it which stores the corresponding time values for these numbers. So, I'll have to remove the corresponding time values from the t.txt file also.

I am still learning MATLAB, and I assume there is a cleaner way of doing this than storing the indices of the elements removed from m.txt and then removing those elements from the t.txt file.

+3  A: 
%# load files
m = load('m.txt');
t = load('t.txt');

%# find outliers indices
z = 3;
idx = find( abs(m-mean(m)) > z*std(m) );

%# remove them from both data and time values
m(idx) = [];
t(idx) = [];
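
To see that the two vectors stay aligned, here is a minimal sketch of the same steps on made-up data instead of the real files. The values, the spike at position 8, and the integer time stamps are all assumptions chosen for illustration.

```matlab
%# hypothetical data: 20 ordinary readings with one spike at position 8
m = [3.4*ones(10,1); 3.6*ones(10,1)];
m(8) = 300;
t = (1:numel(m))';                           %# made-up time stamps, one per reading

%# find outlier indices, as in the snippet above
z = 3;
idx = find( abs(m - mean(m)) > z*std(m) );   %# positions of the outliers

%# delete the same positions from both vectors
m(idx) = [];
t(idx) = [];
```

Because the same `idx` is used for both deletions, every remaining `m(k)` still pairs with its original `t(k)`.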
Amro
No need for complicated methods; sometimes simple solutions are best!
Amro
Also, you might find this answer to a similar question useful: http://stackoverflow.com/questions/1636683/find-only-relevant-points-in-matlab/1640298#1640298
Amro
+4  A: 

@Amro is close, but the FIND is unnecessary (look up logical subscripting) and you need to include the mean for a true +/-3 sigma range. I would go with the following:

%# load files 
m = load('m.txt'); 
t = load('t.txt'); 

%# find values within range
z = 3;
meanM = mean(m);
sigmaM = std(m);
I = abs(m - meanM) <= z * sigmaM;

%# keep values within range
m = m(I);
t = t(I);
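
As a rough illustration of why FIND is optional, the sketch below selects elements once with the logical vector itself and once with the positions FIND returns, then compares the results. The data and the variable names (`keep`, `pos`, `m1`, `m2`, `t1`, `t2`, `same`) are made up for the example.

```matlab
%# hypothetical data: 20 ordinary readings with one spike at position 8
m = [3.4*ones(10,1); 3.6*ones(10,1)];
m(8) = 300;
t = (1:numel(m))';                       %# made-up time stamps

z = 3;
keep = abs(m - mean(m)) <= z*std(m);     %# logical vector, true = within range

%# route 1: index directly with the logical vector
m1 = m(keep);
t1 = t(keep);

%# route 2: convert the logical vector to positions with FIND first
pos = find(keep);
m2 = m(pos);
t2 = t(pos);

%# true when both routes picked exactly the same elements
same = isequal(m1, m2) && isequal(t1, t2);
```

The logical vector must have the same length as the vector it indexes, which holds here because `m` and `t` have one entry per sample.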
Nzbuu
It may be more natural to reverse the inequality in I, and keep the values you want, i.e. `I = abs(m - meanM) < z * sigmaM; m = m(I); t = t(I);`
Richie Cotton
@Nzbuu: of course, thanks for catching it!
Amro
@Richie. True. I'll change it.
Nzbuu