Computing a running average of a simple 1-D data vector seems easy enough. Indeed, the MATLAB documentation for FILTER happily claims something like:

You can use filter to find a running average without using a for loop. This example finds the running average of a 16-element vector, using a window size of 3:

D = [1:0.2:4]';
windowSize = 3;
F = ones(1,windowSize)/windowSize;
Df = filter(F,1,D);

The result:

[Figure: plot of the raw and filtered data from the example above]

For my purposes, there are two annoying things about this result: output point n is the average of input points n-(windowSize-1) through n (i.e. it is not centered, as the horizontal shift in the plot shows), and points to the left of the available data are treated as zeros.
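A minimal sketch of one way to center the FILTER result from the question: discard the first windowSize-1 outputs, which include the implicit zeros, and line up the rest with the middle sample of each window. It assumes an odd windowSize; `centered` and `centers` are illustrative names. This keeps only interior points and gives the same values as conv(D, F, 'valid').

```matlab
% Sketch: re-center the causal FILTER output by dropping the start-up samples.
% Assumes windowSize is odd, so each window has a well-defined middle sample.
D = (1:0.2:4)';
windowSize = 3;
F = ones(1, windowSize) / windowSize;
Df = filter(F, 1, D);          % Df(n) = mean of D(n-windowSize+1:n), zeros before D(1)

halfSize = floor(windowSize / 2);
% Df(windowSize:end) only uses real samples; its k-th entry is centered on D(k+halfSize).
centered = Df(windowSize:end);
centers  = D(halfSize+1 : end-halfSize);   % the samples those windows are centered on
```

For the linear sample data, `centered` matches `centers` to rounding error, which confirms the lag is exactly halfSize samples.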

FILTFILT deals with both issues, but has other drawbacks. It's part of the Signal Processing Toolbox, and it doesn't deal well with NaNs (which I'd like excluded from the mean).

Some people on the File Exchange (FEX) obviously had the same frustrations, but it seems odd to me that something this simple requires custom code. Am I missing anything here?

Answer:

You can also compute a running average using convolution. CONV is part of core MATLAB, so you don't need FILTFILT.

For example, you could use

D = [1:0.2:4];
windowSize = 3;
F = ones(1,windowSize)/windowSize;
Df = conv(D,F);
%# without a shape argument, conv returns the full convolution, which is
%# windowSize-1 elements longer than D; for odd windowSize, trim halfSize from each end:
halfSize = floor(windowSize/2);
Df = Df(halfSize+1:end-halfSize);

Of course, you'd still have to deal with the edges, so you should either pad D first or run conv with the 'valid' argument, which returns only the points whose window lies entirely inside D. For example, you could pad with PADARRAY if you have the Image Processing Toolbox.

The simplest way to pad would be to replicate the first and last values. If you know more about your data, other approaches may be better suited.
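Replicating the end values needs only core MATLAB, not PADARRAY. A sketch, assuming a column vector, an odd windowSize, and a conv that accepts the 'valid' shape argument (older releases may not):

```matlab
% Sketch: pad by repeating the first/last sample, then keep the 'valid' convolution.
% Assumes a column vector D and an odd windowSize.
D = (1:0.2:4)';
windowSize = 3;
F = ones(windowSize, 1) / windowSize;
halfSize = floor(windowSize / 2);

Dpad = [repmat(D(1), halfSize, 1); D; repmat(D(end), halfSize, 1)];
Df = conv(Dpad, F, 'valid');   % same length as D; Df(k) is centered on D(k)
```

Here Df(1) comes out as (2*D(1)+D(2))/3, slightly above D(1) for this rising data: replication pulls the edge values toward the end sample.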

Jonas
True, but if you want the result to be the mean of the available data (so for windowSize = 3, the output for the first data point is the mean of points 1 and 2), padding gets tricky! And conv.m also seems to have the lag if you plot the example you give. Of course it's all doable, but again it seems like more work than it should be!
Matt Mizumi
Ah, the 'valid' argument is useful. Looks like it's time to upgrade my MATLAB; my current version doesn't have it!
Matt Mizumi
@Matt: Df, by default, is larger than D. If you `plot(Df(2:end-1))`, you'll see the edge effect, but there is no lag.
Jonas
Hmm, is there even a way to pad and still get the average of the available data? It seems that if, say, 3 bins of the window fall off the edge for one output point but only 1 for another, you would need to pad with different values to get the correct average for each of them. Specifically, you would need to pad with the average of the valid bins, which depends on the point under consideration. So I'm not sure this is even possible with padding!
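One way around this is to skip padding entirely: compute the windowed sum and divide it by the number of real samples each window contains. The same trick excludes NaNs if they are zeroed in the data and left out of the count. A sketch, assuming a column vector and an odd windowSize; the NaN placed at D(5) is only for illustration:

```matlab
% Sketch: centered running mean over the available (in-range, non-NaN) samples.
% No padding: divide the windowed sum by the count of samples in each window.
D = (1:0.2:4)';
D(5) = NaN;                         % illustrative missing sample
windowSize = 3;                     % assumed odd
F = ones(windowSize, 1);            % unnormalized box kernel: gives sums, not means
halfSize = floor(windowSize / 2);

isValid = ~isnan(D);
Dz = D;
Dz(~isValid) = 0;                   % NaNs add nothing to the sum

sums   = conv(Dz, F);               % full convolution, numel(D)+windowSize-1 long
counts = conv(double(isValid), F);  % NaNs and out-of-range points are not counted
keep   = halfSize+1 : numel(sums)-halfSize;   % trim to a centered, same-length result
Df = sums(keep) ./ counts(keep);    % 0/0 gives NaN only if a window holds no valid data
```

For windowSize = 3 the first output is (D(1)+D(2))/2, the mean of points 1 and 2, and the window centered on the NaN averages its two neighbors.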
Matt Mizumi
@Matt: Do you know what kind of function describes your data? If your data are supposed to be constant (but noisy), you can simply mirror the data. If you know the data are linear, you can pad the start with `fliplr(2*data(1)-data(2:1+halfSize))` (the flip keeps the padded values in order when halfSize > 1), which would work perfectly for the sample data you gave in the OP.
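The two padding schemes from the comment above might be written as follows. This is a sketch for a column vector and an odd windowSize; the right-hand linear pad, 2*D(end) - D(...), is an added mirror-image counterpart of the left-hand expression, and windowSize = 5 is chosen so the order of the padded values matters.

```matlab
% Sketch: mirror padding vs. linear (point-reflection) padding.
% Assumes a column vector D and an odd windowSize.
D = (1:0.2:4)';
windowSize = 5;
F = ones(windowSize, 1) / windowSize;
h = floor(windowSize / 2);

% Mirror (roughly constant data): reflect about the end samples.
mirrorPad = [flipud(D(2:1+h)); D; flipud(D(end-h:end-1))];

% Point reflection (roughly linear data): 2*D(1) - D(k) continues the line past the edge.
% flipud orders each pad so it runs toward the adjacent end sample.
linearPad = [flipud(2*D(1) - D(2:1+h)); D; flipud(2*D(end) - D(end-h:end-1))];

DfMirror = conv(mirrorPad, F, 'valid');
DfLinear = conv(linearPad, F, 'valid');   % reproduces linear D up to rounding
```

For the linear sample data, DfLinear matches D to rounding error, while DfMirror(1) is (D(3)+D(2)+D(1)+D(2)+D(3))/5 = 1.24 rather than 1.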
Jonas
