views:

153

answers:

5

I'm coding something at the moment where I'm taking a bunch of values over time from a hardware compass. This compass is very accurate and updates very often, with the result that if it jiggles slightly, I end up with the odd value that's wildly inconsistent with its neighbours. I want to smooth those values out.

Having done some reading around, it would appear that what I want is a high-pass filter, a low-pass filter or a moving average. Moving average I can get down with, just keep a history of the last 5 values or whatever, and use the average of those values downstream in my code where I was once just using the most recent value.

That should, I think, smooth out those jiggles nicely, but it strikes me that it's probably quite inefficient, and this is probably one of those Known Problems to Proper Programmers to which there's a really neat Clever Math solution.

I am, however, one of those awful self-taught programmers without a shred of formal education in anything even vaguely related to CompSci or Math. Reading around a bit suggests that this may be a high or low pass filter, but I can't find anything that explains in terms comprehensible to a hack like me what the effect of these algorithms would be on an array of values, let alone how the math works. The answer given here, for instance, technically does answer my question, but only in terms comprehensible to those who would probably already know how to solve the problem.

It would be a very lovely and clever person indeed who could explain the sort of problem this is, and how the solutions work, in terms understandable to an Arts graduate.

+1  A: 

I think that Wikipedia explains these quite well.

High Performance Mark
+2  A: 

Moving average I can get down with ... but it strikes me that it's probably quite inefficient.

There's really no reason a moving average should be inefficient. You keep the number of data points you want in some buffer (like a circular queue). On each new data point, you pop the oldest value and subtract it from a sum, and push the newest and add it to the sum. So every new data point really only entails a pop/push, an addition and a subtraction. Your moving average is always this shifting sum divided by the number of values in your buffer.

It gets a little trickier if you're receiving data concurrently from multiple threads, but since your data is coming from a hardware device that seems highly doubtful to me.

Oh and also: awful self-taught programmers unite! ;)

Dan Tao
Henry Cooke
@Henry: For a straight-up moving average you do need the buffer simply so that you know what value gets popped when the next value get pushed. That said, the "fixed-length queue )
Dan Tao
@Dan, no, I thought you meant a actual high-level array.push() / array.unshift() or something, like an AS3 or Java programmer would do. Forgive me. Arts graduate, remember?
Henry Cooke
@Henry: I don't know about AS3, but a Java programmer's got collections like [`CircularQueue`](http://www.java2s.com/Code/Java/Collections-Data-Structure/CircularQueue.htm) at his/her disposal (I'm not a Java developer so I'm sure there are better examples out there; that's just what I found from a quick Google search), which implements precisely the functionality we're talking about. I'm fairly confident the majority of medium- and low-level languages with standard libraries have something similar (e.g., in .NET there's `Queue<T>`). Anyway, I was philosophy myself, so... all is forgiven.
Dan Tao
+6  A: 

If you are trying to remove the occasional odd value, a low-pass filter is the best of the three options that you have identified. Low-pass filters allow low-speed changes such as the ones caused by rotating a compass by hand, while rejecting high-speed changes such as the ones caused by bumps on the road, for example.

A moving average will probably not be sufficient, since the effects of a single "blip" in your data will affect several subsequent values, depending on the size of your moving average window.

If the odd values are easily detected, you may even be better off with a glitch-removal algorithm that completely ignores them:

if (abs(thisValue - averageOfLast10Values) > someThreshold)
{
    thisValue = averageOfLast10Values;
}

Here is a guick graph to illustrate:

graph comparison

The first graph is the input signal, with one unpleasant glitch. The second graph shows the effect of a 10-sample moving average. The final graph is a combination of the 10-sample average and the simple glitch detection algorithm shown above. When the glitch is detected, the 10-sample average is used instead of the actual value.

e.James
Nicely explained, and bonus points for the graph ;)
Henry Cooke
+5  A: 

If your moving average has to be long in order to achieve the required smoothing, and you don't really need any particular shape of kernel, then you're better off if you use an exponentially decaying moving average:

a(i+1) = tiny*data(i+1) + (1.0-tiny)*a(i)

where you choose tiny to be an appropriate constant (e.g. if you choose tiny = 1- 1/N, it will have the same amount of averaging as a window of size N, but distributed differently over older points).

Anyway, since the next value of the moving average depends only on the previous one and your data, you don't have to keep a queue or anything. And you can think of this as doing something like, "Well, I've got a new point, but I don't really trust it, so I'm going to keep 80% of my old estimate of the measurement, and only trust this new data point 20%". That's pretty much the same as saying, "Well, I only trust this new point 20%, and I'll use 4 other points that I trust the same amount", except that instead of explicitly taking the 4 other points, you're assuming that the averaging you did last time was sensible so you can use your previous work.

Rex Kerr
Good explanation, thanks Rex. Would I be right in thinking that another way of expressing the same thing would be: workingAverage = (newValue*smoothingFactor) + ( workingAverage * ( 1.0 - smoothingFactor) )?
Henry Cooke
@Henry - Right, that's the way to do it without using any extra storage.
Rex Kerr
+1  A: 

An exponentially decaying moving average can be calculated "by hand" with only the trend if you use the proper values. See http://www.fourmilab.ch/hackdiet/e4/ for an idea on how to do this quickly with a pen and paper if you are looking for “exponentially smoothed moving average with 10% smoothing”. But since you have a computer, you probably want to be doing binary shifting as opposed to decimal shifting ;)

This way, all you need is a variable for your current value and one for the average. The next average can then be calculated from that.

Daren Thomas
Also, The Hacker's Diet is a good read for anyone looking for common engineering sense in the dieting world of crazies...
Daren Thomas
Good explanation of EMA, thanks.
Henry Cooke