I recently launched a rocket with a barometric altimeter that is accurate to roughly 10 ft (a figure calculated from data acquired during flight). The data is recorded at 0.05 sec per sample, and a graph of altitude vs. time looks pretty much like it should when zoomed out over the entire flight.

The problem is that when I try to calculate other values, such as velocity or acceleration, from the data, the accuracy of the measurements makes the calculated values pretty much worthless. What techniques can I use to smooth out the data so that I can calculate (or approximate) reasonable values for the velocity and acceleration? It is important that major events remain in place in time, most notably the 0 for the first entry and the highest point during flight (2707).

The altitude data follows and is measured in ft above ground level. The first time would be 0.00, and each sample is 0.05 seconds after the previous one. The spike near the beginning of the flight is due to a technical problem that occurred during liftoff, and removing it would be ideal.

I originally tried linear interpolation, averaging nearby data points, but it took many iterations to smooth the data enough for integration, and the flattening of the curve removed the important apogee and ground-level events.

All help is greatly appreciated. Please note this is not the complete data set, and I am looking for suggestions on better ways to analyze the data, not for someone to reply with a transformed data set. It would be nice to have an algorithm to run on board future rockets that can estimate current altitude/velocity/acceleration without knowing the full flight data, though that is not required.

00000
00000
00000
00076
00229
00095
00057
00038
00048
00057
00057
00076
00086
00095
00105
00114
00124
00133
00152
00152
00171
00190
00200
00219
00229
00248
00267
00277
00286
00305
00334
00343
00363
00363
00382
00382
00401
00420
00440
00459
00469
00488
00517
00527
00546
00565
00585
00613
00633
00652
00671
00691
00710
00729
00759
00778
00798
00817
00837
00856
00885
00904
00924
00944
00963
00983
01002
01022
01041
01061
01080
01100
01120
01139
01149
01169
01179
01198
01218
01238
01257
01277
01297
01317
01327
01346
01356
01376
01396
01415
01425
01445
01465
01475
01495
01515
01525
01545
01554
01574
01594
01614
01614
01634
01654
01664
01674
01694
01714
01724
01734
01754
01764
01774
01794
01804
01814
01834
01844
01854
01874
01884
01894
01914
01924
01934
01954
01954
01975
01995
01995
02015
02015
02035
02045
02055
02075
02075
02096
02096
02116
02126
02136
02146
02156
02167
02177
02187
02197
02207
02217
02227
02237
02237
02258
02268
02278
02278
02298
02298
02319
02319
02319
02339
02349
02359
02359
02370
02380
02380
02400
02400
01914
02319
02420
02482
02523
02461
02502
02543
02564
02595
02625
02666
02707
02646
02605
02605
02584
02574
02543
02543
02543
02543
02543
02543
02554
02543
02554
02554
02554
02554
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02533
02543
02543
02543
02543
02543
02543
02543
02543
02533
02523
02523
02523
02523
02523
02523
02523
02523
02543
02523
02523
02523
02523
02523
02523
02523
02523
02513
02513
02502
02502
02492
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02472
02472
02472
02461
02461
02461
02461
02451
02451
02451
02461
02461
02451
02451
02451
02451
02451
02451
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02431
02441
02431
02441
02431
02420
02431
02420
02420
02420
02420
02420
02420
02420
02420
02420
02420
02420
02420
02410
02420
02410
02410
02410
02410
02400
02400
02410
02400
02400
02400
02400
02400
02400
02400
02400
02400
02400
02400
02400
02390
02390
02390
02380
02380
02380
02380
02380
02380
02380
02380
02380
02380
02380
02380
02380
02370
02370
02380
02370
02359
02359
02359
02359
02359
02359
02359
02359
02359
02359
02359
02359
02359
02359
02349
02349
02349
02349
02349
02339
02339
02339
02339
02339
02339
02339
02339
02339
02339
02339
02339
02339
+3  A: 

You could try running the data through a low-pass filter; this will smooth out the high-frequency noise. Maybe a simple FIR (finite impulse response) filter.

Also, you could pull your major events from the raw data, but use a polynomial fit for velocity and acceleration data.
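
A rough Octave sketch of both ideas (the column vector data holding the altitude samples, the 9-tap length, and the polynomial order are all assumptions to tune):

% Simple FIR low-pass: a flat 9-tap moving-average kernel.
h = ones(9, 1) / 9;                % unit gain at DC
smoothed = conv(data, h, 'same');  % symmetric kernel => no phase shift

% Polynomial fit for the velocity and acceleration curves:
t = (0:length(data)-1)' * 0.05;        % time axis, 0.05 s per sample
p = polyfit(t, data, 4);               % low-order fit over the flight
v = polyval(polyder(p), t);            % velocity = first derivative
a = polyval(polyder(polyder(p)), t);   % acceleration = second derivative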

Rob Curtis
I like your comment about the polynomial fit. Perhaps the flight could be divided in two phases: before the thrust ends and after. After the thrust a parabola would be a natural fit for a polynomial, and before that some polynomial with a slightly higher order (3 or 4?). Extreme values such as the early "229" should become outliers and disappear.
This answer is right on track. It just needs a specific name to look up: since acceleration and velocity are derivatives with respect to time, you should look into Savitzky-Golay. It's described in Numerical Recipes and online in many places. Defined as a low-order polynomial fit at each point, it smooths the data and takes a derivative order as a parameter. This is numerically better than smoothing and then differentiating in separate steps. S-G is especially good at preserving peaks, inflection points, etc., whereas naive attempts at smoothing typically mush out peaks and other fine detail.
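
A from-scratch Octave sketch of the idea (the Octave-Forge signal package also provides a ready-made sgolayfilt; the function name, window half-width m, and order k here are all placeholders to tune):

% Savitzky-Golay the long way: fit a low-order polynomial in a sliding
% window and evaluate it (and its first derivative) at the window center.
function [s, v] = sg_smooth(y, m, k, dt)
  n = length(y);
  s = y(:);                        % smoothed signal (endpoints left as-is)
  v = zeros(n, 1);                 % first derivative (velocity)
  x = (-m:m)' * dt;                % local time axis, centered on each sample
  for i = (m+1):(n-m)
    w = y(i-m:i+m);
    p = polyfit(x, w(:), k);       % local polynomial fit
    s(i) = polyval(p, 0);          % smoothed value at the center
    v(i) = polyval(polyder(p), 0); % derivative at the center
  end
end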
DarenW
+1  A: 

One way you can approach analyzing your data is to try to match it to some model, generate a function, and then test its fitness to your data set. This can be rather complicated and is probably unnecessary, but the point is that instead of generating acceleration/velocity data directly from your data, you can match it to your model (rather simple for a rocket: some acceleration upwards followed by a slow, constant-speed descent). At least that's how I would do it in a physics experiment.

As for generating some sense of velocity and acceleration during flight, this should be as simple as averaging the velocity over several successive measurements. Something along the lines of: EstimatedV = Vmeasured*(1/n) + (1 - 1/n)*EstimatedV. Set n based on how quickly you want the velocity estimate to adjust.
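
A rough Octave sketch of that running estimate (the vector name data and the choice n = 8 are assumptions):

% On-board style running velocity estimate: an exponential moving average
% of raw finite-difference velocities. Larger n = smoother, slower to react.
n  = 8;                                       % adjustment speed (a guess)
dt = 0.05;
est_v = 0;
for i = 2:length(data)
  v_measured = (data(i) - data(i-1)) / dt;    % raw, noisy velocity sample
  est_v = v_measured*(1/n) + (1 - 1/n)*est_v; % the update from the answer
end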

Thomas Sidoti
+7  A: 

Here is my solution, using a Kalman filter. You will need to tune the parameters (possibly by orders of magnitude) depending on whether you want to smooth more or less.

#!/usr/bin/env octave

% Kalman filter to smooth measurements of altitude and to estimate
% speed and acceleration. The continuous-time model is roughly:
%   derivative of altitude := speed
%   derivative of speed    := acceleration
%   acceleration is a Wiener process

%------------------------------------------------------------
% Discretization of the continuous-time linear system
% 
%   d  |x|   | 0 1 0 | |x|
%  --- |v| = | 0 0 1 | |v|   + "noise"
%   dt |a|   | 0 0 0 | |a|
%
%   y = [1 0 0] |x|     + "measurement noise"
%               |v|
%               |a|
%
st = 0.05;    % Sampling time
A = [1  st st^2/2;
     0  1  st    ;
     0  0  1];
C = [1 0 0];

%------------------------------------------------------------
% Fine-tune these parameters! (in particular qa and R)
% The acceleration follows a "random walk": the greater the variance qa,
% the more "reactive" the system is expected to be, i.e. the more the
% acceleration is expected to vary.
% The greater R, the noisier your measurement instrument is assumed to be
% (the less "accurate" the barometric altimeter);
% if you increase R, you will smooth the estimate more.
qx = 1.0;                      % Variance of model noise for position
qv = 1.0;                      % Variance of model noise for speed
qa = 50.0;                     % Variance of model noise for acceleration
Q  = diag([qx, qv, qa]);
R  = 100.0;                    % Variance of measurement noise
                               % (10^2, if 10ft is the standard deviation)

load data.txt  % Put your measurements in this file, one altitude per line

est_position     = zeros(length(data), 1);
est_speed        = zeros(length(data), 1);
est_acceleration = zeros(length(data), 1);

%------------------------------------------------------------
% Kalman filter
xhat = [0;0;0];     % Initial estimate
P    = zeros(3,3);  % Initial error variance
for i=1:length(data),
   y = data(i);
   xpred = A*xhat;                                    % Prediction
   Ppred = A*P*A' + Q;                                % Prediction error variance
   Lambdainv = 1/(C*Ppred*C' + R);
   xhat  = xpred + Ppred*C'*Lambdainv*(y - C*xpred);  % Update estimation
   P = Ppred - Ppred*C'*Lambdainv*C*Ppred;            % Update estimation error variance
   est_position(i)     = xhat(1);
   est_speed(i)        = xhat(2);
   est_acceleration(i) = xhat(3);
end

%------------------------------------------------------------
% Plot
figure(1);
hold on;
plot(data, 'k');               % Black: real data
plot(est_position, 'b');       % Blue:  estimated position
plot(est_speed, 'g');          % Green: estimated speed
plot(est_acceleration, 'r');   % Red:   estimated acceleration
pause
Federico Ramponi
I've started reading about this and it looks very promising. I like how this can be adapted to take multiple input sources.
NickLarsen
The code ran fine and plots. Nice job. But I'm not sure the acceleration plot is right - it imitates the velocity too closely, not its derivative. A flaw in the code, or is this a quirk of Kalman filtering?
DarenW
+1  A: 

I know nothing about rockets. I plotted your points and they look lovely.

Based on what I see in that plot, let me assume that there is usually a single apogee and that the function that gave rise to your points has no derivative with respect to time at that apogee.

Suggestion:

  1. Monitor maximum altitude throughout the flight.
  2. Continuously watch for the apogee by (say, simply) comparing the most recent few points with the current maximum.
  3. Until you reach the maximum, with (0,0) fixed and some arbitrary set of knots, calculate a collection of natural splines up to the current altitude. Use the residuals with respect to the splines to decide which data to discard, then recalculate the splines (see the sketch after this list).
  4. At the maximum, retain the most recently calculated splines. Start calculating a new set of splines for the curve beyond the apogee.
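
As a self-contained Octave sketch of that fit-and-discard loop, a low-order polynomial stands in here for the natural splines (ascent, holding the samples up to the current maximum, and the 3-sigma cutoff are assumptions):

% Fit a smooth curve, discard points with large residuals, refit.
t = (0:length(ascent)-1)' * 0.05;
for pass = 1:3
  p = polyfit(t, ascent, 3);           % the "easy smooth curve"
  resid = ascent - polyval(p, t);
  keep = abs(resid) < 3*std(resid);    % the early 229-ft spike fails this
  t = t(keep);
  ascent = ascent(keep);
end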
Bill Bell
From my understanding, splines are more useful for generating exact replicas of continuous data than for interpolating missing or noisy data points. Am I missing something? To determine apogee, we take every data point for the last 20 samples and compare it to the 20 samples before each. Once all samples show a decline in altitude, we say apogee is the highest recorded value during that interval.
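
For concreteness, that detection rule might be sketched in Octave as follows (the window of 20 samples and the vector name data are assumptions):

% Declare apogee once each of the last 20 samples sits below the
% corresponding sample among the 20 before them.
w = 20;
for i = (2*w):length(data)
  if all(data(i-w+1:i) < data(i-2*w+1:i-w))
    [apogee, k] = max(data(1:i));   % highest value recorded so far
    apogee_time = (k - 1) * 0.05;   % seconds since the first sample
    break;
  end
end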
NickLarsen
You're right. Sloppy thinking. Apologies. Was trying to express: it appears to me that this is not the jagged set of data one often sees in a time series and you might omit the task of identifying a complicated error structure. From the beginning of a flight keep fitting easy smooth curves until you hit the apogee, discarding outliers--which appear obvious from the plot that I made. (Are they obvious to you though?) Then, after you hit the apogee start fitting a distinct set of easy smooth curves for the rest of the flight, again discarding outliers. In this instance, spline="brain f__t"
Bill Bell
+1  A: 

Have you tried a rolling window average of your values? Basically, you take a window of, say, 10 values (samples 0 to 9) and calculate its average. Then you scroll the window one point (samples 1 to 10) and recalculate. This smooths the values while keeping the number of points unchanged. Larger windows give smoother data at the price of losing more high-frequency information.

You can use the median instead of the average if your data happens to contain outlier spikes (see the sketch below).

You could also look at the autocorrelation of the signal.
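
A minimal Octave sketch of the window idea with a median, which is robust to isolated spikes such as the 229 near liftoff (the window length of 11 and the vector name data are assumptions to tune):

% Rolling median: robust to isolated outlier spikes.
w = 11;                                  % odd window length (a guess)
half = (w - 1) / 2;
smoothed = data;                         % endpoints kept as-is
for i = (half+1):(length(data)-half)
  smoothed(i) = median(data(i-half:i+half));
end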

Stefano Borini
I actually did try this; however, I only tried a window size of 3. Perhaps some tests with bigger windows would make for better data. +1
NickLarsen
The shape of the window matters. If you take a plain simple average of n points to the left and n to the right, along with the point at the center, you get some noise in the output because the points at the ends are entering/leaving the range as you move to calculate the next output point. It is better to calculate a weighted average, with less weight given to points the further they are from the central point. See other answers for good ways to do this.
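
For instance, with triangular weights (a sketch; the window length of 11 is arbitrary):

% Triangular kernel: weights taper toward the window edges, so points
% enter and leave the window gradually instead of all-or-nothing.
w = 11;
k = [1:ceil(w/2), floor(w/2):-1:1]';  % 1 2 ... 6 ... 2 1
k = k / sum(k);                       % normalize for unit gain at DC
smoothed = conv(data, k, 'same');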
DarenW
@DarenW: that's a very good idea. As you said, this way you reduce the "all-or-nothing" presence of a point...
Stefano Borini
A: 

Fitting an ARIMA model and looking for autocorrelation in the residuals is a standard procedure. A volatility model is another option.

LarsOn