This is a problem that is hard to solve generically; your final solution will end up being very process-dependent, and unique to your situation.
That being said, you need to start by understanding your data: from one sample to the next, what kind of variation is possible? With that knowledge, you can compare the current sample against previous samples (and maybe future ones) to decide whether it is bogus. You'll end up with a filter that looks something like:
const int MaxQueueLength = 100;      // adjust these two values as necessary
const double MaxProjectionError = 5;

List<double> FilterData(List<double> rawData)
{
    List<double> toRet = new List<double>(rawData.Count);
    Queue<double> history = new Queue<double>(MaxQueueLength);
    foreach (double rawSample in rawData)
    {
        // keep the history bounded to the last MaxQueueLength accepted samples
        while (history.Count >= MaxQueueLength)
            history.Dequeue();
        double projectedSample = GuessNext(history, rawSample);
        // if the raw sample strays too far from the projection, treat it as
        // bogus and substitute the projection
        double currentSample = (Math.Abs(projectedSample - rawSample) > MaxProjectionError)
            ? projectedSample
            : rawSample;
        toRet.Add(currentSample);
        history.Enqueue(currentSample);
    }
    return toRet;
}
The magic, then, is coming up with your GuessNext function. Here, you'll be getting into stuff that is specific to your situation, and should take into account everything you know about the process that is gathering data. Are there physical limits to how quickly the input can change? Does your data have known bad values you can easily filter?
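On that last question: if your hardware reports a recognizable sentinel on a failed read, it's worth stripping those values out before the projection filter ever sees them. As a minimal sketch, assuming a hypothetical sensor that emits -9999 on read failure (substitute whatever your device actually produces):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PreFilter
{
    // Hypothetical sentinel; adjust for your hardware's actual failure value.
    const double BadValueSentinel = -9999;

    // Drop samples that are known-bad before any projection logic runs.
    public static List<double> DropKnownBadValues(List<double> rawData)
    {
        return rawData
            .Where(s => s != BadValueSentinel && !double.IsNaN(s))
            .ToList();
    }
}
```

This keeps obviously-broken readings from polluting the history queue that the projection is based on.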
Here is a simple example of a GuessNext function that works off of the first derivative of your data (i.e., it assumes that your data is roughly a straight line when you only look at a small section of it):
double lastSample = double.NaN;

double GuessNext(Queue<double> history, double nextSample)
{
    // ignore the history for a simple first-derivative projection; assume the
    // input approximates a straight line over any small window
    lastSample = double.IsNaN(lastSample) ? nextSample : lastSample;
    double toRet = nextSample + (nextSample - lastSample);
    lastSample = nextSample;
    return toRet;
}
If your data is particularly noisy, you may want to apply a smoothing filter to it before you pass it to GuessNext. You'll just have to spend some time with the algorithm to come up with something that makes sense for your data.
Your example data appears to be parametric, in that each sample defines both an X and a Y value. You might be able to apply the above logic to each dimension independently, which would be appropriate if only one dimension is giving you bad numbers. This can be particularly successful in cases where one dimension is a timestamp, for instance, and the timestamp is occasionally bogus.
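Applying the 1-D filter per dimension might look like the sketch below. The `FilterPoints` name and the tuple representation of a point are my own choices here; the `filter1D` parameter stands in for any one-dimensional filter such as the FilterData function above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PointFilter
{
    // Run a one-dimensional filter over each coordinate independently,
    // then stitch the filtered coordinates back into points.
    public static List<(double X, double Y)> FilterPoints(
        List<(double X, double Y)> points,
        Func<List<double>, List<double>> filter1D)
    {
        List<double> xs = filter1D(points.Select(p => p.X).ToList());
        List<double> ys = filter1D(points.Select(p => p.Y).ToList());
        return xs.Zip(ys, (x, y) => (x, y)).ToList();
    }
}
```

Note the caveat: filtering each axis independently assumes the errors really are independent per dimension. If a bogus sample corrupts X and Y together, you'd want a projection that works on the points as a whole instead.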