Hi stackoverflow friends, I'm trying to write a program to automate one of my more boring and repetitive work tasks. I have some programming experience, but none with processing or interpreting large volumes of data, so I am seeking your advice (both suggestions of techniques to try and things to read to learn more about doing this stuff).

I have a piece of equipment that monitors an experiment by taking repeated samples and displays the readings on its screen as a graph. The input to the experiment can be altered, and one of these alterations should produce a change in a section of the graph. I currently identify that change by eye, and it is what I'm looking for in the experiment. I want to automate this so that a computer looks at a set of results and spots the experiment input that caused the change.

I can already extract the results from the machine. Currently the results for a run are in the form of an integer array, with the index being the sample number and the corresponding value being the measurement.

The overall shape of the graph will be similar for each experiment run. The change I'm looking for will be roughly the same and will occur in approximately the same place every time for the correct experiment input. Unfortunately there are a few gotchas that make this problem more difficult.

  1. There is some noise in the measuring process, which means there is some random variation in the measured values between different runs, although the overall shape of the graph remains the same.

  2. The time the experiment takes varies slightly each run, causing two effects. First, a whole graph may be shifted slightly on the x axis relative to another run's graph. Second, individual features may appear slightly wider or narrower in different runs.

In both these cases the variation isn't particularly large, and you can assume that the only non-random variation is caused by the correct input being found.

Thank you for your time,

Pinky

+2  A: 

I think you're looking for information on Digital Signal Processing. It can range from very simple to very hard to understand. If, say, your pre-event signal was 0, and every signal after the relevant event was 1, you could just look for the first 1, figure out the time at which it occurred, and you'd be done. That's basically the limiting case of simplicity, and it might be a good place to start. Implement that, and you've got the beginnings of a sense of how to answer your question.

Now, then, you've got noise. So, say, pre-event might range from -10 to 10, and post-event might range from 90 to 110. Still simple: watch for the first value greater than 10. But of course it's never that simple. You might have to average a window of readings, might look for some threshold of change from the previous measurement, etc. In advanced cases, you could find yourself using transformations into other spaces, applying filters, pattern matching, and the like.

But from your description, it sounds like reasonably simple methods should do the job for you. Don't get intimidated by concepts like FFT - you probably don't need them, yet. For now, at least, assume that it can be solved simply. Start with a trivially simple (but insufficient) solution, and work your way towards the solution that works.
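To make the windowed-average idea concrete, here's a minimal sketch (the threshold of 10 and window size of 5 are placeholders taken from the example numbers above, not values tuned to your data):

static int firstCrossing ( int[] readings, double threshold, int window )
{
    // Slide a small averaging window along the readings to smooth out
    // noise, and report the first position whose mean exceeds the threshold.
    for ( int t = 0; t + window <= readings.length; ++t )
    {
        double sum = 0;
        for ( int i = 0; i < window; ++i )
        {
            sum += readings[t + i];
        }

        if ( sum / window > threshold )
        {
            return t;   // first window whose average crosses the threshold
        }
    }

    return -1;   // no crossing found
}

With the example ranges above, firstCrossing(results, 10, 5) would pick out roughly where the 90-110 region begins.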

Carl Manaster
Hi Carl, thanks for the advice. Digital signal processing is definitely the field, and I've amended the tags to include it. I'm currently working through an online book on DSP but as yet haven't come across any techniques that sound relevant to this problem; I assume it's just a matter of time.
+1  A: 

One technique worth looking at, if the sort of filter-and-threshold approach Carl suggests won't suffice, is Cross Correlation. The essence of this is pretty simple: if two data sets are reasonably similar, their dot product will be maximised when they align (because the highest values will be multiplied together). So you can get a good estimate of how to line them up by calculating this product at each offset and choosing the one that gives the highest result.

In a case like yours, the idea would be to have an "ideal" version of the curve shape you're looking for -- either generated from theory/simulation or by averaging the results of a number of good experimental curves identified and aligned by eye -- and compare it against the experimental data.

For simplicity, let's assume that the data set is longer than the ideal and has enough empty space at either end that we can ignore any boundary issues. Since you are looking for one specific event, it should be trivial to cut down your ideal to comply with this assumption. Crudely coded in Java, then, the process might go something like this:

int offset ( double[] data, double[] ideal )
{
    double cMax = -Double.MAX_VALUE;
    int tMax = 0;

    // Slide the ideal curve along the data, computing the dot product
    // at each offset; <= so the final valid alignment is included.
    for ( int t = 0; t <= data.length - ideal.length; ++t )
    {
        double c = 0;
        for ( int i = 0; i < ideal.length; ++i )
        {
            c += data[t + i] * ideal[i];
        }

        // Keep the offset that gives the strongest correlation.
        if ( c > cMax )
        {
            cMax = c;
            tMax = t;
        }
    }

    return tMax;
}
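As a quick sanity check of how this would be called (the arrays here are invented purely for illustration), something like:

double[] runData  = { 0, 1, 0, 2, 9, 10, 8, 2, 1, 0 };   // one run's measurements
double[] template = { 8, 10, 9 };                        // the "ideal" event shape

int t = offset( runData, template );
System.out.println( "Best alignment at sample " + t );   // prints 4 for this data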

Obviously, there are plenty of situations in which this approach can fail, particularly if there is a significant amount of non-independent noise or if there are periodicities in the signal that give rise to aliasing. Also, this example throws away a lot of information to focus just on an absolute maximum, which may be error-prone if there isn't a large, narrow peak in the cross correlation. But from your description it seems like your problem could be fairly amenable to something along these lines.
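If amplitude variation between runs turns out to be a problem, one standard refinement (my suggestion, not part of the approach above) is normalised cross correlation: subtract each window's mean and divide by the magnitudes, so the score reflects shape similarity rather than raw signal level. A sketch, with the same boundary assumptions as offset():

static int offsetNormalised ( double[] data, double[] ideal )
{
    int n = ideal.length;
    double cMax = -Double.MAX_VALUE;
    int tMax = 0;

    for ( int t = 0; t + n <= data.length; ++t )
    {
        // Means of the current data window and of the ideal curve.
        double sumD = 0, sumI = 0;
        for ( int i = 0; i < n; ++i )
        {
            sumD += data[t + i];
            sumI += ideal[i];
        }
        double meanD = sumD / n, meanI = sumI / n;

        // Correlation coefficient for this offset, roughly in -1..1.
        double num = 0, magD = 0, magI = 0;
        for ( int i = 0; i < n; ++i )
        {
            double d = data[t + i] - meanD;
            double e = ideal[i] - meanI;
            num  += d * e;
            magD += d * d;
            magI += e * e;
        }

        double denom = Math.sqrt( magD * magI );
        double c = ( denom == 0 ) ? 0 : num / denom;

        if ( c > cMax )
        {
            cMax = c;
            tMax = t;
        }
    }

    return tMax;
}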

walkytalky