views:

226

answers:

8

I have data that always looks something like this:

alt text

I need an algorithm to locate the three peaks.

The x-axis is actually a camera position and the y-axis is a measure of image focus/contrast at that position. There are features at three different distances that can be in focus and I need to determine the x-values for these three points.

The middle hump is always a little harder to pick out, even for a human.

I have a homemade algorithm that mostly works, but I'm wondering if there's a standard way to grab local maxima from a function that can have a little noise in it. The peaks overcome the noise easily though.

Also, being camera data, an algorithm that doesn't require scanning the full range could be useful.

Edit: Posting the Python code that I ended up using. It uses my original code that finds maxima given a search threshold and does a binary search to find a threshold that results in the desired number of maxima.

Edit: Sample data included in code below. New code is O(n) instead of O(n^2).

def find_n_maxima(data, count):
    low = 0
    high = max(data) - min(data)
    for iteration in xrange(100): # max iterations
        mid = low + (high - low) / 2.0
        maxima = find_maxima(data, mid)
        if len(maxima) == count:
            return maxima
        elif len(maxima) < count: # threshold too high
            high = mid
        else: # threshold too low
            low = mid
    return None # failed

def find_maxima(data, threshold):
    def search(data, threshold, index, forward):
        max_index = index
        max_value = data[index]
        if forward:
            path = xrange(index + 1, len(data))
        else:
            path = xrange(index - 1, -1, -1)
        for i in path:
            if data[i] > max_value:
                max_index = i
                max_value = data[i]
            elif max_value - data[i] > threshold:
                break
        return max_index, i
    # forward pass
    forward = set()
    index = 0
    while index < len(data) - 1:
        maximum, index = search(data, threshold, index, True)
        forward.add(maximum)
        index += 1
    # reverse pass
    reverse = set()
    index = len(data) - 1
    while index > 0:
        maximum, index = search(data, threshold, index, False)
        reverse.add(maximum)
        index -= 1
    return sorted(forward & reverse)

data = [
    1263.900, 1271.968, 1276.151, 1282.254, 1287.156, 1296.513,
    1298.799, 1304.725, 1309.996, 1314.484, 1321.759, 1323.988,
    1331.923, 1336.100, 1340.007, 1340.548, 1343.124, 1353.717,
    1359.175, 1364.638, 1364.548, 1357.525, 1362.012, 1367.190,
    1367.852, 1376.275, 1374.726, 1374.260, 1392.284, 1382.035,
    1399.418, 1401.785, 1400.353, 1418.418, 1420.401, 1423.711,
    1425.214, 1436.231, 1431.356, 1435.665, 1445.239, 1438.701,
    1441.988, 1448.930, 1455.066, 1455.047, 1456.652, 1456.771,
    1459.191, 1473.207, 1465.788, 1488.785, 1491.422, 1492.827,
    1498.112, 1498.855, 1505.426, 1514.587, 1512.174, 1525.244,
    1532.235, 1543.360, 1543.985, 1548.323, 1552.478, 1576.477,
    1589.333, 1610.769, 1623.852, 1634.618, 1662.585, 1704.127,
    1758.718, 1807.490, 1852.097, 1969.540, 2243.820, 2354.224,
    2881.420, 2818.216, 2552.177, 2355.270, 2033.465, 1965.328,
    1824.853, 1831.997, 1779.384, 1764.789, 1704.507, 1683.615,
    1652.712, 1646.422, 1620.593, 1620.235, 1613.024, 1607.675,
    1604.015, 1574.567, 1587.718, 1584.822, 1588.432, 1593.377,
    1590.533, 1601.445, 1667.327, 1739.034, 1915.442, 2128.835,
    2147.193, 1970.836, 1755.509, 1653.258, 1613.284, 1558.576,
    1552.720, 1541.606, 1516.091, 1503.747, 1488.797, 1492.021,
    1466.720, 1457.120, 1462.485, 1451.347, 1453.224, 1440.477,
    1438.634, 1444.571, 1428.962, 1431.486, 1421.721, 1421.367,
    1403.461, 1415.482, 1405.318, 1399.041, 1399.306, 1390.486,
    1396.746, 1386.178, 1376.941, 1369.880, 1359.294, 1358.123,
    1353.398, 1345.121, 1338.808, 1330.982, 1324.264, 1322.147,
    1321.098, 1313.729, 1310.168, 1304.218, 1293.445, 1285.296,
    1281.882, 1280.444, 1274.795, 1271.765, 1266.857, 1260.161,
    1254.380, 1247.886, 1250.585, 1246.901, 1245.061, 1238.658,
    1235.497, 1231.393, 1226.241, 1223.136, 1218.232, 1219.658,
    1222.149, 1216.385, 1214.313, 1211.167, 1208.203, 1206.178,
    1206.139, 1202.020, 1205.854, 1206.720, 1204.005, 1205.308,
    1199.405, 1198.023, 1196.419, 1194.532, 1194.543, 1193.482,
    1197.279, 1196.998, 1194.489, 1189.537, 1188.338, 1184.860,
    1184.633, 1184.930, 1182.631, 1187.617, 1179.873, 1171.960,
    1170.831, 1167.442, 1177.138, 1166.485, 1164.465, 1161.374,
    1167.185, 1174.334, 1186.339, 1202.136, 1234.999, 1283.328,
    1347.111, 1679.050, 1927.083, 1860.902, 1602.791, 1350.454,
    1274.236, 1207.727, 1169.078, 1138.025, 1117.319, 1109.169,
    1080.018, 1073.837, 1059.876, 1050.209, 1050.859, 1035.003,
    1029.214, 1024.602, 1017.932, 1006.911, 1010.722, 1005.582,
    1000.332, 998.0721, 992.7311, 992.6507, 981.0430, 969.9936,
    972.8696, 967.9463, 970.1519, 957.1309, 959.6917, 958.0536,
    954.6357, 954.9951, 947.8299, 953.3991, 949.2725, 948.9012,
    939.8549, 940.1641, 942.9881, 938.4526, 937.9550, 929.6279,
    935.5402, 921.5773, 933.6365, 918.7065, 922.5849, 939.6088,
    911.3251, 923.7205, 924.8227, 911.3192, 936.7066, 915.2046,
    919.0274, 915.0533, 910.9783, 913.6773, 916.6287, 907.9267,
    908.0421, 908.7398, 911.8401, 914.5696, 912.0115, 919.4418,
    917.0436, 920.5495, 917.6138, 907.5037, 908.5145, 919.5846,
    917.6047, 926.8447, 910.6347, 912.8305, 907.7085, 911.6889,
]

for n in xrange(1, 6):
    print 'Looking for %d maxima:' % n
    indexes = find_n_maxima(data, n)
    print indexes
    print ', '.join(str(data[i]) for i in indexes)
    print

Output:

Looking for 1 maxima:
[78]
2881.42

Looking for 2 maxima:
[78, 218]
2881.42, 1927.083

Looking for 3 maxima:
[78, 108, 218]
2881.42, 2147.193, 1927.083

Looking for 4 maxima:
[78, 108, 218, 274]
2881.42, 2147.193, 1927.083, 936.7066

Looking for 5 maxima:
[78, 108, 218, 269, 274]
2881.42, 2147.193, 1927.083, 939.6088, 936.7066
+6  A: 

The local maxima would be any x point which has a higher y value than either of its left and right neighbors. To eliminate noise, you could put in some kind of tolerance threshold (ex. x point must have higher y value than n of its neighbors).

To avoid scanning every point, you could use the same approach but go 5 or 10 points at a time to get a rough sense of where the maximum lie. Then come back to those areas for a more detailed scan.

btreat
This is sort of how my current algorithm works. The problem is that I have to know what threshold to give it, and that threshold could change depending on the actual data.
FogleBird
@FogleBird You could find that threshold by going through the data once and computing the mean/std dev of the differences between adjacent y-values; and let your threshold be some function of that.
quantumSoup
@Aircule: Good idea, I'll try that tomorrow.
FogleBird
'The local maxima would be any x point which has a higher y value than either of its left and right neighbors'Not necessarily this simple, right? The presence of equal values will break this. If your data points are 1, 2, 3, 50, 3, 3, 10, 10, 3, then the 10s represent a local maxima, though neither of them is higher than both left + right neighbours.
Cowan
Closest to what I ended up doing, so accepting this for now.
FogleBird
To handle maxima that might occur in flat spots, re-word "any x point which has a higher y value than its neighbors" to "any x point which *does not* have a *lower* y value than its neighbors".
bta
+2  A: 

You could try to fit a spline to the data and then find the extrema of the spline. Since the spline is piecewise polynomial, the exact locations of the extrema can be found with relatively simple formulas.

Victor Liu
I think fitting the data in a spline would be more computationally intensive than going through the data "manually."
quantumSoup
Fitting splines is very fast -- O(n).
John D. Cook
I also think fitting would be the most accurate solution; though I would not use spline. In this case I'd try a sum of 3 Gauss-bells.
Curd
I do ultimately use curve fitting to find the actual peaks. But I do that with a higher resolution scan around each maxima, so then I just do a simple parabolic fit. I use the algorithm being discussed here just to figure out roughly where the peaks are during a more coarse grained (faster) focus scan.
FogleBird
+6  A: 

Couldn't you move along the graph, regularly calculating the derivative and if it switches from positive to negative you've found a peak?

Graphain
The derivative will fluctuate +/- too much due to the random noise. How do I eliminate that problem?
FogleBird
Have you tried a low pass filter?
K. Brafford
Basically averaging with filters as suggested I would guess.
Graphain
If that happens, the noise is creating local maxima. Averaging or filtering away the noise is one way to decide which maxima are 'important' enough.However, there's no need to calculate a derivative here. Just compare the value of the point to its neighbors.
Larry Wang
+1  A: 

Do you know the derivative of the data? If yes and you can symbolically solve the system then you can find the points where the derivative is equal to zero.

If you don't have the formula (which seems to be the case from your OP) then you might want to try filter out some noise then do the following:

If you can't solve symbolically then you can use something like Newton–Raphson method to get to the local maxima and choose the starting points randomly from the range to try to capture all the maxima.

If you don't have the derivative data then you might want to try a hill climbing algorithm that doesn't require the derivative and start it at multiple different randomly chosen points. You could then keep track of the points that you find when the iterative hill climbing part of the algorithm terminates. This will only probabilistically find all the local maxima but it may be good enough for your purposes.

EDIT: given that the 3 peaks will be roughly in the same places you should try guarantee that the starting point for these algorithms is near those points for at least some of the times you run the iterative algorithm.

shuttle87
+1  A: 

You could try a band-pass filter to reject the noise and make it easier to reliably select those maxima.

The point of a band-pass (rather than low-pass) is to pull nearly-constant down to zero. That way, the highest values you find in the filtered result will probably be the clearest peaks.

Certainly if you can define a narrow frequency-range for your signal and apply a very selective filter, it should make a fairly unsophisticated maxima-finding algorithm practical - e.g. a simple sample-thats-higher-than-either-neighbour scan.

A sophisticated filter might not be necessary - you could get away with a mexican hat wavelet at a single well-chosen scale. One scale probably means it's not really a wavelet any more - just a band-pass FIR filter.

EDIT

There's an asymmetric wavelet (I forget the name) which, if the mexican hat is analogous to a cosine, takes the role of the sine. I mention it as it combines band-pass filtering with a kind of derivative - the zero-crossings in the result are the stationary-points of a band-pass filtered version of the original signal.

A "debounced" scan could then identify reliable maxima by looking for crossing points in this "derivative" signal.

Steve314
A: 

I practice, I've found what works well is to use a dilation morphology operation to produce a dilated version of your sampled function (data points) then to identify local max's compare the dilated version vs. the original and anywhere where the dilated version equals the original version should be a local maximum. This works really well I find with 2D+ data (i.e. images) but since you have 1D data it may be easier just to use the differences between successive points as an approximation to the derivative.

Note if you do use the dilation technique, the structuring element (both the size and shape) that you use in the dilation greatly determines the types of peaks you are looking for.

Also if you have noise in your data, smooth it with a low pass filter, like a 1D Gaussian before searching.

More info on dilation can be found here:

here is the idea implemented in matlab: http://www.mathworks.com/matlabcentral/fileexchange/14498-local-maxima-minima

if you don't know what dilation is: http://en.wikipedia.org/wiki/Dilation_%28morphology%29

(its dead simple once you understand it here is a really easy explanation) http://homepages.inf.ed.ac.uk/rbf/HIPR2/dilate.htm

ldog
+2  A: 

direct approach, something like this:

#include <stdio.h>
#include <stdlib.h>

#define MAXN 100

double smooth(double arr[], int n, int i)
{
        double l,r,smoo;
        l = (i - 1 < 0)?arr[0]:arr[i-1];
        r = (i + 1 >= n)?arr[n-1]:arr[i+1];
        smoo = (l + 2*arr[i] + r)/4;
        return smoo;
}

void findmax(double arr[], int n)
{
        double prev = arr[0];
        int i;
        int goup = 1;

        for(i = 0; i < n; i++)
        {
                double cur = smooth(arr,n,i);
                if(goup) {
                        if(prev > cur && i > 0) {
                                printf("max at %d = %lf\n", i-1, arr[i-1]);
                                goup = 0;
                        }
                } else {
                        if(prev < cur)
                                goup = 1;
                }
                prev = cur;
        }
}

int main()
{
        double arr[MAXN] = {0,0,1,2,3,4,4,3,2,2,3,4,6,8,7,8,6,3,1,0};
        int n = 20, i;

        for(i = 0; i < n; i++)
                printf("%.1lf ",arr[i]);
        printf("\n");

        findmax(arr,n);
        return 0;
}

output:

0.0 0.0 1.0 2.0 3.0 4.0 4.0 3.0 2.0 2.0 3.0 4.0 6.0 8.0 7.0 8.0 6.0 3.0 1.0 0.0
max at 6 = 4.000000
max at 14 = 7.000000

1) set state = goup: going upward the curve;
2) if previuos value is greater than current then there was maximum:
print it and set state to godown
3) while in godown state wait until previous is less then current and switch to (1)

to reduce noise use some smoothing function smooth()

oraz
A: 

As others have mentioned derivatives or comparing to local neighbors usually works for me. If you're worried about noise I can recommend median filtration as a pretty fast and reliable filtration scheme. I use it and reverse median filtration all the time to squelch noise in acoustic sensors, works great.

ChrisC