I currently have several scripts that need to group a set of points into "levels" by height. The assumption is that the z-values of the points will cluster loosely around certain values corresponding to the levels, with large-ish gaps in between the clusters.
So I have the following function:
def level_boundaries(zvalues, threshold=10.0):
    '''Finds all elements z of zvalues, other than the largest, such
    that no other element w of zvalues satisfies z <= w < z+threshold.'''
    zvals = zvalues[:]
    zvals.sort()
    return [zvals[i] for i, (a, b) in enumerate(pairs(zvals)) if b-a >= threshold]
"pairs" is taken from the itertools module documentation (where the recipe is called "pairwise"), but for reference:
def pairs(iterable):
    'iterable -> (iterable[n], iterable[n+1]) for n=0, 1, 2, ...'
    from itertools import izip, tee
    first, second = tee(iterable)
    second.next()
    return izip(first, second)
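The helper above is Python 2 only, since izip and the .next() method are gone in Python 3. A rough Python 3 equivalent might look like the sketch below. The name pairs_py3 is made up here for illustration and does not come from the itertools documentation.

```python
from itertools import tee


def pairs_py3(iterable):
    'iterable -> (iterable[n], iterable[n+1]) for n=0, 1, 2, ...'
    first, second = tee(iterable)
    # Advance the second iterator by one element; the default value
    # avoids a StopIteration when the iterable is empty.
    next(second, None)
    # zip is lazy in Python 3, so this plays the role of izip.
    return zip(first, second)
```

Unlike the version above, this one returns an empty result for an empty input instead of raising.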
A contrived usage example (my actual data sets are quite a bit too large to use as examples):
>>> import random
>>> z_vals = [100 + random.uniform(-1.5,1.5) for n in range(10)]
>>> z_vals += [120 + random.uniform(-1.5,1.5) for n in range(10)]
>>> z_vals += [140 + random.uniform(-1.5,1.5) for n in range(10)]
>>> random.shuffle(z_vals)
>>> z_vals
[141.33225473458657, 121.1713952666894, 119.40476193163271, 121.09926601186737, 119.63057973814858, 100.09095882968982, 99.226542624083109, 98.845285642062763, 120.90864911044898, 118.65196386994897, 98.902094334035326, 121.2741094217216, 101.18463497862281, 138.93502941970601, 120.71184773326806, 139.15404600347946, 139.56377827641663, 119.28279815624718, 99.338144106822554, 139.05438770927282, 138.95405784704622, 119.54614935118973, 139.9354467277665, 139.47260445000273, 100.02478729763811, 101.34605205591622, 138.97315450408186, 99.186025111246295, 140.53885845445572, 99.893009827114568]
>>> level_boundaries(z_vals)
[101.34605205591622, 121.2741094217216]
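For context, here is a minimal sketch of how the returned boundaries could then be used to put each point into a level. The function name assign_levels is hypothetical, and it assumes the boundaries list is sorted ascending and holds the highest z-value of every level except the top one, which is what level_boundaries returns.

```python
from bisect import bisect_left


def assign_levels(zvalues, boundaries):
    """Return a level index (0 = lowest level) for each z value.

    Assumes `boundaries` is sorted ascending and contains the highest
    z value of every level except the top one.
    """
    # bisect_left counts the boundaries strictly below z, so a point that
    # is itself a boundary stays in the level it closes.
    return [bisect_left(boundaries, z) for z in zvalues]
```

With the boundaries from the example above, a point at 99.2 would land in level 0, one at 120.9 in level 1, and one at 141.3 in level 2.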