views:

63

answers:

4

How can one generate say 1000 random points with a distribution like that of towns and cities in e.g. Ohio ?
I'm afraid I can't define "distributed like cities" precisely; uniformly distributed centres + small Gaussian clouds are easy but ad hoc.
Added: There must be a family of 2d distributions with a clustering parameter that can be varied to match a given set of points ?

+1  A: 
aioobe
+2  A: 

Maybe you can take a look at Walter Christaller's Theory of Central Places. I guess there must be some generator somewhere, or you can cook up your own.

ShiDoiSi
Interesting, thanks. This leads to "spatial statistics" and "spatial point pattern", which have many papers and several booksbut so far no entry point / no code.
Denis
+2  A: 

Start with a model of the water features in your target area (or make one up, if it's for an imaginary place), then cluster the cities near river junctions, along lakeshores, lake-river junctions. Then make imaginary highways connecting those major cities. Now sprinkle some intermediate cities along those highways at reasonable spacing, preferring to be near junctions in the highways. Now sprinkle some small towns through the empty spaces.

Doug McClean
A: 

Gaussian clusters with Poisson cluster sizes work fairly well.

Problem: generate random points that cluster roughly like given cities, say in the USA.

Subproblems:
a) describe clusters with rows of numbers, so that "cluster A is like cluster B" simplifies to "clusternumbers(A) is like "clusternumbers(B)".
Running N=100 then 1000 points through fcluster below, with ncluster=25, gives

N 100 ncluster 25: 22 + 3  r 117
sizes: av 4     10   9   8   7   6   6   5   5   4   4   4 ...
radii: av 117  202 198 140 134  64  62  28 197 144 148 132 ...

N 1000 cluster 25: 22 + 3  r 197
sizes: av 45  144 139 130  85  84  69  63  43  38  33  30  ...
radii: av 197  213 279 118 146 282 154 245 212 243 226 235 ...

b) find a combiation of random generators with 2 or 3 parameters which can be varied to generate different clusterings.
Gaussian clusters with Poisson cluster sizes can match clustering of cities fairly well:

def randomclusters( N, ncluster=25,  radius=1, box=box ):
    """ -> N 2d points: Gaussian clusters, Poisson cluster sizes """
    pts = []
    lam = eval( str( N // ncluster ))
    clustersize = lambda: np.random.poisson(lam - 1) + 1
        # poisson 2:  14  27  27  18   9   4  %
        # poisson 3:   5  15  22  22  17  10  %
    while len(pts) < N:
        u = uniformrandom2(box)
        csize = clustersize()
        if csize == 1:
            pts.append( u )
        else:
            pts.extend( inbox( gauss2( u, radius, csize )))
    return pts[:N]


    # Utility functions --

import scipy.cluster.hierarchy as hier

def fcluster( pts, ncluster, method="average", criterion="maxclust" ):
    """ -> (pts, Y pdist, Z linkage, T fcluster, clusterlists)
        ncluster = n1 + n2 + ... (including n1 singletons)
        av cluster size = len(pts) / ncluster
    """
        # Clustering is pretty fast:
        # sort pdist, then like Kruskal's MST, O( N^2 ln N )
        # Many metrics and parameters are possible; these satisfice.
    pts = np.asarray(pts)
    Y = scipy.spatial.distance.pdist( pts )  # N*(N-1)/2
    Z = hier.linkage( Y, method )  # N-1, like mst
    T = hier.fcluster( Z, ncluster, criterion=criterion )
    clusters = clusterlists(T)
    return (pts, Y, Z, T, clusters)

def clusterlists(T):
    """ T = hier.fcluster( Z, t ) e.g. [a b a b c a]
        -> [ [0 2 5] [1 3] ] sorted by len, no singletons [4]
    """
    clists = [ [] for j in range( max(T) + 1 )]
    for j, c in enumerate(T):
        clists[c].append( j )
    clists.sort( key=len, reverse=True )
    n1 = np.searchsorted(  map( len, clists )[::-1], 2 )
    return clists[:-n1]

def radius( x ):
    """ rms |x - xmid| """
    return np.sqrt( np.mean( np.var( x, axis=0 )))
        # * 100  # 1 degree lat/long ~ 70 .. 111 km
Denis