I'm displaying a small Google map on a web page using the Google Maps Static API.

I have a set of 15 co-ordinates, which I'd like to represent as points on the map.

Due to the map being fairly small (184 x 90 pixels) and the upper limit of 2000 characters on a Google Maps URL, I can't represent every point on the map.

So instead I'd like to generate a small list of co-ordinates that represents an average of the big list.

So instead of having 15 sets, I'd end up with 5 sets, whose positions approximate the positions of the original 15. Say there are 3 points in closer proximity to each other than to any other point on the map; those points would be collapsed into 1 point.
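For illustration, here's roughly what the URL-length problem looks like (a Python sketch; the coordinates are invented placeholders, and the `markers` parameter uses the pipe-separated "lat,lng" form from the Static Maps API):

```python
# Sketch: build a Static Maps URL with one marker per point, to see how
# quickly the 2000-character limit approaches. The coordinates below are
# made-up placeholders standing in for the real set of 15.
points = [(51.5074 + i * 0.01, -0.1278 + i * 0.01) for i in range(15)]

base = "http://maps.google.com/maps/api/staticmap?size=184x90"
markers = "&markers=" + "|".join("%.4f,%.4f" % p for p in points)
url = base + markers  # roughly 15 characters per point, before any styling options
```

Collapsing 15 points to 5 cuts the `markers` portion to a third, leaving headroom for styling parameters.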

So I guess I'm looking for an algorithm that can do this.

Not asking anyone to spell out every step, but perhaps point me in the direction of a mathematical principle or general-purpose function for this kind of thing?

I'm sure a similar function is used in, say, graphics software, when pixellating an image.

(If I solve this I'll be sure to post my results.)

+2  A: 

I recommend K-means clustering when you need to cluster N objects into a known number K < N of clusters, which seems to be your case. Note that one cluster may end up with a single outlier point and another with say 5 points very close to each other: that's OK, it will look closer to your original set than if you forced exactly 3 points into every cluster!-)

Alex Martelli
Thanks a bunch Alex! I guess clustering was the word I was hunting for. I'll give this K-means clustering a shot.
jonathanconway
You're welcome -- I'd add more specific suggestions if I knew your intended implementation language, precise needs, etc.
Alex Martelli
If you only have 15 points, this is totally fine - but if you're going to have thousands of points, K-means might be a bit slow.
Chris B
I've used it to cluster many thousands of points into a few hundred clusters and it seemed fine (if the exact number of clusters WAS well known in advance). About K-Means, http://en.wikipedia.org/wiki/Cluster_analysis says "main advantages are simplicity and speed allowing it to run on large datasets", and that observation matches my experience. _Unsupervised_ clustering (finding out the right # of clusters too) is a different problem, of course; nice survey at http://webmining.spd.louisville.edu/Websites/tutorials/UnsupervisedClustering/UnsupervisedClustering.html .
Alex Martelli
A: 

In general, I think the area you want to look into is "Vector Quantization". I've got an old book titled Vector Quantization and Signal Compression by Allen Gersho and Robert M. Gray which provides a bunch of examples.

From memory, the Lloyd iteration is a good algorithm for this sort of thing. It takes the input set and reduces it to a fixed-size set of points. Basically: uniformly or randomly distribute your output points around the space; map each of your inputs to the nearest quantized point; then compute the error (e.g. sum of distances or root-mean-squared error). Then, for each output point, set it to the center of the set of inputs that map to it. This moves the point and possibly even changes the set that maps to it. Perform this iteratively until no changes are detected from one iteration to the next.
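The steps above can be sketched in pure Python (the coordinates are placeholder values, and `math.dist` needs Python 3.8+):

```python
import math
import random

def lloyd(points, k, seed=0):
    """Reduce `points` to `k` representatives via Lloyd's iteration
    (the same loop as K-means): assign each point to its nearest
    center, move each center to the mean of its assigned points,
    and repeat until the assignment stops changing."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # start from k of the input points
    assignment = None
    while True:
        # Map each input point to the index of its nearest center.
        new_assignment = [
            min(range(k), key=lambda i: math.dist(p, centers[i]))
            for p in points
        ]
        if new_assignment == assignment:     # converged: nothing changed
            return centers
        assignment = new_assignment
        # Move each center to the mean of the points assigned to it.
        for i in range(k):
            cluster = [p for p, a in zip(points, assignment) if a == i]
            if cluster:                      # a center can lose all its points
                centers[i] = (
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                )

# Example: 15 made-up coordinates collapsed to 5 representatives.
rng = random.Random(42)
coords = [(rng.uniform(51.0, 52.0), rng.uniform(-1.0, 0.0)) for _ in range(15)]
reps = lloyd(coords, 5)
```

Note this treats latitude/longitude as plain Euclidean coordinates, which is a fine approximation for points spread over a city but not for globe-spanning data.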

Hope this helps.

Adrian
Yep, fair summary of K-means (except you don't really need to measure the error if you're going to consider it stabilized when and only when "no changes are detected from one iteration to the next";-).
Alex Martelli
+1  A: 

If you are searching for such functions/classes, have a look at the MarkerClusterer and MarkerManager utility classes. MarkerClusterer closely matches the described functionality, as seen in this demo.

Salman A