views:

1107

answers:

6

After experimenting with client side approach to clustering large numbers of Google markers I decided that it won't be possible for my project (social network with 28,000+ users).

Are there any examples of clustering the coordinates on the server side - preferably in Python/Django?

The way I would like this to work is to gradually index the markers based on their proximity (radius) and zoom level.

In another words when a new user registers he/she is automatically assigned to a certain 'group' of markers that are close to each other thus increasing the 'group's' counter. What's being send to the server is just a small number of 'groups'. Only when the zoom level/scale of map is 1:1 - actual users are shown on the map.

That way the client side will have to deal only with 10-50 markers per request/zoom level.

A: 

One way to do it would be to define a grid with a unit size based on the zoom level. So you collect up all the items within a grid by lat,lon to one decimal place. An example is 42.2x73.4. So a point at 42.2003x73.4021 falls in that grid cell. That cell is bounded by 42.2x73.3 and 42.2x73.5.

If there are one or more points in a grid cell, you place a marker in the center of that grid.

You then hook up the zoomend event and change your grid size accordingly, and redraw the markers.

http://code.google.com/apis/maps/documentation/reference.html#GMap2.zoomend

dar
Wouldn't this be slow when the entire map needs to be visible and you have 30k markers to load?
Franek
+1  A: 

This is a paid service that uses server-side clustering, but I'm not sure how it works. I'm guessing that they just use your data to generate the markers to be shown at each zoom level.

Update: This tutorial demonstrates a basic server-side clustering function. It's written in PHP for the Static Maps API, but you could use it as a starting point.

Chris B
I know about their service but I want to have full control of the code.
Franek
the tutorial is useful but the algorithm might be too slow for 28,000 points. clustering based on a grid is clumsy but a lot faster obviously.
Tom
@Tom - an algorithm like this would definitely be too slow on the client-side, but I think it might be possible server-side.
Chris B
A: 

You could simply drop decimals based on the zoom level. Would that work for you?

Our geo indexes are based on morton numbers: http://www.rooftopsolutions.nl/article/231 (shameless self promotion).

If you want more precision than the 10-base system, the morton number will allow you to increase the zoom level on a 2-base number system, simply by doing something like:

GROUP BY (morton XOR (-precision))

The higher precision gets, the more items will be clustered.

Evert
A: 

I am using Django and Python to cluster real estate and rental listings, and the source can be found here.

Hope it helps!

Tristen
+2  A: 

I wrote a blog post about my approach using Python and Django here:

http://www.quanative.com/2010/01/01/server-side-marker-clustering-for-google-maps-with-python/

Larry Kubin
A: 

You might want to take a look at the DBSCAN and OPTICS pages on wikipedia, these looks very suitable for clustering places on a map. There is also a page about Cluster Analysis that shows all the possible algorithms you can use, most would be trivial to implement using the language of your choice.

With 28k+ points, you might want to skip django and just jump into C/C++ directly, and surely not expect this to get calculated in real-time in response to web requests.

Evgeny