I've not done any work with Google Maps specifically but, many moons ago, I was involved in a project which managed a mobile workforce for a large Telco.
They had similar functionality in that they had maps which they could zoom in on for their allocated jobs (local to the machine rather than over the network), and we solved a problem which sounds very similar to yours. Points of interest on the maps were called landmarks and were indicated by small markers on the map called landmark pointers, which the worker could select to get a textual description.
At the minimum zoom, there would have been a plethora of landmark pointers, making the map useless. We made a command decision to limit the landmark pointers to a smaller number (400). In order to do that, the map was divided into a 20x20 matrix no matter what the zoom level, which gave us 400 matrix elements.
Then, if a landmark shared the same matrix element as another, the application combined them and generated a single landmark pointer with the descriptive text containing the text of all the landmarks in that matrix element.
That way there were never more than 400 landmark pointers. As the minion zoomed in, the landmark pointers were regenerated and landmarks could end up in different matrix elements - in that case, they were no longer combined with other landmarks.
Similarly, zooming out sometimes merged two or more landmarks into a single landmark pointer.
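For what it's worth, here's a minimal sketch of that grid idea in Python. It's not the original code from that project (long gone) and has nothing to do with the Google Maps API. The function name, the (x, y, text) tuple layout and the viewport rectangle are all made up for illustration. I've also assumed the combined pointer sits at the centre of its matrix element, which is just one possible choice.

```python
from collections import defaultdict

GRID = 20  # 20x20 matrix, so never more than 400 landmark pointers


def cluster_landmarks(landmarks, viewport, grid=GRID):
    """Combine the visible landmarks into at most grid*grid pointers.

    landmarks: iterable of (x, y, text) tuples in map coordinates
               (hypothetical layout, for illustration only).
    viewport:  (min_x, min_y, max_x, max_y) of the area currently shown;
               assumed to have non-zero width and height.
    Returns a list of (x, y, combined_text) landmark pointers.
    """
    min_x, min_y, max_x, max_y = viewport
    cell_w = (max_x - min_x) / grid
    cell_h = (max_y - min_y) / grid

    cells = defaultdict(list)
    for x, y, text in landmarks:
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue  # not visible at this zoom, so no pointer needed
        # Clamp so a landmark exactly on the far edge lands in the last cell.
        col = min(int((x - min_x) / cell_w), grid - 1)
        row = min(int((y - min_y) / cell_h), grid - 1)
        cells[(row, col)].append(text)

    pointers = []
    for (row, col), texts in sorted(cells.items()):
        # Assumption: draw the pointer at the centre of the matrix element.
        px = min_x + (col + 0.5) * cell_w
        py = min_y + (row + 0.5) * cell_h
        pointers.append((px, py, "\n".join(texts)))
    return pointers
```

You'd call this again every time the zoom or pan changes, passing the new visible rectangle. Because the matrix is always 20x20 over whatever is on screen, the elements shrink as you zoom in, so landmarks that shared an element when zoomed out tend to separate by themselves.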
That sounds like what you're trying to achieve with "clustering or zoom level bunching" although, as I said, I have little experience with Google Maps itself so I'm not sure this is possible. But given Google's reputation, I suspect it is.