With 500k records, this sounds like a job for Core Data. Preferably Core Data on a desktop. If the data isn't being updated in real time, you should process the data on heavier hardware and just use the iPhone to display it. That would massively simplify the app because you would only have to store the value for each map cell.
Even if you did want to process it on the iPhone, you should have the app process the data once and save the results. There appears to be no reason to have the app recalculate the species value of every map cell every time it wants to display a cell.
I would suggest creating an entity in Core Data to represent observations, then another entity to represent geographical squares. Set a relationship between each square and the observations that fall within it. Then create a calculated species value in the square entity. You would then only have to recalculate a square's species value when one of its observations changed.
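To make the two-entity idea concrete, here is a rough, language-neutral sketch in Python rather than actual Core Data code. The names `Observation`, `Square`, `cell_key`, `build_grid` and the 0.5 degree cell size are all made up for illustration; in the app these would be Core Data entities with a to-many relationship. The point is only that the species value is computed once per square, cached, and thrown away when that square's observations change.

```python
from dataclasses import dataclass, field
from math import floor
from typing import Optional

CELL_DEG = 0.5  # assumed cell size in degrees, purely illustrative


@dataclass
class Observation:
    lat: float
    lon: float
    species: str


@dataclass
class Square:
    observations: list = field(default_factory=list)
    _species_count: Optional[int] = None  # cached value, None means "stale"

    @property
    def species_count(self):
        # Recompute only when the cache was invalidated by a change.
        if self._species_count is None:
            self._species_count = len({o.species for o in self.observations})
        return self._species_count

    def add(self, obs):
        self.observations.append(obs)
        self._species_count = None  # only this square needs recalculating


def cell_key(lat, lon, size=CELL_DEG):
    """Map a coordinate to the integer index of the square containing it."""
    return (floor(lat / size), floor(lon / size))


def build_grid(observations, size=CELL_DEG):
    """One pass over all records: file each observation under its square."""
    grid = {}
    for obs in observations:
        grid.setdefault(cell_key(obs.lat, obs.lon, size), Square()).add(obs)
    return grid


observations = [
    Observation(10.1, 20.1, "wren"),
    Observation(10.2, 20.3, "robin"),
    Observation(10.3, 20.4, "wren"),
    Observation(40.0, -3.0, "heron"),
]
grid = build_grid(observations)
print(grid[cell_key(10.1, 20.1)].species_count)
```

Displaying a cell is then a dictionary lookup plus a cached integer, and adding one observation only invalidates the single square it lands in.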
This is the kind of problem that object graphs were created for. Even if the data is being continuously updated, Core Data would only perform the calculations needed to accommodate the small number of observation objects that changed at any given time, and it would do so in a highly optimized manner.
Edit01:
Here is a completely different angle on the problem, still within Core Data.
(1) Create an object graph of observation records such that each observation object has a reciprocal relationship to the other observation objects that are closest to it geographically. This would create an object graph that looks like a flat, irregular net.
(2) Create methods for the observationRecords class that (a) determine whether the record lies within the bounds of an arbitrary geographic square, (b) ask each of its related records whether they are also in the square, and (c) return its own species count and the count of all the related records.
(3) Divide your map into reasonably small squares, e.g. one second of arc on a side. Within each square, select one linked record and add it to a list. Aim to keep some fraction of all records, such as 1 in every 100 or 1,000, so that the list shrinks from 500k to a sublist small enough to search quickly with a brute-force predicate. Let's call the records in that list the gridflags.
(4) When the user zooms in, use brute force to find all the gridflag records within the geographical grid. Then ask each gridflag record to send messages to each of its linked records to find out (a) whether they're inside the grid, (b) what their species count is, and (c) what the count is for their linked records that are also within the grid. (Use a flag to make sure each record is only queried once per search to prevent runaway recursion.)
This way, you only have to find one record inside each arbitrarily sized grid cell and that record will find all the other records for you. Instead of stepping through each record to see which record goes in what cell every time, you just have to process the records in each cell and those immediately adjacent. As you zoom in, the number of records you actually query shrinks instead of remaining constant. If a grid cell has only a handful of records, you only have to query a handful of records.
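Since this is hard to convey in words, here is a small sketch of steps (1) to (4), again in plain Python rather than Core Data. Everything in it is an assumption for illustration: the `Record` class, the helper names, linking each record to its `k` nearest neighbours by brute force, and using a visited set in place of the per-record flag. It walks the net with an explicit stack instead of recursive messages, which amounts to the same traversal.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass(eq=False)  # identity-based equality/hash so records can go in a set
class Record:
    lat: float
    lon: float
    species: str
    neighbors: list = field(default_factory=list)

    def in_box(self, box):
        """Step 2a: is this record inside (south, west, north, east)?"""
        south, west, north, east = box
        return south <= self.lat < north and west <= self.lon < east


def link_nearest(records, k=2):
    """Step 1: reciprocal links to the k nearest records (O(n^2), done once)."""
    for r in records:
        others = sorted((o for o in records if o is not r),
                        key=lambda o: hypot(o.lat - r.lat, o.lon - r.lon))
        for o in others[:k]:
            if o not in r.neighbors:
                r.neighbors.append(o)
            if r not in o.neighbors:
                o.neighbors.append(r)


def pick_gridflags(records, flag_size):
    """Step 3: keep the first record seen in each coarse square as a gridflag."""
    flags = {}
    for r in records:
        flags.setdefault((r.lat // flag_size, r.lon // flag_size), r)
    return list(flags.values())


def species_in_box(gridflags, box):
    """Step 4: brute-force the short gridflag list, then walk the net."""
    visited = set()  # plays the role of the "queried once" flag
    stack = [g for g in gridflags if g.in_box(box)]
    species = set()
    while stack:
        r = stack.pop()
        if r in visited:
            continue
        visited.add(r)
        species.add(r.species)
        # Only follow links that stay inside the box.
        stack.extend(n for n in r.neighbors
                     if n not in visited and n.in_box(box))
    return species


records = [
    Record(10.01, 20.01, "wren"),
    Record(10.02, 20.02, "robin"),
    Record(10.03, 20.01, "wren"),
    Record(50.00, 5.00, "heron"),
    Record(50.01, 5.01, "gull"),
]
link_nearest(records)
gridflags = pick_gridflags(records, flag_size=1.0)
print(species_in_box(gridflags, (10.0, 20.0, 10.025, 20.05)))
```

One caveat with this sketch: a box that contains no gridflag, or records that can only be reached through links leading outside the box, will be missed. A real version would need denser gridflags or some fallback search for those cases.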
This would take some effort and time to set up, but once you did, it would be rather efficient, especially when zoomed way in. For the top level, just have a preprocessed static map.
Hope I explained that well enough. It's hard to convey verbally.