views: 131
answers: 3
Hi people, I have a problem. I have a database with 500k records. Each record stores latitude, longitude, animal species, and date of observation. I must draw a 15x10 grid over a MapKit view that shows the concentration of species in each grid cell. Each cell is a 32x32 box.

If I calculate this at run time it is very slow. Does anybody have an idea how to cache it, in memory or in the database?

Data structure:

Observation:

  • Latitude
  • Longitude
  • Date
  • Specie
  • some other unimportant data

Screen sample:

(screenshot: red grid overlay on the map)

The opacity of each red box shows the species count in that region.

The code I use now (data is a select from the database: all observations in the map region):

for (int row = 0; row < rows; row++)
 {
  for (int column = 0; column < columns; column++)
  {
   speciesPerBox=0;

   // longitude bounds of this cell
   minG=boxes[row][column].longitude;
   if (column!=columns-1) {
    maxG=boxes[row][column+1].longitude;
   } else {
    maxG=buttomRight.longitude;
   }

   // latitude bounds of this cell
   maxL=boxes[row][column].latitude;
   if (row!=rows-1) {
    minL=boxes[row+1][column].latitude;
   } else {
    minL=buttomRight.latitude;
   }

   // count the distinct species among the sightings inside the cell
   for (int i=0; i<sightingCount; i++) {
    l=data[i].latitude;
    g=data[i].longitude;

    if (l>=minL&&l<maxL&&g>=minG&&g<maxG) {
     hasSpecie=NO;
     for (int j=0; j<speciesPerBox; j++) {
      if (speciesCountArray[j]==data[i].specie) {
       hasSpecie=YES;
       break;
      }
     }

     if (hasSpecie==NO) {
      speciesCountArray[speciesPerBox]=data[i].specie;
      speciesPerBox++;
     }
    }
   }

   mapData[row][column].count = speciesPerBox;
  }
 }
+4  A: 

Since your data is static, you can pre-compute the species count for each grid cell and store that in the database instead of working from all the raw location coordinates.

Since you have 15 x 10 = 150 cells, you'll end up with 150 * [num of species] records in the database, which should be a much smaller number.

Also, make sure you have indexes on the proper columns. Otherwise, your queries will have to scan every single record over and over again.
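The aggregation pass that would feed such a table can be sketched in plain C. Everything here (the `Observation` struct, the grid bounds, the `MAX_SPECIES` cap) is an illustrative assumption, not the poster's actual schema; the point is that a single pass over the data computes each observation's cell index arithmetically, instead of testing every sighting against every cell.

```c
#include <string.h>

#define ROWS 10
#define COLS 15
#define MAX_SPECIES 64   /* assumed upper bound on distinct species ids */

typedef struct { double lat, lon; int specie; } Observation;

/* One pass over all observations: derive each one's cell index
   arithmetically instead of testing it against every cell. */
static void precompute(const Observation *obs, int n,
                       double topLat, double leftLon,
                       double botLat, double rightLon,
                       int counts[ROWS][COLS])
{
    /* seen[r][c][s] != 0 once species s has been counted in cell (r,c) */
    static unsigned char seen[ROWS][COLS][MAX_SPECIES];
    memset(seen, 0, sizeof seen);
    memset(counts, 0, sizeof(int) * ROWS * COLS);

    double cellH = (topLat - botLat) / ROWS;
    double cellW = (rightLon - leftLon) / COLS;

    for (int i = 0; i < n; i++) {
        int r = (int)((topLat - obs[i].lat) / cellH);
        int c = (int)((obs[i].lon - leftLon) / cellW);
        if (r < 0 || r >= ROWS || c < 0 || c >= COLS)
            continue;                      /* outside the visible grid */
        int s = obs[i].specie;
        if (s >= 0 && s < MAX_SPECIES && !seen[r][c][s]) {
            seen[r][c][s] = 1;             /* first time in this cell */
            counts[r][c]++;
        }
    }
}
```

Running this once and storing the resulting 150 counts is what makes the later rendering cheap.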

Ben S
If the user scales the map I need to recalculate all of this data, and the user can also move the map by touch, which forces a recalculation as well.
Sergey Zenchenko
How many zoom levels do you have? Pre-compute the highest levels of zoom (since they're the most expensive to compute). If you allow the user to zoom arbitrarily, then you can't really reduce the computations required. Your problem exceeds the capacity of the iPhone.
Ben S
Yes, I'm thinking about that.
Sergey Zenchenko
Definitely worth precalculating the top level (as shown above) and falling back to real time for the zoomed in mode.
Epsilon Prime
+1  A: 

The loop for (int i=0; i<sightingCount; i++) is killing your performance, especially the large number of if (l>=minL&&l<maxL&&g>=minG&&g<maxG) tests, most of which only serve to skip a sighting.

How large is sightingCount?

First you should use some kind of spatial optimization. A simple one: store species count lists per coarse cell (let's call them "zones"). Make the zones fairly large, so that you do not waste space. Smaller zones give better query performance, but zones that are too small reverse the effect. So make the size configurable and test different zone sizes to find a good compromise!

When it's time to sum up the number of species in a cell for rendering, determine which zones the given cell overlaps (a simple and fast "rectangle overlap" test). Then you only have to check the species counts of those zones. This greatly reduces the iterations of your inner loop!

That's the idea behind most "spatial optimizations": divide and conquer. Here you divide your space, and you can then cheaply reject a large number of irrelevant "sightings" early on (the added effort is the rectangle overlap test, but each test rejects multiple sightings, whereas your current code tests every single sighting for relevance).
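A minimal sketch of this zoning scheme, assuming a fixed 8x8 zone grid over whole-world bounds (both arbitrary choices for illustration, as are the type names): sightings are bucketed into zones once up front, and a query rectangle then only inspects the sightings of the zones it overlaps.

```c
#include <string.h>

#define ZONES_X 8            /* assumed zone grid size; tune for your data */
#define ZONES_Y 8
#define MAX_PER_ZONE 1024

typedef struct { double lat, lon; int specie; } Sighting;

/* Each zone stores the indices of the sightings that fall inside it. */
typedef struct { int idx[MAX_PER_ZONE]; int n; } Zone;
static Zone zones[ZONES_Y][ZONES_X];

/* Bounds the zone grid covers (whole world here, purely as an assumption). */
static const double topLat = 90, botLat = -90, leftLon = -180, rightLon = 180;

static void buildZones(const Sighting *s, int n)
{
    memset(zones, 0, sizeof zones);
    double zh = (topLat - botLat) / ZONES_Y, zw = (rightLon - leftLon) / ZONES_X;
    for (int i = 0; i < n; i++) {
        int zy = (int)((topLat - s[i].lat) / zh);
        int zx = (int)((s[i].lon - leftLon) / zw);
        if (zy >= 0 && zy < ZONES_Y && zx >= 0 && zx < ZONES_X &&
            zones[zy][zx].n < MAX_PER_ZONE)
            zones[zy][zx].idx[zones[zy][zx].n++] = i;
    }
}

/* Count the sightings inside [minL,maxL) x [minG,maxG): the rectangle
   overlap test selects the zones, then only their sightings are checked. */
static int countInRect(const Sighting *s, double minL, double maxL,
                       double minG, double maxG)
{
    double zh = (topLat - botLat) / ZONES_Y, zw = (rightLon - leftLon) / ZONES_X;
    int y0 = (int)((topLat - maxL) / zh), y1 = (int)((topLat - minL) / zh);
    int x0 = (int)((minG - leftLon) / zw), x1 = (int)((maxG - leftLon) / zw);
    if (y0 < 0) y0 = 0;
    if (y1 > ZONES_Y - 1) y1 = ZONES_Y - 1;
    if (x0 < 0) x0 = 0;
    if (x1 > ZONES_X - 1) x1 = ZONES_X - 1;
    int count = 0;
    for (int zy = y0; zy <= y1; zy++)
        for (int zx = x0; zx <= x1; zx++)
            for (int k = 0; k < zones[zy][zx].n; k++) {
                const Sighting *p = &s[zones[zy][zx].idx[k]];
                if (p->lat >= minL && p->lat < maxL &&
                    p->lon >= minG && p->lon < maxG)
                    count++;
            }
    return count;
}
```

With 500k sightings spread over the map, each cell query now touches only a small fraction of them instead of the full list.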

In a second step, also apply some obvious code optimizations: e.g. minL and maxL do not change per column. Computing minL and maxL can be moved to the outer loop (just before for( int column=0; ...).

As the latitudes of the grids are evenly distributed, you can even remove them from your grid cells, which saves some time in your iteration. Here an example (the spatial optimizations not included):

maxL=boxes[0][0].latitude;
incL=(maxL-buttomRight.latitude)/rows; // height of one grid row
for( int row = 0; row < rows; row++ )
{
    minL = maxL - incL; // latitude decreases row by row (north to south)
    for( int column = 0; column < columns; column++ )
    {
        speciesPerBox=0;
        minG=boxes[row][column].longitude;
        if (column!=columns-1) {
            maxG=boxes[row][column+1].longitude;
        } else {
            maxG=buttomRight.longitude;
        }
        ...
        ...
    }
    ...

    maxL = minL; // the bottom edge of this row is the top edge of the next
}

Maybe this also works for the longitude loop, but longitude may not be evenly distributed (i.e. "incG" is different in each step).

Note that the spatial optimization will make a huge difference, the loop optimizations only a small (but still worth) difference.

frunsi
sightingCount is 500-600k in the worst case.
Sergey Zenchenko
Alright. Then you should use the spatial optimization.
frunsi
+1  A: 

With 500k records this sounds like a job for Core Data, preferably Core Data on a desktop. If the data isn't being updated in real time, you should process it on heavier hardware and just use the iPhone to display the results. That would massively simplify the app because you would only need to store the value for each map cell.

Even if you did want to process it on the iPhone, you should have the app process the data once and save the results. There appears to be no reason to recalculate the species value of every map cell every time the app wants to display a cell.

I would suggest creating an entity in Core Data to represent observations, and another entity to represent geographical squares. Set a relationship between the squares and the observations that fall within each square, then create a calculated species-count value in the square entity. You would then only have to recalculate that value when one of the observations changed.

This is the kind of problem that object graphs were created for. Even if the data is being continuously updated, Core Data would only perform the calculations needed to accommodate the small number of observation objects that changed at any given time, and it would do so in a highly optimized manner.

Edit01:

Approaching the problem from a completely different angle. Within core data.

(1) Create an object graph of observation records such that each observation object has a reciprocal relationship to the observation objects that are geographically closest to it. This creates an object graph that looks like a flat, irregular net.

(2) Create methods on the observation-record class that (a) determine whether the record lies within the bounds of an arbitrary geographic square, (b) ask each of its related records whether they are also in the square, and (c) return its own species count plus the count of all the related records.

(3) Divide your map into some reasonably small squares, e.g. one second of arc each. Within each square, select one linked record and add it to a list. Choose some fraction of all records, like 1 in every 100 or 1,000, so that you cut the list down from 500k to a sublist that can be quickly searched by a brute-force predicate. Let's call the records in that list the gridflags.

(4) When the user zooms in, use brute force to find all the gridflag records within the geographical grid. Then have each gridflag record send messages to each of its linked records to see (a) whether they're inside the grid, (b) what their species count is, and (c) what the count is for their linked records that are also within the grid. (Use a flag to make sure each record is only queried once per search, to prevent runaway recursion.)

This way, you only have to find one record inside each arbitrarily sized grid cell and that record will find all the other records for you. Instead of stepping through each record to see which record goes in what cell every time, you just have to process the records in each cell and those immediately adjacent. As you zoom in, the number of records you actually query shrinks instead of remaining constant. If a grid cell has only a handful of records, you only have to query a handful of records.
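A rough sketch of that walk, with plain C structs standing in for the Core Data objects (the `Record` layout, `MAX_LINKS`, and the link indices are all hypothetical): starting from one gridflag, the traversal counts every linked record reachable without leaving the rectangle, and the visited flags prevent the runaway recursion the answer warns about.

```c
#define MAX_LINKS 8

typedef struct {
    double lat, lon;
    int specie;
    int links[MAX_LINKS];  /* indices of the geographically nearest records */
    int nlinks;
} Record;

/* Starting from one "gridflag" record, walk the neighbour links and count
   every reachable record that also lies inside the rectangle. */
static int countFromFlag(const Record *recs, int start,
                         double minL, double maxL,
                         double minG, double maxG,
                         unsigned char *visited)
{
    if (visited[start]) return 0;   /* each record is queried only once */
    visited[start] = 1;
    const Record *r = &recs[start];
    if (r->lat < minL || r->lat >= maxL || r->lon < minG || r->lon >= maxG)
        return 0;                   /* outside the rectangle: stop expanding */
    int count = 1;
    for (int i = 0; i < r->nlinks; i++)
        count += countFromFlag(recs, r->links[i],
                               minL, maxL, minG, maxG, visited);
    return count;
}
```

Counting distinct species instead of records would follow the same traversal, just accumulating species ids rather than a plain count.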

This would take some effort and time to set up but once you did it would be rather efficient especially when zoomed way in. For the top level, just have a preprocessed static map.

Hope I explained that well enough. It's hard to convey verbally.

TechZen
I use Core Data in the project, but a raw SQL query is much faster. The problem is that the box coordinates and size depend on the map region, which the user changes.
Sergey Zenchenko