views:

218

answers:

3

Hello stackies. The following PiP search was built for a project that lets users find their NYC governmental districts by address or lat/lng (http://staging.placeanddisplaced.org). It works, but its kinda slow, especially when searching through districts that have complex polygons. Can anyone give me some pointers on optimizing this code?

One thought I had was to run the point_in_polygon? method on a simplified version of each polygon, i.e. fewer coordinates. This would mean less processing time, but also decreased accuracy.. thoughts?

class DistrictPolygonsController < ApplicationController

  def index

    ...

    if coordinates?
      @district_polygons = DistrictPolygon.
      coordinates_within_bounding_box(params[:lat], params[:lng]).
      find(:all, :include => :district, :select => select).
      select { |dp| dp.contains_coordinates?(params[:lat], params[:lng]) }
    else
      @district_polygons = DistrictPolygon.find(:all, :include => :district, :select => select)
    end

    ...

  end

end

class DistrictPolygon < ActiveRecord::Base

  named_scope :coordinates_within_bounding_box, lambda { |lat,lng| { :conditions => ['min_lat<? AND max_lat>? AND min_lng<? AND max_lng>?', lat.to_f, lat.to_f, lng.to_f, lng.to_f] } }
  named_scope :with_district_type, lambda { |t| { :conditions => ['district_type=?', t] } }

  before_save :get_bounding_box_from_geometry

  def get_bounding_box_from_geometry
    # return true unless self.new_record? || self.geometry_changed? || self.bounds_unknown?
    self.min_lat = all_lats.min
    self.max_lat = all_lats.max
    self.min_lng = all_lngs.min
    self.max_lng = all_lngs.max
    true # object won't save without this return
  end

  def bounds_unknown?
    %w(min_lat max_lat min_lng max_lng).any? {|b| self[b.to_sym].blank? }
  end

  def bounds_known?; !bounds_unknown?; end

  # Returns an array of XML objects containing each polygons coordinates
  def polygons
    Nokogiri::XML(self.geometry).search("Polygon/outerBoundaryIs/LinearRing/coordinates")
  end

  def multi_geometric?
    Nokogiri::XML(self.geometry).search("MultiGeometry").size > 0
  end

  # Returns an array of [lng,lat] arrays
  def all_coordinates
    pairs = []
    polygons.map do |polygon|
      polygon.content.split("\n").map do |coord|
        # Get rid of third 'altitude' param from coordinate..
        pair = coord.strip.split(",")[0..1].map(&:to_f)
        # Don't let any nils, blanks, or zeros through..
        pairs << pair unless pair.any? {|point| point.blank? || point.zero? }
      end
    end
    pairs
  end

  # All latitudes, regardless of MultiPolygonal geometry
  def all_lats
    all_coordinates.map(&:last).reject{|lat| lat.blank? || lat.zero?}    
  end

  # All longitudes, regardless of MultiPolygonal geometry  
  def all_lngs
    all_coordinates.map(&:first).reject{|lng| lng.blank? || lng.zero?}
  end

  # Check to see if coordinates are in the rectangular bounds of this district
  def contains_coordinates?(lat, lng)
    return false unless coordinates_within_bounding_box?(lat.to_f, lng.to_f)
    polygons.any? { |polygon| DistrictPolygon.point_in_polygon?(all_lats, all_lngs, lat.to_f, lng.to_f) }
  end

  def coordinates_within_bounding_box?(lat, lng)
    return false if (max_lat > lat.to_f == min_lat > lat.to_f) # Not between lats
    return false if (max_lng > lng.to_f == min_lng > lng.to_f) # Not between lngs
    true
  end

  # This algorithm came from http://www.ecse.rpi.edu/Homepages/wrf/Research/Short_Notes/pnpoly.html
  def self.point_in_polygon?(x_points, y_points, x_target, y_target)
    num_points = x_points.size
    j = num_points-1
    c = false
    for i in 0...num_points do
      c = !c if ( ((y_points[i]>y_target) != (y_points[j]>y_target)) && (x_target < (x_points[j]-x_points[i]) * (y_target-y_points[i]) / (y_points[j]-y_points[i]) + x_points[i]) )
      j = i
    end
    return c
  end

end
+1  A: 

If your runtime is longer for more complex shapes, it suggests the performance is in the O(n) loop in the point_in_polygon?

Does profiling back that assumption up?

If performance is critical, consider implementing the exact same algorithm as native code.

Will
A: 

I suspect you may be able to push the majority of the work into the DB. PostgreSQL has the PostGIS plugin which enables spatially aware queries to be performed.

PostGIS: http://postgis.refractions.net/ Docs: http://postgis.refractions.net/documentation/manual-1.4/

This breaks the database portability concept, but might be worth it if performance is critical.

ctcherry
A: 

Algorithm aside, keeping the polygon data in local memory and recoding this in a statically typed compiled language will likely lead to 100x-1000x increase in speed.

Tristan