views:

702

answers:

2

i have a postgres database with millions of rows in it it has a column called geom which contains the boundary of a property.

using a python script i am extracting the information from this table and re-inserting it into a new table.

when i insert in the new table the script bugs out with the following:

Traceback (most recent call last):
  File "build_parcels.py", line 258, in <module>
    main()
  File "build_parcels.py", line 166, in main
    update_cursor.executemany("insert into parcels (par_id, street_add, title_no, proprietors, au_name, ua_name, geom) VALUES (%s, %s, %s, %s, %s, %s, %s)", inserts)
psycopg2.IntegrityError: new row for relation "parcels" violates check constraint "enforce_geotype_geom"

The new table has a check constraint enforce_geotype_geom = ((geometrytype(geom) = 'POLYGON'::text) OR (geom IS NULL)) whereas the old table does not, so im guessing theres dud data or non polygon (perhaps multipolygon data?) in the old table. i want to keep the new data as polygon so want to not insert anything else.

Initially i tried wrapping the query with standard python error handling with the hope that the dud geom rows would fail but the script would keep running , but the script has been written to commit at the end not each row so it doesnt work.

I think what i need to do is iterate through the old table geom rows and check what type of geometry they are so i can establish whether or not i want to keep it or throw it away before i insert into the new table

Whats the best way of going about this?

+2  A: 

Option 1 is to create a savepoint before each insert and roll back to that safepoint if an INSERT fails.

Option 2 is to attach the check constraint expression as a WHERE condition on the original query that produced the data to avoid selecting it at all.

The best answer depends on the size of the tables, the relative number of faulty rows, and how fast and often this is supposed to run.

Peter Eisentraut
Thanks for the answer. I like the option 2 but i still need the other data inserted even if the geom doesnt get inserted. Do you know how i can just do a select statement that prints the geom type for each row?
ADAM
and to clarify the database has about 5million rows and this is only run 1 time per month to regenerate the data and doesnt need to be fast. i dont yet know the amount of faulty rows
ADAM
You could do the original query (as per option 2) likeSELECT par_id, street_add, title_no, proprietors, au_name, ua_name, CASE WHEN ((geometrytype(geom) = 'POLYGON'::text) OR (geom IS NULL)) THEN geom ELSE null END AS geom FROM oldtable;to substitute null for geom values that don't "fit".
Peter Eisentraut
thanks for the answer peter i used the geometrytype(geom) but similar to answer above
ADAM
+1  A: 

This astonishingly useful bit of PostGIS SQL should help you figure it out... there are many geometry type tests in here:

-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-- 
-- $Id: cleanGeometry.sql 2008-04-24 10:30Z Dr. Horst Duester $
--
-- cleanGeometry - remove self- and ring-selfintersections from 
--                 input Polygon geometries 
-- http://www.sogis.ch
-- Copyright 2008 SO!GIS Koordination, Kanton Solothurn, Switzerland
-- Version 1.0
-- contact: horst dot duester at bd dot so dot ch
--
-- This is free software; you can redistribute and/or modify it under
-- the terms of the GNU General Public Licence. See the COPYING file.
-- This software is without any warrenty and you use it at your own risk
--  
-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


CREATE OR REPLACE FUNCTION cleanGeometry(geometry)
  RETURNS geometry AS
$BODY$DECLARE
  inGeom ALIAS for $1;
  outGeom geometry;
  tmpLinestring geometry;

Begin

  outGeom := NULL;

-- Clean Process for Polygon 
  IF (GeometryType(inGeom) = 'POLYGON' OR GeometryType(inGeom) = 'MULTIPOLYGON') THEN

-- Only process if geometry is not valid, 
-- otherwise put out without change
    if not isValid(inGeom) THEN

-- create nodes at all self-intersecting lines by union the polygon boundaries
-- with the startingpoint of the boundary.  
      tmpLinestring := st_union(st_multi(st_boundary(inGeom)),st_pointn(boundary(inGeom),1));
      outGeom = buildarea(tmpLinestring);      
      IF (GeometryType(inGeom) = 'MULTIPOLYGON') THEN      
        RETURN st_multi(outGeom);
      ELSE
        RETURN outGeom;
      END IF;
    else    
      RETURN inGeom;
    END IF;


------------------------------------------------------------------------------
-- Clean Process for LINESTRINGS, self-intersecting parts of linestrings 
-- will be divided into multiparts of the mentioned linestring 
------------------------------------------------------------------------------
  ELSIF (GeometryType(inGeom) = 'LINESTRING') THEN

-- create nodes at all self-intersecting lines by union the linestrings
-- with the startingpoint of the linestring.  
    outGeom := st_union(st_multi(inGeom),st_pointn(inGeom,1));
    RETURN outGeom;
  ELSIF (GeometryType(inGeom) = 'MULTILINESTRING') THEN 
    outGeom := multi(st_union(st_multi(inGeom),st_pointn(inGeom,1)));
    RETURN outGeom;
  ELSIF (GeometryType(inGeom) = '<NULL>' OR GeometryType(inGeom) = 'GEOMETRYCOLLECTION') THEN 
    RETURN NULL;
  ELSE 
    RAISE NOTICE 'The input type % is not supported %',GeometryType(inGeom),st_summary(inGeom);
    RETURN inGeom;
  END IF;     
End;$BODY$
  LANGUAGE 'plpgsql' VOLATILE;
Andrew McGregor
thanks, would have settled for a hybrid python/postgres answer but this is amazing to be able to do it all inside postgres. thanks for your answer
ADAM