I am doing a bulk insert of records into a database from a log file. Occasionally (~1 row out of every thousand) one of the rows violates the primary key and causes the whole transaction to fail. Currently, the user has to manually go through the file that caused the failure and remove the offending row before attempting to re-import. Given that there are hundreds of these files to import, this manual process is impractical.
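For context, the import currently looks roughly like this (connection details, table name, columns, and parsing are made up for illustration; the real script differs):

```python
import psycopg2

# Hypothetical connection string, table, and columns.
conn = psycopg2.connect("dbname=logs user=importer")
cur = conn.cursor()

with open("server.log") as f:
    try:
        for line in f:
            # Assume tab-delimited lines for this sketch.
            record_id, message = line.rstrip("\n").split("\t", 1)
            cur.execute(
                "INSERT INTO log_entries (id, message) VALUES (%s, %s)",
                (record_id, message),
            )
        conn.commit()
    except psycopg2.IntegrityError:
        # A single duplicate key aborts the transaction,
        # so the entire file's worth of inserts is lost.
        conn.rollback()
```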
My question: How can I skip the insertion of records that would violate the primary key constraint, without having to run a SELECT before each row to check whether it already exists?
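That is, I want to avoid a check-then-insert pattern like the one below, since it adds a round trip for every line (reusing `cur`, `record_id`, and `message` from the sketch above):

```python
# One extra SELECT round trip per row -- the pattern I'd like to avoid.
cur.execute("SELECT 1 FROM log_entries WHERE id = %s", (record_id,))
if cur.fetchone() is None:
    cur.execute(
        "INSERT INTO log_entries (id, message) VALUES (%s, %s)",
        (record_id, message),
    )
```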
Note: I am aware of the very similar question #1054695, but the answer there appears to be specific to SQL Server, and I am using PostgreSQL (importing via Python/psycopg2).