Hi,

Consider the following Python code, using a psycopg2 cursor object (some column names were changed or omitted for clarity):

filename='data.csv'
file_columns=('id', 'node_id', 'segment_id', 'elevated', 
              'approximation', 'the_geom', 'azimuth')
self._cur.copy_from(file=open(filename),
                    table=self.new_table_name, columns=file_columns)
  • The database is located on a remote machine on a fast LAN.
  • Using \COPY from bash works very fast, even for large (~1,000,000 lines) files.

This code is ultra-fast for 5,000 lines, but when data.csv grows beyond 10,000 lines, the program freezes completely.
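In case it helps narrow things down, here is a chunked variant I could fall back to. This is only a sketch: copy_in_chunks and batch_size are names I made up, and it reuses the same cursor, table and column tuple as above.

import io
import itertools

def copy_in_chunks(cur, filename, table, columns, batch_size=5000):
    """Feed the file to copy_from() in fixed-size batches, committing each one,
    to see whether the freeze is tied to a particular amount of data."""
    with open(filename) as f:
        while True:
            # grab the next batch_size lines; stop when the file is exhausted
            batch = list(itertools.islice(f, batch_size))
            if not batch:
                break
            cur.copy_from(io.StringIO(''.join(batch)), table, columns=columns)
            cur.connection.commit()  # commit per batch to isolate the hang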

Any thoughts / solutions?

Adam

+1  A: 

This is just a workaround, but you can pipe the file straight into psql. I use this recipe sometimes when I am too lazy to bust out psycopg2.

import subprocess
def psql_copy_from(filename, tablename, columns=None):
    """Warning: this does not properly quote things."""
    coltxt = ' (%s)' % ', '.join(columns) if columns else ''
    with open(filename) as f:
        # psql runs a server-side COPY ... FROM STDIN and reads the file on its stdin
        subprocess.check_call([
            'psql',
            '-c', 'COPY %s%s FROM STDIN' % (tablename, coltxt),
            '--set=ON_ERROR_STOP=true',  # to be safe
            # add your connection args here
        ], stdin=f)
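For example, a call might look like this (purely illustrative: 'new_table' is a placeholder name, and connection details are assumed to come from the usual libpq environment variables such as PGHOST and PGDATABASE):

psql_copy_from('data.csv', 'new_table',
               columns=('id', 'node_id', 'segment_id', 'elevated',
                        'approximation', 'the_geom', 'azimuth'))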

As far as the lock-up is concerned, are you using multiple threads or anything like that?

Is Postgres logging anything, such as a closed connection or a deadlock? Can you see disk activity after it locks up?
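If it helps, the COPY's backend can also be watched from Python while the client looks frozen. The snippet below is just a sketch: it assumes a second psycopg2 connection (dsn is whatever connection string you use) and a PostgreSQL version whose pg_stat_activity view has the state and wait_event columns (9.6 or later).

import psycopg2

def show_copy_backends(dsn):
    """Print what any running COPY backend is doing, via a separate connection."""
    conn = psycopg2.connect(dsn)
    try:
        cur = conn.cursor()
        cur.execute("""
            SELECT pid, state, wait_event_type, wait_event, query
            FROM pg_stat_activity
            WHERE query ILIKE 'COPY %'
        """)
        for row in cur.fetchall():
            print(row)
    finally:
        conn.close()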

mikelikespie
+1 Thanks, that's the solution I've used, with a slightly different syntax. I don't see any signs of deadlocks, and disk activity is normal (it's quite a busy server).
Adam Matan