Hi,

I'm looking for the most efficient way to bulk-insert some millions of tuples into a database. I'm using Python, PostgreSQL and psycopg2.

I have created a long list of tuples that should be inserted into the database, sometimes with modifiers like geometric Simplify.

The naïve way to do it would be string-formatting a list of INSERT statements, but there are three other methods I've read about:

  1. Using pyformat binding style for parametric insertion
  2. Using executemany on the list of tuples, and
  3. Writing the results to a file and using COPY.

It seems that the first way is the most efficient, but I would appreciate your insights and code snippets telling me how to do it right.

Thanks,

Adam

+1  A: 

The first and the second would be used together, not separately. The third would be the most efficient server-wise though, since the server would do all the hard work.
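A minimal sketch of how the two fit together, assuming a hypothetical table `points (id, name)` and an already open psycopg2 cursor (neither is from the question): the pyformat placeholders such as `%(id)s` define the statement, and `executemany` runs it once per dict in the list.

```python
# Hypothetical table and column names, for illustration only.
INSERT_SQL = "INSERT INTO points (id, name) VALUES (%(id)s, %(name)s)"


def insert_rows(cursor, rows):
    """Insert a list of dicts using pyformat placeholders and executemany.

    `cursor` is assumed to be a psycopg2 cursor (any DB-API cursor that
    supports the pyformat parameter style should behave the same way).
    Each dict in `rows` must provide the keys 'id' and 'name'.
    """
    rows = list(rows)
    # The driver does the quoting; values are never formatted into the
    # SQL string by hand.
    cursor.executemany(INSERT_SQL, rows)
    return len(rows)
```

Something like `insert_rows(conn.cursor(), [{"id": 1, "name": "a"}])` followed by `conn.commit()` would be the typical call, with `conn` coming from `psycopg2.connect(...)`.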

Ignacio Vazquez-Abrams
Can you link to some code samples? I can't find any good psycopg2 resources on the web.
Adam Matan
Psycopg has a new manual: there's plenty of examples on it now. http://initd.org/psycopg/
piro
Great, thanks. Found another good example here: http://www.devx.com/opensource/Article/29071/0/page/3 , probably the best hands-on resource on psycopg2 there is.
Adam Matan
+4  A: 

Yeah, I would vote for COPY, provided you can write a file to the server's hard drive (not the drive the app is running on), as COPY will only read off the server.

Andy Shellam
Using psycopg2's cursor.copy_from, the file is handled by the client. It doesn't even need to be a file system file: any Python file-like object works fine. Check http://initd.org/psycopg/docs/cursor.html#cursor.copy_from
piro
In that case it would be interesting to see how it's actually inserted into the database - I was under the impression PostgreSQL's COPY only read from the server's local file system and there was no way to bulk copy a file across using the client.
Andy Shellam
It can also read from `STDIN`, which means the data comes from the client application. See the COPY command docs: http://www.postgresql.org/docs/8.4/static/sql-copy.html
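To illustrate, here is a small sketch that issues `COPY ... FROM STDIN` explicitly through psycopg2's `cursor.copy_expert`, letting Python's `csv` module take care of quoting. The table and column names are hypothetical, and they are assumed to be trusted identifiers since they are formatted straight into the statement.

```python
import csv
import io


def copy_rows_via_stdin(cursor, table, columns, rows):
    """Stream rows to the server as CSV over COPY ... FROM STDIN.

    `table` and `columns` are assumed to be trusted identifiers; they are
    not escaped here.
    """
    buf = io.StringIO()
    # csv.writer quotes fields containing commas, quotes or newlines.
    csv.writer(buf, lineterminator="\n").writerows(rows)
    buf.seek(0)
    # "WITH CSV" is the older option syntax, also used in the 8.4 docs.
    sql = "COPY {} ({}) FROM STDIN WITH CSV".format(table, ", ".join(columns))
    cursor.copy_expert(sql, buf)
```

One caveat with this sketch: in CSV mode an unquoted empty field is read as NULL, and `csv.writer` writes both `None` and `""` as empty fields, so the two cannot be told apart on the server side.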
piro
Using `copy` server-side is hard to beat for speed.
Avery Payne
@piro: Cool, good to know - +1
Andy Shellam
+1  A: 

There is a new psycopg2 manual containing examples for all the options.

The COPY option is the most efficient, followed by executemany, and then execute with pyformat.

piro