views: 573
answers: 3

I've got a production DB with, say, ten million rows. I'd like to extract the 10,000 or so rows from the past hour off of production and copy them to my local box. How do I do that?

Let's say the query is:

SELECT * FROM mytable WHERE date > '2009-01-05 12:00:00';

How do I take the output, export it to some sort of dump file, and then import that dump file into my local development copy of the database -- as quickly and easily as possible?

+2  A: 

From within psql, just use COPY with the query you gave, exporting the result as CSV (or whatever format you prefer), switch to the other database with \c, and import it.

See \h copy in psql for the details.
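Concretely, the client-side \copy variant runs with ordinary privileges, because the file is read and written by psql on your machine rather than by the server. A sketch (the file name is illustrative, and \copy with a subquery needs PostgreSQL 8.2 or later):

```psql
-- on production, inside psql:
\copy (SELECT * FROM mytable WHERE date > '2009-01-05 12:00:00') TO 'subset.csv' WITH CSV

-- on the local box, inside psql (mytable must already exist there):
\copy mytable FROM 'subset.csv' WITH CSV
```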

Keltia
I get this: ERROR: must be superuser to COPY to or from a file... Is there any way around it, assuming I'm not going to get to run arbitrary code as the superuser?
mike
You should edit the original question to add this restriction, to avoid answers like Michael Buen's.
bortzmeyer
A: 

With the constraint you added (not being superuser), I do not see a pure-SQL solution. But doing it in your favorite language is quite simple: open one connection to the "old" database and another to the new one, SELECT from the first and INSERT into the second. Here is a tested-and-working solution in Python.

#!/usr/bin/python

""" 

Copy a *part* of a database to another one. See
<http://stackoverflow.com/questions/414849/whats-the-best-way-to-copy-a-subset-of-a-tables-rows-from-one-database-to-anoth>

With PostgreSQL, the only pure-SQL solution is to use COPY, which is
not available to the ordinary user.

Stephane Bortzmeyer <[email protected]>

"""

table_name = "Tests"
# List here the columns you want to copy. Yes, "*" would be simpler
# but also more brittle.
names = ["id", "uuid", "date", "domain", "broken", "spf"]
constraint = "date > '2009-01-01'"

import psycopg2

old_db = psycopg2.connect("dbname=dnswitness-spf")
new_db = psycopg2.connect("dbname=essais")
old_cursor = old_db.cursor()
old_cursor.execute("""SET TRANSACTION READ ONLY""") # Security
new_cursor = new_db.cursor()
old_cursor.execute("""SELECT %s FROM %s WHERE %s """ % \
                       (",".join(names), table_name, constraint))
print "%i rows retrieved" % old_cursor.rowcount
new_cursor.execute("""BEGIN""")
placeholders = []
namesandvalues = {}
for name in names:
    placeholders.append("%%(%s)s" % name)
command = "INSERT INTO %s (%s) VALUES (%s)" % \
          (table_name, ",".join(names), ",".join(placeholders))
for row in old_cursor.fetchall():
    for name, value in zip(names, row):
        namesandvalues[name] = value
    new_cursor.execute(command, namesandvalues)
new_cursor.execute("""COMMIT""")
old_cursor.close()
new_cursor.close()
old_db.close()
new_db.close()
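For larger extracts, the row-by-row execute() loop above can be batched into a single executemany() call. Here is a minimal sketch of that pattern, using Python's built-in sqlite3 module as a stand-in for the two PostgreSQL connections (the table, columns, and data are made up; with psycopg2 the placeholders would be %s rather than ?):

```python
import sqlite3

# Two in-memory databases stand in for the production and local PostgreSQL servers.
old_db = sqlite3.connect(":memory:")
new_db = sqlite3.connect(":memory:")

names = ["id", "date", "domain"]
old_db.execute("CREATE TABLE tests (id INTEGER, date TEXT, domain TEXT)")
old_db.executemany("INSERT INTO tests VALUES (?, ?, ?)",
                   [(1, "2009-01-05", "a.example"),
                    (2, "2009-01-06", "b.example"),
                    (3, "2008-12-31", "c.example")])
new_db.execute("CREATE TABLE tests (id INTEGER, date TEXT, domain TEXT)")

# Fetch the subset once, then insert it in one batch instead of one INSERT per row.
rows = old_db.execute(
    "SELECT %s FROM tests WHERE date > '2009-01-01'" % ",".join(names)).fetchall()
new_db.executemany(
    "INSERT INTO tests (%s) VALUES (?, ?, ?)" % ",".join(names), rows)
new_db.commit()

copied = new_db.execute("SELECT COUNT(*) FROM tests").fetchone()[0]
print(copied)  # 2 rows match the date filter
```

executemany() saves one network round-trip per row, which matters when copying 10,000 rows over a WAN link.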
bortzmeyer
+1  A: 

source server:

BEGIN;

CREATE TEMP TABLE mmm_your_table_here AS
    SELECT * FROM your_table_here WHERE your_condition_here;

COPY mmm_your_table_here TO 'u:\\source.copy';

ROLLBACK;

your local box:

-- your_destination_table_here must be created first on your box

COPY your_destination_table_here FROM 'u:\\source.copy';

article: http://www.postgresql.org/docs/8.1/static/sql-copy.html
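If writing to a server-side file is not allowed (COPY to a file needs superuser rights), roughly the same transfer can be done from the client by streaming through STDOUT and STDIN. Host and database names below are placeholders, and COPY with a subquery needs PostgreSQL 8.2 or later:

```shell
# run from any client with access to production; COPY ... TO STDOUT needs no superuser
psql -h prod-host -d proddb \
     -c "COPY (SELECT * FROM mytable WHERE date > '2009-01-05 12:00:00') TO STDOUT" \
     > source.copy

# load it into the local database (the destination table must already exist)
psql -d devdb -c "COPY mytable FROM STDIN" < source.copy
```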

Michael Buen
The OP explicitly said (in a comment to Keltia's answer) that he is not a superuser, so COPY to a server-side file is not an option.
bortzmeyer