I'm experimenting with the pg_bulkload project to import millions of rows into a database. However, none of the new rows has a primary key, and only two of the table's several columns are available in my input file. How do I tell pg_bulkload which columns I'm importing, and how do I generate the primary key field? Do I need to edit my import file to match exactly what the output of a COPY command would be and generate the id field myself?

For example, let's say my table's columns are:

id         title        body        published

The data I have is limited to title and published, stored in a tab-delimited file. My .ctl file looks like this:

TABLE = posts
INFILE = stdin
TYPE = CSV
DELIMITER = "   "   # a single tab character between the quotes

A:

You can use the FILTER option of pg_bulkload. Something like:

In the database:

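CREATE FUNCTION pg_bulkload_filter(text, text) RETURNS record
AS $$
  -- One output value per column of the target table, in table order;
  -- $1 and $2 are the two fields read from each input line.
  SELECT nextval('tablename_id_seq'), NULL, NULL, $1, $2, NULL
$$ LANGUAGE SQL;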

And in the pg_bulkload control file:

FILTER = pg_bulkload_filter
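
As a rough sketch for the posts table from the question: the column types are not given there, so this assumes id is an integer fed by a sequence named posts_id_seq, title and body are text, and published is a timestamp. The function name posts_filter is also made up for illustration. pg_bulkload expects the record returned by the filter to match the target table's definition, so every value is cast explicitly, including the NULL; adjust the casts to the real column types.

```sql
-- Assumed (hypothetical) table layout, in column order:
--   posts(id integer, title text, body text, published timestamp)
-- posts_id_seq is an assumed sequence name; use the one your table owns.
CREATE FUNCTION posts_filter(text, text) RETURNS record
AS $$
  SELECT nextval('posts_id_seq')::integer,  -- id: generated from the sequence
         $1::text,                          -- title: first field of the input line
         NULL::text,                        -- body: not present in the input file
         $2::timestamp                      -- published: second field of the input line
$$ LANGUAGE SQL;
```

With that in place, the control file line would read FILTER = posts_filter.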

Tometzky
This does the trick. Looking back, it is in the documentation but it isn't too clear. Also, I had to cast everything, even the NULL values, to the appropriate types. Thanks for your help.
thetaiko