tags:

views:

209

answers:

1

I want to store test results in the form of a .CSV file into a central database. The DB has a schema that does not match the CSV file. Ideally, I would just pass the contents of the CSV file as a string directly to the server and have a function that can transform the CSV into a table that can be joined and used in an INSERT.

Here's a simplified example:

INSERT INTO MyTable
SELECT *
FROM parse_csv(csv_data) t
INNER JOIN Categories c
ON t.CatID=c.CatID

Note: csv_data is a varchar with the contents of a CSV file. Also notice the INNER JOIN operation done directly on the output of parse_csv. What's a quick way to write the parse_csv function() above?

I'm also thinking about using OPENXML and passing an XML string, but I can't find that function on Postgres. See this question.

I'd rather not parse the CSV file in the application code and call INSERT a thousand times. That could be a lot of unnecessary roundtrips. I know of the COPY function, but I have several CSV files of the same format that I don't want to collide.

I'm open to any suggestions or tips.

+1  A: 

It's possible as long as your CSV data have known format.

Let's assume they have 2 columns: "CatID" and "WhatEver".

Now you can create a pl/perlu function (or pl/pythonu, or with some luck even pl/pgsql, but it would be tricky):

create type parse_csv_srf as (catid int4, whatever text);
create function parse_csv(text) returns setof parse_csv_srf as $$
...
$$ language plperlu;

Body of the function is in Perl (or Python, or whatever) and does the parsing, plus returns (in case of Perl) reference to array of hashrefs, where each hash should have keys "catid" and "whatever" with some values.

Following example is just an example - writing csv parser with regexps is bad idea, and you should never do it, but I'm too lazy to provide actual working parser as part of example.

CREATE type parse_csv_srf as (catid INT4, whatever TEXT);
CREATE OR REPLACE FUNCTION parse_csv( TEXT ) RETURNS setof parse_csv_srf as $$
my $source = shift;
my @rows = split /\r?\n/, $source;
my @reply = ();
for my $row (@rows) {
    my @values = ();
    while ( $row =~ s/("(?:[^"]|"")*"|[^",]*)(,|$)// ) {
        my $single_value = $1;
        $single_value =~ s/^"//;
        $single_value =~ s/"$//;
        $single_value =~ s/""/"/g;
        push @values, $single_value;
        last if '' eq $row;
    }
    push @reply, {
        "catid" => $values[0],
        "whatever" => $values[1],
    };
}
return \@reply;
$$ language plperl;

# select * from parse_csv(E'1,depesz\n2,"hubert lubaczewski"\n');
 catid |      whatever
-------+--------------------
     1 | depesz
     2 | hubert lubaczewski
(2 rows)

# select i.*, c.relpages
  from parse_csv(E'1,pg_database\n2,"pg_proc"\n') as i
       join pg_class c on i.whatever = c.relname;
 catid |  whatever   | relpages
-------+-------------+----------
     1 | pg_database |        1
     2 | pg_proc     |       53
(2 rows)
depesz