views: 78
answers: 2

Running the following code against a large PostgreSQL table, the NpgsqlDataReader object blocks until all data is fetched.

NpgsqlCommand cmd = new NpgsqlCommand(strQuery, _conn);
NpgsqlDataReader reader = cmd.ExecuteReader(); // <-- takes 30 seconds

How can I get it to behave so that it doesn't prefetch all the data? I want to step through the result set row by row without it fetching all 15 GB into memory at once.
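
For reference, the row-by-row loop I have in mind is roughly the following sketch (the column access is just a placeholder):

using (NpgsqlDataReader reader = cmd.ExecuteReader())
{
    while (reader.Read())                        // ideally rows should stream in as I read them
    {
        object firstColumn = reader.GetValue(0); // placeholder; real columns and types differ
    }
}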

I know there were issues with this sort of thing in Npgsql 1.x, but I'm on 2.0. This is against a PostgreSQL 8.3 database on XP/Vista/7. I also don't have any funky "force Npgsql to prefetch" options in my connection string. I'm at a complete loss as to why this is happening.

+1  A: 

I'm surprised the driver doesn't provide a way to do this, but you could manually execute the SQL statements to declare a cursor, open it, and fetch from it in batches, e.g. (and this code is very dubious as I'm not a C# guy):

new NpgsqlCommand("DECLARE cur_data NO SCROLL CURSOR FOR "
                  + strQuery, _conn).ExecuteNonQuery();
int rows;
do {
    rows = 0;
    using (NpgsqlDataReader reader =
               new NpgsqlCommand("FETCH 100 FROM cur_data", _conn).ExecuteReader())
    {
        while (reader.Read())
            rows++;                    // read/process data from the current row here
    }
} while (rows > 0);
new NpgsqlCommand("CLOSE cur_data", _conn).ExecuteNonQuery();

Note that:

  • You need to be inside a transaction block to use a cursor, unless you declare it WITH HOLD, in which case the server spools the full result set to a server-side temp file; you still avoid transferring it all to the client at once, but the server does materialize everything.
  • The cursor_tuple_fraction setting may cause a different plan to be chosen when a query is executed via a cursor rather than directly. Since you actually intend to fetch all of the cursor's output, you may want to run "SET cursor_tuple_fraction = 1.0" just before declaring the cursor (a combined sketch follows these notes).
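
Putting it together, a rough sketch of the whole flow (assuming _conn is an already-open NpgsqlConnection and strQuery holds the SELECT to page through; as above, treat the C# details with suspicion):

using (NpgsqlTransaction tx = _conn.BeginTransaction())
{
    // Tell the planner we intend to consume the whole result set.
    new NpgsqlCommand("SET cursor_tuple_fraction = 1.0", _conn, tx).ExecuteNonQuery();

    new NpgsqlCommand("DECLARE cur_data NO SCROLL CURSOR FOR " + strQuery, _conn, tx)
        .ExecuteNonQuery();

    int rows;
    do
    {
        rows = 0;
        using (NpgsqlDataReader reader =
                   new NpgsqlCommand("FETCH 100 FROM cur_data", _conn, tx).ExecuteReader())
        {
            while (reader.Read())
                rows++;                // process the current row here
        }
    } while (rows > 0);

    new NpgsqlCommand("CLOSE cur_data", _conn, tx).ExecuteNonQuery();
    tx.Commit();
}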
araqnid
I'd upvote but I don't have the reputation yet. My backup plan is to declare a cursor manually like you suggest. Ideally though, I'd like the DataReader to "just work". :) Isn't the whole point of a DataReader that it's a fast forward-only way of accessing data? So this behavior seems strange to me.
Swingline Rage
@Swingline Rage: it's possible that the .NET driver is simply less mature than even the JDBC driver. The PostgreSQL protocol has a mechanism, I believe, for fetching any query's results incrementally (the SQL cursor commands are a separate, higher-level interface); maybe Npgsql simply hasn't had support for that written yet? I seem to recall that the traditional PQexec() call does in fact return the entire result in one go, and presumably trying to leave that in the OS buffer and process it piecemeal would be bad...
araqnid
Yeah good point (and somewhat depressing). I'll look into this in more detail. I keep forgetting Npgsql is open-source so the answers are there. But the last time I looked I found the code difficult to follow, as always with data internals.
Swingline Rage
A: 

Hi!

Which Npgsql version are you using? We added support for handling large result sets a while ago. In fact, PostgreSQL protocol version 3 has support for paging through large result sets without using cursors; unfortunately we haven't implemented that yet. Sorry about that.

Please give it a try with Npgsql 2.0.9 and let me know if you still have problems.

Francisco