The following query returns data right away:

SELECT time, value FROM data ORDER BY time LIMIT 100;

Without the LIMIT clause, it takes a long time before the server starts returning rows:

SELECT time, value FROM data ORDER BY time;

I observe this both in the query tool (psql) and when querying through an API.

Questions/issues:

  • The amount of work the server has to do before starting to return rows should be the same for both SELECT statements. Correct?
  • If so, why is there a delay in the second case?
  • Is there some fundamental RDBMS issue that I do not understand?
  • Is there a way I can make PostgreSQL start returning result rows to the client without a pause, in the second case as well?
  • EDIT (see below): it looks like setFetchSize is the key to solving this. In my case I execute the query from Python, using SQLAlchemy. How can I set that option for a single query (executed by session.execute)? I use the psycopg2 driver.

The column time is the primary key, BTW.

EDIT:

I believe this excerpt from the JDBC driver documentation describes the problem and hints at a solution (I still need help - see the last item in the bullet list above):

By default the driver collects all the results for the query at once. This can be inconvenient for large data sets so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.

and

Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour).

// make sure autocommit is off
conn.setAutoCommit(false);
Statement st = conn.createStatement();

// Turn use of the cursor on.
st.setFetchSize(50);
A: 

In theory, because your ORDER BY is by primary key, a sort of the results should not be necessary, and the DB could indeed return data right away in key order.

I would expect a capable DB to notice this and optimize for it. It seems that PostgreSQL does not. *shrug*

You don't notice any impact if you have LIMIT 100 because it's very quick to pull those 100 results out of the DB, and you won't notice any delay if they're first gathered up and sorted before being shipped out to your client.

I suggest trying to drop the ORDER BY. Chances are, your results will be correctly ordered by time anyway (there may even be a standard or specification that mandates this, given your PK), and you might get your results more quickly.

Carl Smotricz
I believe you might have misread my question. With the LIMIT, the database returns those rows right away. Without the limit, there is a pause before the first rows are returned to the client.
codeape
Databases can produce different query plans if the optimizer knows you are interested in results as fast as possible. Oracle and DB2 both have options for that. Maybe the LIMIT clause acts as the PostgreSQL hint that the query wants results immediately?
Ken Fox
Yes, I did misunderstand your question at first, that's why I revamped the whole thing. Please look at my updated answer now!
Carl Smotricz
Dropping the ORDER BY makes no difference. In fact, I now believe that the problem is in the client driver. It looks as if the driver by default collects all the results for the query at once (see my edit).
codeape
Ah well, it looks like they recognize the problem and helpfully offer a fix as well. If you're using `PreparedStatement` to submit your queries you can implement their suggestion directly (using 100 if you like); otherwise you should be able to use a `Statement` to submit your query and call setFetchSize() on that.
Carl Smotricz
Oh... Reminder: If autoCommit is turned off, you'll have to explicitly `conn.commit()` if you do any updates.
Carl Smotricz
One small problem: I work in Python. The solution is for a Java driver. Haven't been able to figure out how to do it in Python yet.
codeape
Me, I don't know Python. I'd recommend writing up a new SO question for the subset of info you need, perhaps something like "How do I set the fetch size for PGSQL in Python?"
Carl Smotricz
Yes, good idea.
codeape
+3  A: 

The psycopg2 DBAPI driver buffers the whole query result before returning any rows. You'll need to use a server-side cursor to fetch results incrementally. For SQLAlchemy, see server_side_cursors in the docs; if you're using the ORM, see the Query.yield_per() method.
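
A minimal sketch of the named-cursor approach in plain psycopg2 (the connection string and table are placeholders; itersize controls how many rows travel per network round trip):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string

# Passing a name makes this a server-side (named) cursor: rows are
# fetched from the server incrementally instead of buffered client-side.
cur = conn.cursor("large_result")
cur.itersize = 100  # rows per network round trip while iterating

cur.execute("SELECT time, value FROM data ORDER BY time")
for row in cur:
    print(row)  # the first rows should arrive without the long pause

conn.commit()  # the named cursor lives inside a transaction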

SQLAlchemy currently doesn't have an option to set that for a single query, but there is a ticket with a patch implementing it.
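
At the engine level it looks roughly like this (a sketch assuming the psycopg2 dialect; the URL is a placeholder):

import sqlalchemy

# server_side_cursors=True makes the psycopg2 dialect use named
# (server-side) cursors for statements executed on this engine.
engine = sqlalchemy.create_engine(
    "postgresql+psycopg2://user:password@localhost/mydb",  # placeholder URL
    server_side_cursors=True,
)

conn = engine.connect()
for row in conn.execute("SELECT time, value FROM data ORDER BY time"):
    print(row)  # rows stream in instead of being buffered up front
conn.close()

# With the ORM, Query.yield_per() batches rows as they are fetched, e.g.:
#     session.query(Data).yield_per(100)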

Ants Aasma
I tried using a server-side cursor: c = conn.cursor("mycursor"); c.execute("..."); c.fetchmany(100). But I still get the long delay before anything is returned. What am I doing wrong?
codeape
Assuming conn is a psycopg2 connection, I have no idea; it works correctly for me. You can try executing EXPLAIN ANALYZE for the same query and look at the first time number in the explain output - that is the time PostgreSQL took to find the first row.
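
Roughly, from Python (a sketch; the DSN is a placeholder):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
cur = conn.cursor()  # a plain client-side cursor is fine for EXPLAIN
cur.execute("EXPLAIN ANALYZE SELECT time, value FROM data ORDER BY time")
for (line,) in cur.fetchall():
    print(line)
# Each plan node prints "actual time=first..last" (in milliseconds);
# "first" is how long PostgreSQL took to produce that node's first row.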
Ants Aasma