The following query returns data right away:
SELECT time, value from data order by time limit 100;
Without the limit clause, it takes a long time before the server starts returning rows:
SELECT time, value from data order by time;
I observe this both when using the query tool (psql) and when querying through an API.
Questions/issues:
- The amount of work the server has to do before starting to return rows should be the same for both select statements. Correct?
- If so, why is there a delay in case 2?
- Is there some fundamental RDBMS issue that I do not understand?
- Is there a way I can make PostgreSQL start returning result rows to the client without a pause, also for case 2?
- EDIT (see below). It looks like setFetchSize is the key to solving this. In my case I execute the query from Python, using SQLAlchemy. How can I set that option for a single query (executed by session.execute)? I use the psycopg2 driver.
The column time is the primary key, BTW.
EDIT:
I believe this excerpt from the JDBC driver documentation describes the problem and hints at a solution (I still need help - see the last bullet list item above):
By default the driver collects all the results for the query at once. This can be inconvenient for large data sets so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.
and
Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour).
// make sure autocommit is off
conn.setAutoCommit(false);
Statement st = conn.createStatement();
// Turn use of the cursor on.
st.setFetchSize(50);
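For comparison, the closest psycopg2 counterpart I am aware of is a named (server-side) cursor, with itersize playing roughly the role of the JDBC fetch size. Below is a minimal sketch of that idea, not a confirmed solution: the helper name stream_rows and the cursor name are hypothetical, and conn is assumed to be an open psycopg2 connection that is not in autocommit mode (like the JDBC example, a named cursor needs a transaction). SQLAlchemy documents a stream_results execution option that appears to serve the same purpose, but I have not verified how it behaves with session.execute.

```python
from typing import Any, Iterator, Tuple


def stream_rows(conn: Any, query: str, batch_size: int = 50) -> Iterator[Tuple]:
    """Yield rows from `query` through a server-side (named) cursor.

    Assumption: `conn` is an open psycopg2 connection, not in autocommit mode.
    """
    # Passing a name makes psycopg2 use a server-side cursor instead of
    # loading the whole result set into client memory.
    with conn.cursor(name="stream_rows_cursor") as cur:
        # Number of rows fetched per network round trip while iterating.
        cur.itersize = batch_size
        cur.execute(query)
        for row in cur:
            yield row


# Hypothetical usage (the connection string is a placeholder):
#   import psycopg2
#   conn = psycopg2.connect("dbname=mydb")
#   for time, value in stream_rows(conn, "SELECT time, value FROM data ORDER BY time"):
#       print(time, value)
```

With this approach the first rows should be available as soon as the first batch arrives, rather than after the full result has been transferred.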