Hi again,

I have been using Python with RDBMSs (MySQL and PostgreSQL), and I have noticed that I really do not understand how to use a cursor.

Usually, one's script connects to the DB via a DB-API client module (like psycopg2 or MySQLdb):

connection = psycopg2.connect(host='otherhost', etc)

And then one creates a cursor:

cursor = connection.cursor()

And then one can issue queries and commands:

cursor.execute("SELECT * FROM etc")

Now, where is the result of the query, I wonder? Is it on the server? Or a little on my client and a little on my server? And then, if we need to access some results, we fetch 'em:

row = cursor.fetchone()

or

rows = cursor.fetchmany()

Now let's say I do not retrieve all the rows and decide to execute another query. What will happen to the previous results? Is there an overhead?

Also, should I create a cursor for each kind of command and somehow keep reusing it for those same commands? I heard psycopg2 can somehow optimize commands that are executed many times but with different values. How, and is it worth it?

Thx

+1  A: 

Assuming you're using PostgreSQL, the cursors are probably just implemented using the database's native cursor API. You may want to look at the source code for pg8000, a pure-Python PostgreSQL DB-API module, to see how it handles cursors. You might also like to look at the PostgreSQL documentation for cursors.
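For illustration, here is a minimal sketch of what that native cursor API looks like when driven by hand through psycopg2 (the cursor name, table, and connection parameters are made up):

import psycopg2

conn = psycopg2.connect(host='localhost', dbname='test')  # hypothetical parameters
cur = conn.cursor()

# PostgreSQL's native cursor API: DECLARE a cursor inside a transaction,
# then FETCH rows from it in batches.
cur.execute("DECLARE my_cur CURSOR FOR SELECT * FROM mytable")
cur.execute("FETCH 10 FROM my_cur")
rows = cur.fetchall()  # only these 10 rows crossed the wire
cur.execute("CLOSE my_cur")
conn.commit()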

kquinn
A: 

If you look at the MySQLdb documentation, you can see that they implemented different strategies for cursors. So the general answer is: it depends.

Edit: Here is the MySQLdb API documentation. There is some info on how each cursor type behaves. The standard cursor stores the result set in the client, so I assume there is an overhead if you don't retrieve all result rows, because even the rows you don't fetch have to be transferred to the client (potentially over the network). My guess is that it is not that different from PostgreSQL.
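To make the difference concrete, here is a rough sketch of the two strategies in MySQLdb (connection parameters, table name, and the process() handler are made up):

import MySQLdb
import MySQLdb.cursors

conn = MySQLdb.connect(host='localhost', db='test')  # hypothetical parameters

# Default Cursor: the whole result set is copied to the client
# as soon as execute() returns (mysql_store_result).
cur = conn.cursor()

# SSCursor: rows stay on the server and are streamed as you
# fetch them (mysql_use_result).
ss_cur = conn.cursor(MySQLdb.cursors.SSCursor)
ss_cur.execute("SELECT * FROM big_table")  # hypothetical table
for row in ss_cur:
    process(row)  # hypothetical handler
ss_cur.close()  # drain/close before issuing another query on this connection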

When you want to optimize SQL statements that you call repeatedly with many values, you should look at cursor.executemany(). It prepares a SQL statement so that it doesn't need to be parsed every time you call it:

cur.executemany('INSERT INTO mytable (col1, col2) VALUES (%s, %s)',
                [('val1', 1), ('val2', 2)])
unbeknown
+4  A: 

Wikipedia on cursors: "In database packages, a cursor is a control structure for the successive traversal (and potential processing) of records in a result set."

Once you have executed a query (execute, executemany) you can fetch one (fetchone), many (fetchmany; using the size parameter), or all (fetchall) of the resulting rows.

From PEP 249: "Note there are performance considerations involved with the size parameter. For optimal performance, it is usually best to use the arraysize attribute."
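A minimal sketch of that batching pattern (the table name and process() handler are made up):

cur = connection.cursor()
cur.arraysize = 500  # default batch size used by fetchmany()
cur.execute("SELECT * FROM mytable")  # hypothetical table

while True:
    rows = cur.fetchmany()  # fetches cur.arraysize rows per call
    if not rows:
        break
    for row in rows:
        process(row)  # hypothetical handler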

If you are running a bunch of similar queries with different parameters you can optimize by reusing the same cursor. Again, from PEP 249:

A reference to the operation will be retained by the cursor. If the same operation object is passed in again, then the cursor can optimize its behavior. This is most effective for algorithms where the same operation is used, but different parameters are bound to it (many times).
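In other words, something like this reuse pattern (the statement and values are made up):

cur = connection.cursor()
insert = "INSERT INTO mytable (col1, col2) VALUES (%s, %s)"  # hypothetical statement

# Passing the identical operation each time lets the driver recognize
# it and, if it implements this optimization, skip re-parsing.
for params in [('a', 1), ('b', 2), ('c', 3)]:
    cur.execute(insert, params)
connection.commit()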

Adam Bernier
So a cursor is an iterator over the result set.
monk.e.boy
+1  A: 

ya, i know it's months old :P

DB-API's cursor appears to be closely modeled after SQL cursors. As far as resource (row) management is concerned, DB-API does not specify whether the client must retrieve all the rows or DECLARE an actual SQL cursor. As long as the fetchXXX interfaces do what they're supposed to, DB-API is happy.

As far as psycopg2 cursors are concerned (as you may well know), "unnamed DB-API cursors" fetch the entire result set, AFAIK buffered in memory by libpq. "Named DB-API cursors" (a psycopg2 concept that may not be portable) request the rows on demand (via the fetchXXX methods).
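A rough sketch of the two kinds of psycopg2 cursor (connection parameters, table name, and process() handler are made up):

import psycopg2

conn = psycopg2.connect(host='localhost', dbname='test')  # hypothetical parameters

# Unnamed cursor: libpq buffers the entire result set client-side.
client_side = conn.cursor()

# Named cursor: psycopg2 DECLAREs a server-side cursor, and the
# fetchXXX methods (and iteration) request rows from it on demand.
server_side = conn.cursor(name='my_cursor')
server_side.execute("SELECT * FROM big_table")  # hypothetical table
for row in server_side:
    process(row)  # hypothetical handler
server_side.close()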

As cited by "unbeknown", executemany can be used to optimize multiple runs of the same command. However, it doesn't address the need for prepared statements: when repeated executions of a statement with different parameter sets are not directly sequential, executemany() performs just as well as execute(). DB-API does "provide" driver authors with the ability to cache executed statements, but its implementation (what's the scope/lifetime of the statement?) is undefined, so it's impossible to set expectations across DB-API implementations.

If you are loading lots of data into PostgreSQL, I would strongly recommend trying to find a way to use COPY.
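For example, psycopg2 exposes COPY through cursor.copy_from(); a minimal sketch (the file name and table are made up):

import psycopg2

conn = psycopg2.connect(host='localhost', dbname='test')  # hypothetical parameters
cur = conn.cursor()

# Stream a tab-separated file straight into the table with COPY,
# which is far faster than row-by-row INSERTs.
with open('data.tsv') as f:  # hypothetical file
    cur.copy_from(f, 'mytable', sep='\t')  # hypothetical table
conn.commit()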

jwp