I often write little Python scripts to iterate through all rows of a DB table, for example to send an email to all subscribers.

I do it like this:

import MySQLdb

conn = MySQLdb.connect(host=hst, user=usr, passwd=pw, db=db)
cursor = conn.cursor()
cursor.execute("SELECT * FROM tbl_subscriber;")  # execute() returns the row count, not the rows
subscribers = cursor.fetchall()

for subscriber in subscribers:
    ...

conn.close()

I wonder if there is a better way to do this, because my code may load thousands of rows into memory.

I thought it could be done better with LIMIT, maybe something like this:

"SELECT * FROM tbl_subscriber LIMIT %d,%d;" % (actualLimit,steps)    

What's the best way to do it? How would you do it?

+1  A: 

First of all, maybe you don't need `SELECT * FROM ...`

Maybe it's enough for you to get just the columns you need, like `SELECT email FROM ...`

That would reduce memory usage anyway :)
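A minimal sketch of the idea, using Python's built-in `sqlite3` module as a stand-in for MySQLdb so it runs without a server (both follow the DB-API, so the cursor calls are the same). The helper name `subscriber_emails` and the extra `name` and `bio` columns are a hypothetical layout; the point is that only the `email` values ever end up as Python objects.

```python
import sqlite3


def subscriber_emails(conn):
    """Yield only the email column (SELECT email, not SELECT *)."""
    cursor = conn.cursor()
    cursor.execute("SELECT email FROM tbl_subscriber ORDER BY id")
    for (email,) in cursor:  # each row is a 1-tuple
        yield email


if __name__ == "__main__":
    # Hypothetical table layout, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl_subscriber (id INTEGER PRIMARY KEY, name TEXT, email TEXT, bio TEXT)"
    )
    conn.executemany(
        "INSERT INTO tbl_subscriber (name, email, bio) VALUES (?, ?, ?)",
        [("Ann", "ann@example.com", "long text..."), ("Bob", "bob@example.com", "more text...")],
    )
    for email in subscriber_emails(conn):
        print(email)
```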

Katalonis
Great point - `SELECT *` Considered Harmful, or at least lazy programming.
Piskvor
A: 

Wouldn't it be better to fetch one row at a time, like this:

import MySQLdb


class Table:
    def __init__(self, db, name):
        self.db = db
        self.name = name
        self.dbc = self.db.cursor()

    def __getitem__(self, item):
        # The table name can't be a query parameter, so it is concatenated;
        # the row offset is passed as a parameter instead.
        # Note: without ORDER BY, MySQL doesn't guarantee which row is at a given offset.
        self.dbc.execute("SELECT * FROM " + self.name + " LIMIT %s, 1", (item,))
        return self.dbc.fetchone()

    def __len__(self):
        self.dbc.execute("SELECT COUNT(*) FROM " + self.name)
        return int(self.dbc.fetchone()[0])


db = MySQLdb.connect(user="user", passwd="passwd", db="dbname")
subscribers = Table(db, "name")
for i in range(len(subscribers)):
    print(subscribers[i])
pyfunc
This is overkill. Iterating over a cursor does fetch one row at a time.
Space_C0wb0y
Yeah, got that, thanks! This is truly overdoing it.
pyfunc
+3  A: 

Unless you have BLOBs in there, thousands of rows shouldn't be a problem. Do you actually know that it is a problem?

Also, why bring shame on yourself and your entire family by doing something like

"SELECT * FROM tbl_subscriber LIMIT %d,%d;" % (actualLimit,steps)

when the cursor will make the substitution for you in a manner that avoids SQL injection?

# MySQLdb uses %s as the placeholder for every parameter type, including integers
c.execute("SELECT * FROM tbl_subscriber LIMIT %s, %s;", (actualLimit, steps))
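A sketch of the LIMIT-based paging loop with the parameters passed to `execute()` instead of pasted into the SQL string. It uses `sqlite3` as a runnable stand-in, whose placeholder is `?` and whose syntax here is `LIMIT count OFFSET offset`; with MySQLdb the equivalent would be `LIMIT %s, %s` with `(offset, page_size)`. The helper name `iter_pages` and the `id` column used for ordering are assumptions.

```python
import sqlite3


def iter_pages(conn, page_size=1000):
    """Yield rows from tbl_subscriber, reading one page per query."""
    cursor = conn.cursor()
    offset = 0
    while True:
        # The driver substitutes the numbers; ORDER BY keeps pages stable.
        cursor.execute(
            "SELECT id, email FROM tbl_subscriber ORDER BY id LIMIT ? OFFSET ?",
            (page_size, offset),
        )
        rows = cursor.fetchall()  # at most page_size rows in memory
        if not rows:
            break
        for row in rows:
            yield row
        offset += len(rows)
```

Without an `ORDER BY`, the database may return rows in a different order for each page, so rows could be skipped or repeated. Large offsets also get slower, because the server still has to step over all the skipped rows.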
aaronasterling
@AaronMcSmooth : +1 Sane advice that I overlooked myself
pyfunc
@pyfunc. I've written too much PHP in my life. I can't look at insecure code without cringing.
aaronasterling
@AaronMcSmooth : Thank you for your advice! I didn't know that execute(...) is able to avoid SQL injection. However, it is a script for local use.
OemerA
Good stuff, learned something new. Thanks!
Hagge
+1  A: 

Do you have actual memory problems? When iterating over a cursor, results are fetched one at a time (your DB-API implementation might decide to prefetch results, but then it might offer a function to set the number of prefetched results).
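A small sketch, with `sqlite3` standing in for MySQLdb: iterate over the cursor itself rather than calling `fetchall()` first, and use the DB-API's `cursor.arraysize` attribute, which is the default batch size for `fetchmany()`. Whether iteration really avoids holding the whole result in memory depends on the driver; MySQLdb's default cursor, for example, buffers the full result on the client. The function names are hypothetical.

```python
import sqlite3


def iterate_directly(conn):
    """Iterate over the cursor instead of building a list with fetchall()."""
    cursor = conn.cursor()
    cursor.execute("SELECT id FROM tbl_subscriber ORDER BY id")
    for (subscriber_id,) in cursor:  # rows are handed out one by one
        yield subscriber_id


def first_batch(conn, batch_size):
    """fetchmany() with no argument returns up to cursor.arraysize rows."""
    cursor = conn.cursor()
    cursor.arraysize = batch_size
    cursor.execute("SELECT id FROM tbl_subscriber ORDER BY id")
    return cursor.fetchmany()
```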

Space_C0wb0y
A: 

Most MySQL connectors based on libmysqlclient buffer all the results in client memory by default for performance reasons (on the assumption that you won't be reading large result sets).

When you do need to read a large result in MySQLdb, you can use an SSCursor to avoid buffering the entire result set.

http://mysql-python.sourceforge.net/MySQLdb.html#using-and-extending

SSCursor - A "server-side" cursor. Like Cursor but uses CursorUseResultMixIn. Use only if you are dealing with potentially large result sets.

This does introduce complications that you must be careful of. If you don't read all the results from the cursor, a second query will raise a ProgrammingError:

>>> import MySQLdb
>>> import MySQLdb.cursors
>>> conn = MySQLdb.connect(read_default_file='~/.my.cnf')
>>> curs = conn.cursor(MySQLdb.cursors.SSCursor)
>>> curs.execute('SELECT * FROM big_table')
18446744073709551615L
>>> curs.fetchone()
(1L, '2c57b425f0de896fcf5b2e2f28c93f66')
>>> curs.execute('SELECT NOW()')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/site-packages/MySQLdb/cursors.py", line 173, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
_mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now")

This means you always have to read everything from the cursor (and potentially multiple result sets) before issuing another query; MySQLdb won't do this for you.
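One way to follow that rule is to wrap the streaming in a generator whose `finally` block reads whatever rows the caller left behind. This is a hypothetical helper, not part of MySQLdb; it takes the cursor class as a parameter (with MySQLdb you would pass `MySQLdb.cursors.SSCursor`) so that it can be exercised here with `sqlite3`. It only drains the current result set, not additional ones from multi-statement queries or stored procedures.

```python
import sqlite3


def stream_rows(conn, query, params=(), cursorclass=None):
    """Yield rows one at a time and always drain the cursor afterwards.

    Hypothetical helper: with MySQLdb, pass cursorclass=MySQLdb.cursors.SSCursor.
    """
    if cursorclass is None:
        cursor = conn.cursor()
    else:
        cursor = conn.cursor(cursorclass)
    cursor.execute(query, params)
    try:
        for row in cursor:
            yield row
    finally:
        # Read any rows the caller did not consume, so the connection
        # is free for the next query, then release the cursor.
        for _ in cursor:
            pass
        cursor.close()
```

Breaking out of a `for` loop over `stream_rows(...)` leaves the generator suspended; CPython closes it (running the `finally` block) when the last reference goes away, but calling `.close()` on the generator explicitly makes the cleanup point obvious.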

Andrew
+1  A: 

You don't have to modify the query; you can use the cursor's fetchmany method. Here is how I do it:

def fetchsome(cursor, some=1000):
    """Yield rows from cursor, fetching `some` rows per fetchmany() call."""
    fetch = cursor.fetchmany
    while True:
        rows = fetch(some)
        if not rows:
            break
        for row in rows:
            yield row

This way you can still run "SELECT * FROM tbl_subscriber;" but you only fetch some rows at a time.
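A variant of the same idea that yields whole batches instead of single rows, which can be handy when the work itself is done in bulk (for example one mail-sending call per batch). The name `fetch_batches` is hypothetical and `sqlite3` again stands in for MySQLdb. Note that with MySQLdb's default cursor the full result has already been read into client memory by the time `execute()` returns, as the SSCursor answer explains, so `fetchmany` only limits memory when combined with a server-side cursor.

```python
import sqlite3


def fetch_batches(cursor, size=1000):
    """Yield lists of up to `size` rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(size)
        if not rows:  # empty list (sqlite3) or empty tuple (MySQLdb)
            return
        yield rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a MySQLdb connection
    conn.execute("CREATE TABLE tbl_subscriber (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO tbl_subscriber (email) VALUES (?)",
        [("u%d@example.com" % i,) for i in range(5)],
    )
    cursor = conn.cursor()
    cursor.execute("SELECT email FROM tbl_subscriber ORDER BY id")
    for batch in fetch_batches(cursor, size=2):
        # A real script might hand each batch to a mail-sending function here.
        print([email for (email,) in batch])
```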

dugres