ansaurus

Question

Best way to iterate through all rows in a DB-table

Answer 1

+1 A:

First of all, maybe you don't need "SELECT * FROM ..." at all.

Maybe it's enough to fetch only the columns you actually use, for example "SELECT email FROM ...".

That would reduce memory usage anyway :)
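
A minimal sketch of that idea, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. The table layout, the `notes` column and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the real MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_subscriber (id INTEGER PRIMARY KEY, email TEXT, notes TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_subscriber (email, notes) VALUES (?, ?)",
    [("a@example.com", "x" * 1000), ("b@example.com", "y" * 1000)],
)

# Ask only for the column that is needed; the bulky notes column is never
# transferred to the client.
emails = [row[0] for row in conn.execute("SELECT email FROM tbl_subscriber ORDER BY id")]
print(emails)
```

Each row now carries one short string instead of every column in the table.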

Katalonis 2010-09-24 08:20:01

Great point - `SELECT *` Considered Harmful, or at least lazy programming.

Piskvor 2010-09-24 08:26:29

Answer 2

A:

Wouldn't it be better to fetch one row at a time, like this:

class Table:
   def __init__(self, db, name):
      self.db = db
      self.name = name
      self.dbc = self.db.cursor()

   def __getitem__(self, item):
      self.dbc.execute("select * from %s limit %s, 1" % 
         (self.name, item))
      return self.dbc.fetchone()

   def __len__(self):
      self.dbc.execute("select count(*) from %s" % (self.name))
      r = int(self.dbc.fetchone()[0])
      return r


import MySQLdb
# "dbname" is a placeholder for your database name
db = MySQLdb.connect(user="user", passwd="passwd", db="dbname")
subscribers = Table(db, "name")
for i in range(len(subscribers)):
   print(subscribers[i])

pyfunc 2010-09-24 08:23:08

This is overkill. Iterating over a cursor does fetch one row at a time.

Space_C0wb0y 2010-09-24 08:25:16

Yeah got that, Thanks! This is truly overdoing it.

pyfunc 2010-09-24 08:26:52

Answer 3

+3 A:

Unless you have BLOBs in there, thousands of rows shouldn't be a problem. Do you know that it is?

Also, why bring shame on yourself and your entire family by doing something like

"SELECT * FROM tbl_subscriber LIMIT %d,%d;" % (actualLimit,steps)

when the cursor will make the substitution for you in a manner that avoids SQL injection?

c.execute("SELECT * FROM tbl_subscriber LIMIT %s,%s;", (actualLimit, steps))  # MySQLdb placeholders are always %s, even for numbers
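
A small runnable sketch of the same idea, with sqlite3 standing in for MySQLdb (an assumption, so that it runs without a server). sqlite3 uses "?" placeholders where MySQLdb uses "%s"; the sample data and the `hostile` string are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_subscriber (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO tbl_subscriber (email) VALUES (?)",
    [("user%d@example.com" % i,) for i in range(10)],
)

actual_limit, steps = 4, 3  # offset and page size (hypothetical values)

# The driver binds the values; they are never pasted into the SQL text.
cur = conn.execute(
    "SELECT id FROM tbl_subscriber ORDER BY id LIMIT ?, ?",
    (actual_limit, steps),
)
page = [row[0] for row in cur]
print(page)

# A hostile value is treated as plain data, not as SQL.
hostile = "x' OR '1'='1"
matches = conn.execute(
    "SELECT id FROM tbl_subscriber WHERE email = ?", (hostile,)
).fetchall()
print(matches)
```

With string formatting, the second query would have matched every row; with bound parameters it matches none.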

aaronasterling 2010-09-24 08:23:31

@AaronMcSmooth : +1 Sane advice that I overlooked myself

pyfunc 2010-09-24 08:25:12

@pyfunc. I've written too much PHP in my life. I can't look at insecure code without cringing.

aaronasterling 2010-09-24 08:27:05

@AaronMcSmooth : Thank you for your advice! I didn't know that execute(...) is able to avoid SQL injection. However it is a script for local use.

OemerA 2010-09-24 08:43:01

Good stuff, learned something new. Thanks!

Hagge 2010-09-28 07:50:29

Answer 4

+1 A:

Do you have actual memory problems? When iterating over a cursor, results are fetched one at a time (your DB-API implementation might decide to prefetch results, but then it might offer a function to set the number of prefetched results).
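
For illustration, iterating a cursor directly looks like this. The sketch uses sqlite3 as a stand-in (an assumption); whether rows are really streamed or buffered first depends on the driver you use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_subscriber (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO tbl_subscriber (email) VALUES (?)",
    [("user%d@example.com" % i,) for i in range(5)],
)

cur = conn.cursor()
cur.execute("SELECT id, email FROM tbl_subscriber ORDER BY id")

seen = 0
for row in cur:  # the loop pulls one row per iteration from the cursor
    seen += 1
print(seen)
```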

Space_C0wb0y 2010-09-24 08:26:43

Answer 5

A:

Most MySQL connectors based on libmysqlclient will buffer all the results in client memory by default for performance reasons (with the assumption you won't be reading large resultsets).

When you do need to read a large result in MySQLdb, you can use an SSCursor to avoid buffering the entire resultset.

http://mysql-python.sourceforge.net/MySQLdb.html#using-and-extending

SSCursor - A "server-side" cursor. Like Cursor but uses CursorUseResultMixIn. Use only if you are dealing with potentially large result sets.

This does introduce complications that you must be careful of. If you don't read all the results from the cursor, a second query will raise a ProgrammingError:

>>> import MySQLdb
>>> import MySQLdb.cursors
>>> conn = MySQLdb.connect(read_default_file='~/.my.cnf')
>>> curs = conn.cursor(MySQLdb.cursors.SSCursor)
>>> curs.execute('SELECT * FROM big_table')
18446744073709551615L
>>> curs.fetchone()
(1L, '2c57b425f0de896fcf5b2e2f28c93f66')
>>> curs.execute('SELECT NOW()')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/site-packages/MySQLdb/cursors.py", line 173, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
_mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now")

This means you always have to read everything from the cursor (including any additional resultsets) before issuing another query - MySQLdb won't do this for you.
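
One way to do that is a small helper that reads and discards whatever is left. This is only a sketch: `drain` is a hypothetical name, it relies solely on the generic DB-API methods `fetchmany` and (where the driver has it) `nextset`, and it is demonstrated here against sqlite3 rather than a real SSCursor.

```python
import sqlite3


def drain(cursor, chunk=1000):
    """Read and discard all remaining rows, including further result sets."""
    discarded = 0
    while True:
        rows = cursor.fetchmany(chunk)
        while rows:
            discarded += len(rows)
            rows = cursor.fetchmany(chunk)
        # nextset() is optional in the DB-API; sqlite3 cursors lack it.
        nextset = getattr(cursor, "nextset", None)
        if nextset is None or not nextset():
            return discarded


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, digest TEXT)")
conn.executemany(
    "INSERT INTO big_table (digest) VALUES (?)",
    [("d%d" % i,) for i in range(5)],
)

curs = conn.cursor()
curs.execute("SELECT * FROM big_table ORDER BY id")
first = curs.fetchone()   # read only the first row ...
leftover = drain(curs)    # ... then throw away the rest before the next query
curs.execute("SELECT COUNT(*) FROM big_table")
total = curs.fetchone()[0]
print(first, leftover, total)
```

Reading in chunks keeps memory flat even when the abandoned result is large.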

Andrew 2010-09-24 15:48:54

Answer 6

+1 A:

You don't have to modify the query; you can use the fetchmany method of cursors. Here is how I do it:

def fetchsome(cursor, some=1000):
    fetch=cursor.fetchmany
    while True:
        rows=fetch(some)
        if not rows: break
        for row in rows:
            yield row

This way you can run "SELECT * FROM tbl_subscriber;" but only fetch a batch of rows at a time.
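
A quick usage sketch, again with sqlite3 as a stand-in for the real database (an assumption) and a deliberately tiny batch size so the batching is exercised. The helper is repeated so the snippet runs on its own.

```python
import sqlite3


def fetchsome(cursor, some=1000):
    """Yield rows one by one while pulling them from the driver in batches."""
    while True:
        rows = cursor.fetchmany(some)
        if not rows:
            break
        for row in rows:
            yield row


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_subscriber (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO tbl_subscriber (email) VALUES (?)",
    [("user%d@example.com" % i,) for i in range(7)],
)

cur = conn.cursor()
cur.execute("SELECT id FROM tbl_subscriber ORDER BY id")

# 7 rows with a batch size of 3: batches of 3, 3 and 1, then an empty batch ends the loop.
ids = [row[0] for row in fetchsome(cur, some=3)]
print(ids)
```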

dugres 2010-09-24 16:33:15
