views:

140

answers:

2

I've got a very large SQLite table with over 500,000 rows with about 15 columns (mostly floats). I'm wanting to transfer data from the SQLite DB to a Django app (which could be backed by many RDBMs, but Postgres in my case). Everything works OK, but as the iteration continues, memory usage jumps by 2-3 meg a second for the Python process. I've tried using 'del' to delete the EVEMapDenormalize and row objects at the end of each iteration, but the bloat continues. Here's an excerpt, any ideas?

class Importer_mapDenormalize(SQLImporter):
def run_importer(self, conn):
    c = conn.cursor()

    for row in c.execute('select * from mapDenormalize'):
        mapdenorm, created = EVEMapDenormalize.objects.get_or_create(id=row['itemID'])
        mapdenorm.x = row['x']
        mapdenorm.y = row['y']
        mapdenorm.z = row['z']

        if row['typeID']:
            mapdenorm.type = EVEInventoryType.objects.get(id=row['typeID'])

        if row['groupID']:
            mapdenorm.group = EVEInventoryGroup.objects.get(id=row['groupID'])

        if row['solarSystemID']:
            mapdenorm.solar_system = EVESolarSystem.objects.get(id=row['solarSystemID'])

        if row['constellationID']:
            mapdenorm.constellation = EVEConstellation.objects.get(id=row['constellationID'])

        if row['regionID']:
            mapdenorm.region = EVERegion.objects.get(id=row['regionID'])

        mapdenorm.save()
    c.close()

I'm not at all interested in wrapping this SQLite DB with the Django ORM. I'd just really like to figure out how to get the data transferred without sucking all of my RAM.

+1  A: 

I think a select * from mapDenormalize and loading the result into memory will always be a bad idea. My advise is - spread script into chunks. Use LIMIT to get data in portions.

Get first portion, work with it, close to cursor, and then get the next portion.

Silver Light
I thought this at first as well, but chunking it up still lead to memory leakage. Memory usage remains unchanged from the situation outlined above.
Greg
+3  A: 

Silly me, this was addressed in the Django FAQ.

Needed to clear the DB query cache while in DEBUG mode.

from django import db 
db.reset_queries()
Greg
Even though the query cache + DEBUG=True bit you this time, you might find it useful in a future round of debugging and/or optimization. The queries are cached in `django.db.connection.queries`.
istruble