views: 2130
answers: 5

I have to insert 8000+ records into a SQLite database using Django's ORM. This operation needs to be run as a cronjob about once per minute.
At the moment I'm using a for loop to iterate through all the items and then insert them one by one.
Example:

for item in items:
    entry = Entry(a1=item.a1, a2=item.a2)
    entry.save()

What is an efficient way of doing this?


Edit: A little comparison between the two insertion methods.

Without commit_manually decorator (11245 records):

[nox@noxdevel marinetraffic]$ time python manage.py insrec

real    1m50.288s
user    0m6.710s
sys     0m23.445s

Using commit_manually decorator (11245 records):

[nox@noxdevel marinetraffic]$ time python manage.py insrec

real    0m18.464s
user    0m5.433s
sys     0m10.163s

Note: The test script also does some other operations besides inserting into the database (downloads a ZIP file, extracts an XML file from the ZIP archive, parses the XML file) so the time needed for execution does not necessarily represent the time needed to insert the records.

+3  A: 

Have a look at this. It's meant for use out-of-the-box with MySQL only, but there are pointers on what to do for other databases.

Vinay Sajip
+3  A: 

You might be better off bulk-loading the items: prepare a file and use a bulk-load tool. This will be vastly more efficient than 8000 individual inserts.
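The closest self-contained approximation of bulk loading from Python (an illustration, not necessarily the tool the answer has in mind) is to hand the whole batch to `executemany()` inside a single transaction, so one prepared statement handles every row:

```python
import sqlite3

# Minimal bulk-load sketch, assuming a simple two-column table shaped like the
# Entry model in the question. executemany() reuses one prepared statement for
# all rows instead of issuing 8000 separate INSERT round trips.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (a1 TEXT, a2 TEXT)")

rows = [("a1-%d" % i, "a2-%d" % i) for i in range(8000)]
with conn:  # the context manager wraps the batch in one transaction
    conn.executemany("INSERT INTO entry (a1, a2) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
print(count)  # 8000
```

For even larger volumes, the SQLite command-line shell's `.import` command loads a prepared CSV file directly, skipping Python entirely.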

ConcernedOfTunbridgeWells
+12  A: 

You want to check out "django.db.transaction.commit_manually".

http://docs.djangoproject.com/en/dev/topics/db/transactions/#django-db-transaction-commit-manually

So it would be something like:

from django.db import transaction

@transaction.commit_manually
def viewfunc(request):
    ...
    for item in items:
        entry = Entry(a1=item.a1, a2=item.a2)
        entry.save()
    transaction.commit()

This commits only once, instead of at each save().

monkut
This will instantiate them all as models, and run thousands of individual inserts. I've always had to drop to SQL and do manual batch inserts for this type of volume; Django isn't built for it. But yes, you definitely want a single transaction if you're doing it this way.
Glenn Maynard
I don't know the Django ORM that well, but doesn't the ORM just generate the SQL for you? And in a simple model with no foreign keys, doesn't a single instance translate to a single insert statement?
monkut
Hi, could you please elaborate on the same in terms of .NET? It would be a great help, as I am facing the same situation.
Amit Ranjan
I don't have .NET experience, but speaking from a general database perspective: turning off AUTOCOMMIT and wrapping the INSERT statements between BEGIN/COMMIT TRANSACTION statements will be faster than running the INSERTs with AUTOCOMMIT on. Note that these commands and how they are used can vary depending on the DB you're using. If you want a .NET-specific answer, go ahead and start a new question.
monkut
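The BEGIN/COMMIT pattern described in the comment above can be sketched with Python's sqlite3 module (the database from the question), independent of Django:

```python
import sqlite3

# With isolation_level=None, sqlite3 runs in autocommit mode: every statement
# is its own transaction. Issuing an explicit BEGIN suspends autocommit so the
# whole batch of inserts lands in a single transaction, committed once.
conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # autocommit mode
cur = conn.cursor()
cur.execute("CREATE TABLE entry (a1 TEXT, a2 TEXT)")

cur.execute("BEGIN")  # start one transaction for the whole batch
for i in range(1000):
    cur.execute("INSERT INTO entry (a1, a2) VALUES (?, ?)", (str(i), str(i)))
cur.execute("COMMIT")

n = cur.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
print(n)  # 1000
```

This is the same effect the commit_manually decorator achieves inside Django: SQLite pays the fsync cost per commit, not per INSERT, which is where the speedup in the question's timings comes from.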
+1  A: 

I recommend using plain SQL (not the ORM); you can insert multiple rows with a single insert:

insert into A select * from B;

The select * from B portion of your SQL can be as complicated as you want, as long as the results match the columns of table A and there are no constraint conflicts.
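A runnable sketch of the INSERT ... SELECT idiom with sqlite3 (table and column names here are illustrative, not from the question):

```python
import sqlite3

# Copy every row from a staging table b into table a with one statement,
# instead of one INSERT per row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (a1 TEXT, a2 TEXT)")
conn.execute("CREATE TABLE a (a1 TEXT, a2 TEXT)")
conn.executemany("INSERT INTO b VALUES (?, ?)", [("x", "1"), ("y", "2")])

# one statement inserts every row the SELECT produces
conn.execute("INSERT INTO a SELECT a1, a2 FROM b")

total = conn.execute("SELECT COUNT(*) FROM a").fetchone()[0]
print(total)  # 2
```

In practice this means loading the raw data into a staging table first (e.g. with a bulk-load tool), then moving it into the real table in a single set-based statement.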

A: 

I've run into the same problem and I can't figure out a way to do it without so many inserts. I agree that using transactions is probably the right way to solve it, but here is my hack:

def viewfunc(request):
    ...
    to_save = []
    for item in items:
        entry = Entry(a1=item.a1, a2=item.a2)
        to_save.append(entry)
    # note: this still issues one INSERT per entry; only the saves are deferred
    map(lambda x: x.save(), to_save)
speedplane