views:

829

answers:

4

Is there a more efficent way for doing this?

for item in item_list:
    e, new = Entry.objects.get_or_create(
        field1 = item.field1,
        field2 = item.field2,
    )
+2  A: 

Depends on what you are aiming at. You can use manage.py's loaddata function to load data in a appropriate format (JSON, XML, YAML,...).

See also this discussion.

Felix Kling
A: 

I'd say there isn't.

But I wonder what type your items are, if they have field1 and field2 as attributes. Looks like there exists another class representing an entry but is not derived from models.Model. Maybe you can omit this class and create Entry instances immediately instead of creating those items.

jellybean
+1  A: 

If you're not sure whether the things in your item_list already exist in your DB, and you need the model objects, then get_or_create is definitely the way to go.

If you know the items are NOT in your DB, you would be much better doing:

for item in item_list:
    new = Entry.objects.create(
        field1 = item.field1,
        field2 = item.field2,
    )

And if you don't need the objects, then just ignore the return from the function call. It won't speed the DB stuff, but it will help with memory management if that's an issue.

If you're not sure whether the data is already in the DB, but either field has a unique=True flag on it, then the DB will enforce the uniqueness, and you can just catch the exception and move on. This will prevent an extra DB hit by avoiding the attempt to select the existing object.

from django.db import IntegrityError

for item in item_list:
    try:
        new = Entry.objects.create(
            field1 = item.field1,
            field2 = item.field2,
        )
    except IntegrityError:
        continue

You could increase speed in either case by manually managing the transactions. Django will automatically create and commit a transaction for every save, but provides some decorators that will greatly increase efficiency if you know you'll be doing a lot of DB saves in a particular function. The Django docs do a better job of explaining all of this than I can here, but you'll probably want to pay particular attention to django.db.transaction.commit_on_success

Paul McLanahan
items in my item_list *may* already exist in my DB, and yes I need the model objects. And none of the fields have a unique=True constraint :'( So I think get_or_create is the way to go. Let's hit the database!
kemar
This doesn't answer the question. First, he said he wanted to *bulk* insert; "get_or_create is the way to go" doesn't help, since get_or_create doesn't do bulk inserts. Inserting items one at a time is the wrong thing to do for bulk inserting. Finally, you can't just cause errors and then ignore it; in Postgresql you'll end up with "transaction aborted" errors unless you jump checkpoint hoops.
Glenn Maynard
+2  A: 

You can't do decent bulk insertions with get_or_create (or even create), and there's no API for doing this easily.

If your table is simple enough that creating rows with raw SQL isn't too much of a pain, it's not too hard; something like:

INSERT INTO site_entry (field1, field2)
(
         SELECT i.field1, i.field2
         FROM (VALUES %s) AS i(field1, field2)
         LEFT JOIN site_entry as existing
                 ON (existing.field1 = i.field1 AND existing.field2 = i.field2)
         WHERE existing.id IS NULL
)

where %s is a string like ("field1, field2"), ("field3, field4"), ("field5, field6") that you'll have to create and escape properly yourself.

Glenn Maynard
OK thank you, you show me the way :)I used this sql query:` INSERT INTO table (field1, field2, field3, field4)`` SELECT 'value1', 'value2', 'value3', 'value4' FROM DUAL`` WHERE NOT EXISTS (`` SELECT id FROM table`` WHERE field1 = 'value1'`` AND field2 = 'value1'`` LIMIT 1`` );`combined with cursor.executemany
kemar
The way I did it, you can batch insert many (hundreds) of rows in a single query, which is how you usually want to do batch inserts.
Glenn Maynard
Yep, but I can't do it with MySQL. Seems I can't find the correct SQL syntax with your proposed query.
kemar