Let's say I have a database table with three columns: id, field1 and field2. The table may have anywhere between 100 and 100,000 rows. I have a Python script that should insert 10 to 1,000 new rows into this table. However, if a new row's field1 already exists in the table, the script should do an UPDATE, not an INSERT.

Which of the following approaches is more efficient?

  1. Do a SELECT field1 FROM table (field1 is unique) and store the result in a list. Then, for each new row, use list.count() to decide whether to INSERT or UPDATE.
  2. For each row, run two queries: first SELECT count(*) FROM table WHERE field1="foo", then either the INSERT or the UPDATE.

In other words, is it more efficient to perform n+1 queries and search a list, or to perform 2n queries and let SQLite do the searching?

A: 

I imagine a Python dictionary would allow much faster searching than a Python list. (Just set the values to 0; you won't need them, and hopefully a '0' stores compactly.)
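To make the lookup idea concrete, here is a rough sketch of approach 1 from the question using a hashed container (a `set` here, though a dictionary behaves the same way for membership tests). The table name `items`, the column types, and the helper `upsert_with_set` are made up for illustration; the question does not give the real schema.

```python
import sqlite3

def upsert_with_set(conn, rows):
    """Hypothetical helper: choose INSERT or UPDATE from an in-memory set of keys."""
    # One query loads every existing field1; set membership is O(1) on average,
    # whereas list.count() scans the whole list on every call.
    existing = {row[0] for row in conn.execute("SELECT field1 FROM items")}
    for field1, field2 in rows:
        if field1 in existing:
            conn.execute("UPDATE items SET field2 = ? WHERE field1 = ?",
                         (field2, field1))
        else:
            conn.execute("INSERT INTO items (field1, field2) VALUES (?, ?)",
                         (field1, field2))
            existing.add(field1)  # keep the set in sync for later rows
    conn.commit()

# Assumed schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, field1 TEXT UNIQUE, field2 TEXT)")
conn.execute("INSERT INTO items (field1, field2) VALUES ('foo', 'old')")

upsert_with_set(conn, [("foo", "new"), ("bar", "fresh")])
result = conn.execute("SELECT field1, field2 FROM items ORDER BY id").fetchall()
```

This still issues one statement per new row; the only thing it saves over the list version is the linear scan for each lookup.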

As for the larger question, I'm curious too. :)

sarnold
Or just store them in a `set` instead of a `dict`...
tzaman
Thanks tzaman, didn't realize sets were fast :)
sarnold
+8  A: 

If I understand your question correctly, it seems like you could simply use SQLite's built-in conflict handling mechanism.

Assuming you have a UNIQUE constraint on field1, you could simply use:

INSERT OR REPLACE INTO table VALUES (...)

The following syntax is also supported (identical semantics):

REPLACE INTO table VALUES (...)

EDIT: I realise that I am not really answering your question, just providing an alternative solution which should be faster.
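From Python this could look roughly like the sketch below, using the standard `sqlite3` module. The table name `items` and the column types are assumptions for the demo, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the UNIQUE constraint on field1 is what triggers the conflict handling.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, field1 TEXT UNIQUE, field2 TEXT)")
conn.execute("INSERT INTO items (field1, field2) VALUES ('foo', 'old')")

new_rows = [("foo", "new"), ("bar", "fresh")]
# One statement per row, with no prior SELECT needed.
conn.executemany(
    "INSERT OR REPLACE INTO items (field1, field2) VALUES (?, ?)",
    new_rows,
)
conn.commit()

rows = conn.execute("SELECT field1, field2 FROM items ORDER BY field1").fetchall()
```

One caveat: REPLACE resolves the conflict by deleting the existing row and inserting a new one, so a row whose id is assigned automatically can end up with a different id, and any column you leave out of the statement falls back to its default. If other tables refer to id, that may matter.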

Nick
+1, I was just about to post this myself.
tzaman
Cool, thanks! Virtual +1, I'm out of votes today. Sigh :)
sarnold
Great answer! Great functionality. A shame it's not in the SQL-92 specification, so that *all* RDBMSs would implement it :)
Wez
Thanks Nick, this is a good solution. As with all good solutions, it has made me realise that I wasn't quite asking the right question.
Andrew Ho
+1  A: 

I'm not familiar with SQLite, but a general approach like this should work:

If there's a unique index on field1 and you try to insert a value that's already there, you should get an error. If the insert fails, you go with the update.

Pseudocode:

try
{
    insert into table (field1, field2) values (value1, value2)
}
catch(insert fails)
{
    update table set field2=value2 where field1=value1
}
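In Python with the standard `sqlite3` module, the pseudocode above might translate to something like this. It is only a sketch: the table name `items`, the schema, and the helper `insert_or_update` are invented for the example. A violated UNIQUE constraint is reported as `sqlite3.IntegrityError`.

```python
import sqlite3

def insert_or_update(conn, field1, field2):
    """Hypothetical helper: try the INSERT first, fall back to UPDATE on a conflict."""
    try:
        conn.execute("INSERT INTO items (field1, field2) VALUES (?, ?)",
                     (field1, field2))
    except sqlite3.IntegrityError:
        # field1 already exists, so update the existing row instead.
        conn.execute("UPDATE items SET field2 = ? WHERE field1 = ?",
                     (field2, field1))

# Assumed schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, field1 TEXT UNIQUE, field2 TEXT)")
conn.execute("INSERT INTO items (field1, field2) VALUES ('foo', 'old')")

insert_or_update(conn, "foo", "new")    # conflict, takes the UPDATE path
insert_or_update(conn, "bar", "fresh")  # no conflict, plain INSERT
conn.commit()

rows = conn.execute("SELECT id, field1, field2 FROM items ORDER BY id").fetchall()
```

Because the existing row is updated in place, it keeps its id.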
Dan Stocker
A: 

You appear to be comparing apples with oranges.

A Python list is only useful if your data fit into the address space of the process. Once the data get big, this won't work any more.

Moreover, a Python list is not indexed by value, so every lookup is a linear scan. For fast lookups you should use a dictionary.

Finally, a Python list is non-persistent: it is forgotten when the process quits.

How can you possibly compare these?

MarkR