Ok my giant friends, once again I seek a little space on your shoulders :P

Here is the issue: I have a Python script that is fixing some database issues, but it is taking way too long. The main update statement is this:

cursor.execute("UPDATE jiveuser SET username = '%s' WHERE userid = %d" % (newName,userId))

That is getting called about 9500 times with different newName and userid pairs...

Any suggestions on how to speed up the process? Is there maybe a way to do all the updates with just one query?

Any help will be much appreciated!

PS: Postgres is the db being used.

+2  A: 

You might want to look into executemany(): Information here
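For instance, a minimal sketch of how that might look here (psycopg2 assumed; `changes` is a hypothetical name for the list of (newName, userId) pairs):

# `cursor` and `conn` are the existing psycopg2 cursor and connection;
# `changes` is assumed to hold the (newName, userId) pairs.
cursor.executemany(
    "UPDATE jiveuser SET username = %s WHERE userid = %s",
    changes,
)
conn.commit()  # one commit for the whole batch

Depending on the driver, executemany may still send one statement per row under the hood, so it's worth measuring whether it actually helps.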

VeeArr
+3  A: 

First of all, do not use the % operator to construct your SQL. Instead, pass your tuple of arguments as the second parameter to cursor.execute, which also negates the need to quote your argument and allows you to use %s for everything:

cursor.execute("UPDATE jiveuser SET username = %s WHERE userid = %s", (newName, userId))

This is important to prevent SQL injection attacks.

To answer your question, you can speed up these updates by creating an index on the userid column, which lets the database find each row with an index lookup (roughly O(log n) for a B-tree) rather than scanning the entire table, which is O(n). Since you're using PostgreSQL, here's the syntax to create your index:

CREATE INDEX username_lookup ON jiveuser (userid);

EDIT: Since your comment reveals that you already have an index on the userid column, there's not much you could possibly do to speed up that query. So your main choices are either living with the slowness, since this sounds like a one-time fix-something-broken thing, or following VeeArr's advice and testing whether cursor.executemany will give you a sufficient boost.

Eli Courtwright
Well I have an index for the userid which is what is in the where clause... would an index on the username really help?
hdx
@hdx: No, that was my mistake, I carelessly typed `username` rather than `userid`, and I didn't realize you already had an index on that column. I have just corrected my typo.
Eli Courtwright
Is `%s` the placeholder per the Postgres DB-API? I don't know, I'm asking, since the placeholder is typically `?`.
ΤΖΩΤΖΙΟΥ
@TZ: `%s` is the DB-API 2.0 standard placeholder; implementations may use placeholders such as `?` as well as `%s`, but my understanding is that `%s` is required to work for an implementation to be considered DB-API 2.0 compliant.
Eli Courtwright
+1  A: 

Perhaps you can create an index on userid to speed things up.

anijhaw
A: 

Move this to a stored procedure and execute it from the database itself.

Jorge Niedbalski R.
Well, by the time you've put the data into the db in a temp table or similar, you don't need a server function.
araqnid
+4  A: 

Insert all the data into another empty table (called userchanges, say), then UPDATE in a single batch:

UPDATE jiveuser
SET username = userchanges.username
FROM userchanges
WHERE userchanges.userid = jiveuser.userid
    AND userchanges.username <> jiveuser.username

See this documentation on the COPY command for bulk loading your data.

There are also tips for improving performance when populating a database.
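A rough sketch of that approach from the Python side (assuming Python 3 and psycopg2; `conn`, `cur`, and `changes` are hypothetical names for your connection, cursor, and list of (newName, userId) pairs, and the usernames are assumed to contain no tabs or newlines):

import io

# Temporary table to hold the pairs; it goes away when the session ends.
cur.execute("CREATE TEMP TABLE userchanges (userid integer, username text)")

# Bulk-load all pairs with a single COPY instead of ~9500 INSERTs.
buf = io.StringIO()
for new_name, user_id in changes:
    buf.write("%d\t%s\n" % (user_id, new_name))
buf.seek(0)
cur.copy_from(buf, "userchanges", columns=("userid", "username"))

# One set-based UPDATE instead of row-by-row statements.
cur.execute("""
    UPDATE jiveuser
    SET username = userchanges.username
    FROM userchanges
    WHERE userchanges.userid = jiveuser.userid
      AND userchanges.username <> jiveuser.username
""")
conn.commit()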

Cade Roux
Make that a temporary table and you don't have to bother about it after you're done.
Frank Heikens
@Cade Roux: I updated your statement a bit as PostgreSQL's UPDATE...FROM construct differs a little from what you wrote (but close enough that I only noticed it on the second pass...)
araqnid
Also, look into using COPY to push the data to the server, although it will need some driver-specific support to do that (usually called putcopydata or similar)
araqnid
So in order to create that temp table I'll need to run the same number of insert statements as I was running update statements before... Are inserts that much faster than updates done based on an indexed field?
hdx
@hdx On SQL Server, we would use a BULK INSERT / SqlBulkCopy to commit 9500 rows in a single insert. The UPDATE should commit 9500 changes in a single commit.
Cade Roux
I'd be interested to see a benchmark on this approach. I suspect that doing 9500 inserts into a new table followed by a bulk update would take longer than 9500 updates. After all, all data would then have to be processed twice.
Jay
@Jay, Like I said, do one insert of 9500 rows and one update of 9500 rows. Fewer large set operations are generally faster on RDBMS than many small set operations.
Cade Roux
@Cade: Oh, oh, you're assuming that your SQL dialect supports insert of multiple records with a single statement. Some don't, so I was thinking you would need multiple insert statements, and at that point it would become a question of the relative performance of a bunch of inserts followed by a mass update versus a bunch of updates.
Jay
@Jay Postgres supports COPY as well as disabling AUTOCOMMIT, but typically an entire batch operation needs a single transaction to ensure integrity. This question does not give the motivation as to why the database issues are needing fixing nor whether this needs to happen on a regular basis. If it didn't have to happen on a regular basis, I would argue that performance doesn't even matter and just "run and done". For production, get to the point (which may indeed take a while) where the database can quickly do the part which needs ACID, in order to reduce effective locking time.
Cade Roux
+1  A: 

I'd run an EXPLAIN on this. If it's doing an index lookup to find the record -- which it should if you have an index on userid -- then I don't see what you could do to improve performance. If it's not using the index, then the trick is figuring out why not and fixing it.
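For example (hypothetical values; psycopg2 assumed), you can look at the plan for one representative update -- EXPLAIN without ANALYZE only plans the statement, it doesn't run it:

cursor.execute("EXPLAIN UPDATE jiveuser SET username = %s WHERE userid = %s",
               ("test", 123))
print("\n".join(row[0] for row in cursor.fetchall()))
# You want to see "Index Scan ... on jiveuser" rather than "Seq Scan on jiveuser".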

Oh, you could try using a prepared statement. With 9500 updates, that should help.

Jay
+3  A: 

The reason it's taking so long is probably that you've got autocommit enabled and each update gets done in its own transaction.

This is slow because even if you have a battery-backed raid controller (which you should definitely have on all database servers, of course), it still needs to do a write into that device for every transaction commit to ensure durability.

The solution is to do more than one row per transaction. But don't make transactions TOO big or you run into problems too. Try committing every 10,000 rows of changes as a rough guess.
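A rough sketch of that batching (psycopg2 assumed, autocommit left off; `conn`, `cursor`, and `changes` are hypothetical names for your connection, cursor, and (newName, userId) pairs):

BATCH_SIZE = 10000  # commit roughly every 10,000 updates

for i, (new_name, user_id) in enumerate(changes, start=1):
    cursor.execute("UPDATE jiveuser SET username = %s WHERE userid = %s",
                   (new_name, user_id))
    if i % BATCH_SIZE == 0:
        conn.commit()  # flush this batch in one transaction

conn.commit()  # commit whatever is left over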

MarkR
A: 

First, make sure you have an index on 'userid'; this ensures the DBMS doesn't have to do a table scan each time:

CREATE INDEX jiveuser_userid ON jiveuser (userid);

Next, try preparing the statement and then calling EXECUTE on it. This stops the optimizer from having to examine the query each time:

PREPARE update_username(text, integer) AS UPDATE jiveuser SET username = $1 WHERE userid = $2;
EXECUTE update_username('New Name', 123);

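If you're driving this from the Python script rather than psql, roughly the same idea looks like this (psycopg2 assumed; `changes` is a hypothetical name for the (newName, userId) pairs):

cursor.execute("PREPARE update_username(text, integer) AS "
               "UPDATE jiveuser SET username = $1 WHERE userid = $2")
for new_name, user_id in changes:
    cursor.execute("EXECUTE update_username(%s, %s)", (new_name, user_id))
conn.commit()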
Finally, a bit more performance could be squeezed out by turning off autocommit

\set AUTOCOMMIT off
W Devauld