+1  A: 

Hi,

What I recommend:

  1. Add an isProcessed column to your table.
  2. Make your script work on a chunk of, say, 1k rows for the first run (of course, select only rows that are not processed; see the sketch at the end of this answer).
  3. Benchmark it.
  4. Adjust the chunk size if needed.
  5. Build another script that calls this one at intervals.

Don't forget to add some sleep time in both your scripts!

This will work if your change does not need to be continuous (and I don't think it has to be). If you have to do it all at once, you should take your database offline while the script runs.
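Something like this could be a starting point for steps 1-5; the `db.query` wrapper, the `process()` helper, the chunk and sleep values, and the `isProcessed` row access style are all assumptions to adapt to your script:

    import time

    CHUNK_SIZE = 1000          # step 4: adjust after benchmarking the first run
    SLEEP_SECONDS = 1          # "some sleep time" between chunks

    def process_one_chunk():
        # Step 2: select only rows that have not been processed yet.
        rows = db.query('''
            SELECT id AS news_id, image AS src_filename
            FROM emd_news
            WHERE isProcessed = 0
            ORDER BY id ASC
            LIMIT %s''', CHUNK_SIZE)
        for row in rows:
            process(row)       # hypothetical per-row work from your script
            db.query('UPDATE emd_news SET isProcessed = 1 WHERE id = %s',
                     row['news_id'])
        return len(rows)

    # Step 5 would be a separate script calling this one at intervals;
    # the simplest form is a loop that sleeps between chunks.
    while process_one_chunk():
        time.sleep(SLEEP_SECONDS)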

Alin Purcaru
I'm not sure about adding a new column, because altering a 4 GB table will take a lot of time.
sergeik
I doubt MySQL is that stupid. If you don't want that, you can make a new temporary table with just `id` and `isProcessed` that you join with your original table. The idea is that you should have a marker for the processed rows so you don't run the script twice on the same ones. Users may add more rows in the meantime, or the script may fail and you will need to rerun it.
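A rough sketch of that marker-table idea, reusing the `db.query` wrapper from the question; `emd_news_processed` is just an illustrative name, and it is a regular table here so the markers survive between runs:

    # One-off: a small side table that only records which ids are done.
    db.query('''
        CREATE TABLE IF NOT EXISTS emd_news_processed (
            id INT UNSIGNED NOT NULL PRIMARY KEY
        )''')

    # The chunked SELECT then skips rows that already have a marker.
    rows = db.query('''
        SELECT n.id AS news_id, n.image AS src_filename
        FROM emd_news AS n
        LEFT JOIN emd_news_processed AS p ON p.id = n.id
        WHERE p.id IS NULL
        ORDER BY n.id ASC
        LIMIT %s''', 1000)

    # After a row is processed, record its id in the side table instead of
    # updating an isProcessed column on the big table:
    # db.query('INSERT INTO emd_news_processed (id) VALUES (%s)', news_id)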
Alin Purcaru
In my case the script processes rows from bottom to top; that's why I won't run the script twice on the same ones.
sergeik
Well, if you want to use my proposed solution, you'll have to use a marker, because it does not do all the processing in one go. Also, your argument does not hold if the script fails somewhere in the middle.
Alin Purcaru
+1  A: 

I will add:

Why do you create a new connection in each loop and then close it?!

And maybe you can use `db.autocommit(False)`, especially for the UPDATE, and do a `db.commit()` every 100 rows or so (see the sketch at the end of this answer).

And, like Alin Purcaru suggested, you should do some benchmarking as well.

Hope this can help :)
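A minimal sketch of the single-connection, batched-commit idea, assuming a MySQLdb-style driver; the connection parameters and `rows_to_update` are placeholders:

    import MySQLdb

    # One connection for the whole run instead of one per loop iteration.
    db = MySQLdb.connect(host='localhost', user='user', passwd='secret', db='emd')
    db.autocommit(False)        # group many UPDATEs into one transaction
    cursor = db.cursor()

    # rows_to_update stands in for the (new_filename, news_id) pairs that
    # the rest of the script produces.
    for i, (new_filename, news_id) in enumerate(rows_to_update, start=1):
        cursor.execute('UPDATE emd_news SET image = %s WHERE id = %s',
                       (new_filename, news_id))
        if i % 100 == 0:        # commit every 100 rows, as suggested above
            db.commit()

    db.commit()                 # flush whatever is left in the last batch
    db.close()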

singularity
That's why I'm asking this question =) We have some triggers on update, and updating many rows could be a bottleneck.
sergeik
@sergeik: OK, I understand now, so forget about the autocommit-to-false thing; but create one connection to the database, you don't have to repeat it each time; and do what S.Lott told you: check each time whether to proceed or skip (continue). And remember: "premature optimization is the root of all evil" (Knuth), unless you need to set a new record for the Guinness book. And do a backup beforehand, we never know :)
singularity
@sergeik: "We have some triggers on update". Please **update** your question to include **all** the relevant facts. Please.
S.Lott
@singularity: "but create one connection to the database you don't have to repeat it each time" -- I know this, but don't understand the pros and cons of "persistent connection". OK, I can save some time on not repeating connection but maybe this long running connection eats memory or something.
sergeik
@sergeik: it's called database connection pooling, and it's very effective. If you create a new connection each time, the script has to create a new `Connection` instance each time (more memory), and the DBMS also reserves a new connection (a new thread ..) for it. So which do you think is better? :)
singularity
@singularity: I'm running the update with the original script =) And, you know, the memory consumption of the Python process stays the same the whole time.
sergeik
It took about 5 hours. Very fast in the beginning and very slow in the end...
sergeik
+1  A: 
    db_data = db.query('''
        SELECT id AS news_id, image AS src_filename
        FROM emd_news
        ORDER BY id ASC
        LIMIT %s, %s''', offset, LIMIT_ROW_COUNT)
    # Why is there any code here at all?  If there's no data, why proceed?
    if not db_data: break
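For context, a sketch of how that early exit could sit in the offset-based loop; `process()` and the offset bookkeeping are assumptions about the original script:

    LIMIT_ROW_COUNT = 1000
    offset = 0

    while True:
        db_data = db.query('''
            SELECT id AS news_id, image AS src_filename
            FROM emd_news
            ORDER BY id ASC
            LIMIT %s, %s''', offset, LIMIT_ROW_COUNT)
        if not db_data:
            break                # nothing left to fetch, so stop right here
        for row in db_data:
            process(row)         # hypothetical per-row work
        offset += LIMIT_ROW_COUNT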
S.Lott
You are right! Thanks.
sergeik