This problem is pretty hard to describe, which makes it difficult to search for an answer. I hope some experts can share their opinions on it.

I have a table with around 1 million records. The table structure is similar to this:

create table items (
  uid bigint(15) not null primary key,
  updated int(11),
  enabled tinyint(1),
  key (updated),
  key (enabled)
);

The scenario is like this: I have to select all of the records every day and do some processing. It takes around 3 seconds to handle each item.

I have written a PHP script that fetches 200 items each time using the following query:

select * from items where updated < unix_timestamp(now()) - 86400 and enabled = 1 limit 200;

I will then update the "updated" field of the selected items to make sure that they won't be selected again within one day. The update query is something like this:

update items set updated = unix_timestamp(now()) where uid in (1,2,3,4,...);

Then the PHP script continues to run and processes the data, which doesn't require any MySQL connection anymore.


Since I have a million records and each record takes 3 seconds to process, it's definitely impossible to do it sequentially. Therefore, I launch the PHP script every 10 seconds.

However, as time goes by and the table grows, the select gets much slower. Sometimes it takes more than 100 seconds to run!


Do you have any suggestions on how I might solve this problem?

+2  A: 

I don't think the index on enabled is doing you any good; the cardinality is too low. Remove it and your UPDATEs should go faster.
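
A minimal sketch, assuming the index is named after the column (MySQL's default when no explicit name was given); check SHOW INDEX FROM items for the actual name:

SHOW INDEX FROM items;                 -- confirm the real index name first
ALTER TABLE items DROP INDEX enabled;  -- index name assumed to match the column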

I am not sure what you mean when you say each record takes 3 seconds, since you are handling them in batches of 200. How are you determining this, and what other processing is involved?

RedFilter
I fetch 200 records each time and call a REST API. The round-trip time for the REST API is around 3 seconds. Sometimes the API returns an error, and I have to set enabled to 0 so that item will not get fetched anymore.
@terence410: It sounds like the problem has nothing to do with SQL or PHP - it is the REST API that is slow. If that is your code, then you are in luck: you can try to improve its performance. If not, there is not much you can do.
RedFilter
+1  A: 

You could try running this before the update:

ALTER TABLE items DISABLE KEYS;

and then when you're done updating,

ALTER TABLE items ENABLE KEYS;

That should recreate the index much faster than updating it one record at a time.

cHao
ENABLE KEYS on a big table can take a long time, perhaps not a great idea for updating a small number of rows.
David M
Maybe. But then, you're updating the index (perhaps in a big way) each and every time you update a row. Better to do it all at once, methinks, than to say "move these 500000 index entries" each and every time you do an update.
cHao
+3  A: 

There are two points that I can think of that should help:

a. unix_timestamp(now()) - 86400

... this will evaluate now() for every single row; make it a constant by setting a variable to that value before each run (see the sketch below).

b. Indexes help reads but can slow down writes

Consider dropping indexes before updating (DISABLE KEYS) - and then re-add them before reading (ENABLE KEYS).
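
For point a, a minimal sketch using a MySQL user variable (the @cutoff name is purely illustrative):

SET @cutoff := UNIX_TIMESTAMP(NOW()) - 86400;
SELECT * FROM items WHERE updated < @cutoff AND enabled = 1 LIMIT 200;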

amelvin
+1  A: 

You could do this:

  1. dispatcher.php: Manages the whole process.
    • fetches items in convenient packages from the database
    • calls worker.php on the same server with an HTTP post containing all UIDs fetched (I understand that worker.php would not need more than the UID to do its job)
    • maintains a counter of how many worker.php scripts are running. When one is started, the counter is incremented up to a certain limit; when a worker returns, the counter is decremented. See "Asynchronous PHP calls?".
    • repeats until all records have been fetched once. Maintain a MySQL LIMIT counter instead of working with updated.
  2. worker.php: does the actual work
    • does its thing with each item posted.
    • writes to a helper table the ID of each item it has processed (no index on that table)
  3. dispatcher.php: housekeeping.
    • once all workers have returned, updates the main table with the helper table in a single statement (see the sketch below)
  4. error recovery
    • since worker.php would update the helper table after each item is done, you can use the state of the helper table to recover from a crash. Saving the "work package" of each individual worker before it starts running would help to recover worker states as well.

You would have a multi-threaded processing chain this way and could even distribute the whole thing across multiple machines.
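
A minimal sketch of the housekeeping step, assuming the helper table is just a bare list of processed UIDs (the processed_items name is illustrative):

CREATE TABLE processed_items (uid BIGINT NOT NULL);  -- no index, as suggested

UPDATE items
JOIN processed_items USING (uid)
SET items.updated = UNIX_TIMESTAMP(NOW());           -- one statement for the whole batch

TRUNCATE TABLE processed_items;                      -- ready for the next run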

Tomalak
P.S.: If doing asynchronous work turns out to be too difficult in PHP, you could implement the dispatcher in a more convenient language. The above is just an idea; I don't know if it is feasible. Maybe a few modifications are necessary to accommodate PHP's lack of native threads.
Tomalak
A: 

For a table with fewer than a couple of billion records, the primary key should be an unsigned int rather than a bigint.
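
For example, assuming the uid values fit in 32 bits, the change might look like this:

-- INT UNSIGNED holds values up to 4,294,967,295
ALTER TABLE items MODIFY uid INT UNSIGNED NOT NULL;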

A: 

One idea:

Use a HANDLER, which will improve your performance considerably:

http://dev.mysql.com/doc/refman/5.1/en/handler.html
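
A minimal sketch of walking the table in batches through the HANDLER interface (the batch size of 200 mirrors the question; whether it beats the plain SELECT here is untested):

HANDLER items OPEN;
HANDLER items READ `PRIMARY` FIRST LIMIT 200;  -- first batch
HANDLER items READ `PRIMARY` NEXT LIMIT 200;   -- subsequent batches
HANDLER items CLOSE;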

David M
Thanks, let me try that.