views:

99

answers:

5

Is there a faster way to update the oldest row of a MySQL table that matches a certain condition than using ORDER BY id LIMIT 1 as in the following query?

UPDATE mytable SET field1 = '1' WHERE field1 = 0 ORDER BY id LIMIT 1;

Note:

  • Assume the primary key is id and there is also a index on field1.
  • We are updating a single row.
  • We are not updating strictly the oldest row, we are updating the oldest row that matches a condition.
  • We want to update the oldest matching row, i.e the lowest id, i.e. the head of the FIFO queue.

Questions:

  • Is the ORDER BY id necessary? How does MySQL order by default?

Real world example

We have a DB table being used for a email queue. Rows are added when we want to queue emails to send to our users. Rows are removed by a cron job, run each minute, processing as many as possible in that minute and sending 1 email per row.

We plan to ditch this approach and use something like Gearman or Resque to process our email queue. But in the meantime I have a question on how we can efficiently mark the oldest item of the queue for processing, a.k.a. The row with the lowest ID. This query does the job:

mysql_query("UPDATE email_queue SET processingID = '1' WHERE processingID = 0 ORDER BY id LIMIT 1");

However, it is appearing in the mysql slow log a lot due to scaling issues. The query can take more than 10s when the table has 500,000 rows. The problem is that this table has grown massively since it was first introduced and now sometimes has half a million rows and a overhead of 133.9 MiB. For example we INSERT 6000 new rows perhaps 180 times a day and DELETE roughly the same number.

To stop the query appearing in the slow log we removed the ORDER BY id to stop a massive sort of the whole table. i.e.

mysql_query("UPDATE email_queue SET processingID = '1' WHERE processingID = 0 LIMIT 1");

... but the new query no longer always gets the row with the lowest id (although it often does). Is there a more efficient way of getting the row with the lowest id other than using ORDER BY id ?

For reference, this is the structure of the email queue table:

CREATE TABLE IF NOT EXISTS `email_queue` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `time_queued` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Time when item was queued',
  `mem_id` int(10) NOT NULL,
  `email` varchar(150) NOT NULL,
  `processingID` int(2) NOT NULL COMMENT 'Indicate if row is being processed',
  PRIMARY KEY (`id`),
  KEY `processingID` (`processingID`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1;
+1  A: 

sounds like you have other processes locking the table preventing your update completing in a timely manner - have you considered using innodb ?

f00
Yes, this seems very likely. Would innodb solve this issue?
Tom
innodb uses row level locking whereas myisam uses table level locks. the number of rows in your table is pretty small - i regularly work with 1 billion+ rows using innodb and it's uber performant !!
f00
+2  A: 

Give this a read:

shamittomar
+1  A: 

I think the 'slow part' comes from

WHERE processingID = 0 

It's slow because it's not indexed. But, indexing this column (IMHO) seems incorrect too. The idea is to change above query to something like :

WHERE id = 0 

Which theoretically will be faster since it uses index.

How about creating another table which contains ids of rows which hasn't been processed? Hence the insertion works twice. First to insert to the real table and the second is to insert id into 'table of hasn't processed'. The processing part too, needs to double its duty. First to retrieve an id from 'table of hasn't been processed' then delete it. The second job of processing part is to process of course.

Of course, the id column in 'table of hasn't been processed' needs to index its content. Just to ensure that selecting and deleting will be faster.

jancrot
Thanks for your comment. But I think I do have a index on processingID: `KEY processingID (processingID)` ?
Tom
A: 

One funny thing is that MySQL, by default, returns rows orderd by ID, instead in a casual way as stated in the relational theory (I am not sure if this behaviour is changed in the latest versions). So, the last row you get from a select should be the last inserted row. I would not use this way, of course.

As you said, the best solution is to use something like Resque, or RabbitMQ & co.

You could use an in-memory table, that is volatile, but much faster, than store, there the latest ID, or just use a my_isam table to add persistency. It is simple and fast in performance and it takes a little bit to implement.

rtacconi
Thanks, interesting info. The default ordering was what we were betting on when we removed `ORDER BY id`. However, in practice we found that MySQL didn't order by id by default all the time - although we didn't verify this scientifically so my info isn't 100% accurate.
Tom
A: 

I think u can do this

limit table rows (eg: 10000) when it reach limit create new table

email_queue_1
email_queue_2
email_queue_3 ....

create one master table "email_queue_group" and then there will not any performance issue.

yogs