views:

213

answers:

3

The following query is pretty simple. It selects the last 20 records from a messages table for use in a paging scenario. The first time this query is run, it takes from 15 to 30 seconds. Subsequent runs take less than a second (I expect some caching is involved). I am trying to determine why the first time takes so long.

Here's the query:

SELECT DISTINCT ID,List,`From`,Subject, UNIX_TIMESTAMP(MsgDate) AS FmtDate
FROM messages
WHERE List='general'
ORDER BY MsgDate
LIMIT 17290,20;

MySQL version: 4.0.26-log

Here's the table:

messages  CREATE TABLE `messages` (
  `ID` int(10) unsigned NOT NULL auto_increment,
  `List` varchar(10) NOT NULL default '',
  `MessageId` varchar(128) NOT NULL default '',
  `From` varchar(128) NOT NULL default '',
  `Subject` varchar(128) NOT NULL default '',
  `MsgDate` datetime NOT NULL default '0000-00-00 00:00:00',
  `TextBody` longtext NOT NULL,
  `HtmlBody` longtext NOT NULL,
  `Headers` text NOT NULL,
  `UserID` int(10) unsigned default NULL,
  PRIMARY KEY  (`ID`),
  UNIQUE KEY `List` (`List`,`MsgDate`,`MessageId`),
  KEY `From` (`From`),
  KEY `UserID` (`UserID`,`List`,`MsgDate`),
  KEY `MsgDate` (`MsgDate`),
  KEY `ListOnly` (`List`)
) TYPE=MyISAM ROW_FORMAT=DYNAMIC

Here's the explain:

table   type    possible_keys  key       key_len  ref       rows  Extra
------  ------  -------------  --------  -------  ------  ------  --------------------------------------------
m       ref     List,ListOnly  ListOnly  10       const    18002  Using where; Using temporary; Using filesort

Why is it using a filesort when I have indexes on all the relevant columns? I added the ListOnly index just to see if it would help. I had originally thought that the List index would handle both the list selection and the sorting on MsgDate, but it didn't. Now that I added the ListOnly index, that's the one it uses, but it still does a filesort on MsgDate, which is what I suspect is taking so long.

I tried using FORCE INDEX as follows:

SELECT DISTINCT ID,List,`From`,Subject, UNIX_TIMESTAMP(MsgDate) AS FmtDate
FROM messages
FORCE INDEX (List)
WHERE List='general'
ORDER BY MsgDate
LIMIT 17290,20;

This does seem to force MySQL to use the index, but it doesn't speed up the query at all.

Here's the explain for this query:

table   type    possible_keys  key     key_len  ref       rows  Extra                       
------  ------  -------------  ------  -------  ------  ------  ----------------------------
m       ref     List           List    10       const    18002  Using where; Using temporary

UPDATES:

I removed DISTINCT from the query. It didn't help performance at all.

I removed the UNIX_TIMESTAMP call. It also didn't affect performance.

I made a special case in my PHP code so that if I detect the user is looking at the last page of results, I add a WHERE clause that returns only the last 7 days of results:

SELECT m.ID,List,From,Subject,MsgDate
FROM messages
WHERE MsgDate>='2009-11-15'
ORDER BY MsgDate DESC
LIMIT 20

This is a lot faster. However, as soon as I navigate to another page of results, it must use the old SQL and takes a very long time to execute. I can't think of a practical, realistic way to do this for all pages. Also, doing this special case makes my PHP code more complex.

Strangely, only the first time the original query is run takes a long time. Subsequent runs of either the same query or a query showing a different page of results (i.e., only the LIMIT clause changes) are very fast. The query slows down again if it has not been run for about 5 minutes.

SOLUTION:

The best solution I came up with is based on Jason Orendorff and Juliet's idea.

First, I determine if the current page is closer to the beginning or end of the total number of pages. If it's closer to the end, I use ORDER BY MsgDate DESC, apply an appropriate limit, then reverse the order of the returned records.

This makes retrieving pages close to the beginning or end of the resultset much faster (first time now takes 4-5 seconds instead of 15-30). If the user wants to navigate to a page near the middle (currently around the 430th page), then the speed might drop back down. But that would be a rare case.

So while there seems to be no perfect solution, this is much better than it was for most cases.

Thank you, Jason and Juliet.

+1  A: 

What version of my SQL are you using? Some of the older versions used the LIMIT clause as a post-process filter (meaning get all the record requested from the server, but only display the 20 you requested back).

You can see from your explain, 18002 rows are coming back, even though you are only showing 20 of them. Is there any way to adjust your selection criteria to identify the 20 rows you want to return, rather than getting 18000 rows back and only showing 20 of them???

Sparky
4.0.26: definitely an older version of MySQL than I'd like to be using today.
bobince
That makes sense, your first query is returning 18002 rows, even though it only shows you 20 of them. (Your explain shows this). It looks like other people have offered solutions along the same line, try to get back few rows via the WHERE clause rather than relying on the LIMIT option. If you are looking for the newest records, you might want to grab the max msgdate and then use that and maybe the prior date in your where clause. Even though that is two queries, one to get the max date and one for the data, it should run faster. Good luck
Sparky
I made a special case in my PHP code so that if I detect the user is looking at the last page of results, I add a WHERE clause that returns only the last 7 days of results: SELECT m.ID,List,`From`,Subject,MsgDate FROM messages m WHERE MsgDate>='2009-11-15' ORDER BY MsgDate DESC LIMIT 20This is a lot faster. However, as soon as I navigate to another page of results, it uses the old SQL and takes forever to return. I can't think of a practical, realistic way to do this for all pages.
elmonty
Also... my PHP code is getting ugly checking for special cases.
elmonty
+3  A: 

Instead of ORDER BY MsgDate LIMIT 17290,20, try ORDER BY MsgDate DESC LIMIT 20.

Of course the results will come out in the reverse order, but that should be easy to deal with.

EDIT: Do your MessageId values always increase with time? Are they unique?

If so, I would make an index:

UNIQUE KEY `ListMsgId` ( `List`, `MessageId` )

and query based on the message ids rather than the date when possible.

-- Most recent messages (in reverse order)
SELECT * FROM messages
WHERE List = 'general'
ORDER BY MessageId DESC
LIMIT 20

-- Previous page (in reverse order)
SELECT * FROM messages
WHERE List = 'general' AND MessageId < '15885830'
ORDER BY MessageId DESC
LIMIT 20

-- Next page
SELECT * FROM messages
WHERE List = 'general' AND MessageId > '15885829'
ORDER BY MessageId
LIMIT 20

I think you're also paying for having varchar columns where an int type would be a lot faster. For example, List could instead be a ListId that points to an entry in a separate table. You might want to try it out in a test database to see if that's really true; I'm not a MySQL expert.

Jason Orendorff
I use the LIMIT because the user can page back to the previous 20 messages (or the 20 before that, etc.).
elmonty
I tried this but it doesn't help at all. It's probably due to the fact that LIMIT has the same problems whether or not an offset is used.
elmonty
Wow. This is kind of bizarre. The simplest possible form of this query would be `SELECT * FROM messages WHERE List='general' ORDER BY MsgDate` which I would expect to be super fast since you have an index on (List, MsgDate, MessageId).
Jason Orendorff
I already tried this: SELECT * FROM messages WHERE List = 'general' ORDER BY MsgDate DESC LIMIT 20It isn't any faster because the entire rowset is retrieved before the LIMIT is processed. Ordering on MessageID wouldn't be any better. BTW, MessageID is an email MessageID header, so it's not going to be sequential (although ID is).
elmonty
I wonder if the large HtmlBody and TextBody columns are part of the problem, even though they are not being retrieved here.
elmonty
Addendum: Although ID is sequential, it is not necessarily contiguous (because messages can be deleted).
elmonty
+2  A: 

You can drop the ListOnly key. The compound index List already contains all the information in it.

Your EXPLAIN for the List-indexed query looks much better, lacking the filesort. You may be able to get better real performance out of it by swapping the ORDER as suggested by Jason, and maybe losing the UNIX_TIMESTAMP call (you can do that in the application layer, or just use Unix timestamps stored as INTEGER in the schema).

bobince
Will losing the UNIX_TIMESTAMP actually affect performance that much?
elmonty
Don't know really, can only test it and see. I would hope not, but often using calculated columns can cause MySQL to use temporary tables where otherwise it might not need to. Personally I only ever use unix timestamps for dates, as the native date types tend to cause access-layer incompatibilities across DBMSs and offer relatively little utility over plain timestamps.
bobince
I tried removing UNIX_TIMESTAMP. It did not help performance at all.
elmonty