views:

386

answers:

4

How can I make an ORDER BY clause with a small LIMIT (ie 20 rows at a time) return quickly, when I can't use an index to satisfy the ordering of rows?

Let's say I would like to retrieve a certain number of titles from a table 'node' (simplified below). I'm using MySQL by the way.

node_ID INT(11) NOT NULL auto_increment,
node_title VARCHAR(127) NOT NULL,
node_lastupdated INT(11) NOT NULL,
node_created INT(11) NOT NULL

But I need to limit the rows returned to only those a particular user has access to. Many users have access large numbers of nodes. I have this information pre-calculated in a big lookup table (an attempt to make things easier) where the primary key covers both columns and the presence of a row means that usergroup has access to that node:

viewpermission_nodeID INT(11) NOT NULL,
viewpermission_usergroupID INT(11) NOT NULL

My query therefore contains something like

FROM
  node
  INNER JOIN viewpermission ON
    viewpermission_nodeID=node_ID
    AND viewpermission_usergroupID IN (<...usergroups of current user...>)

... and I also use a GROUP BY or a DISTINCT so that a node is only returned once even if two of the user's 'usergroups' both have access to that node.

My problem is that there seems to be no way for an ORDER BY clause which sorts results by created or last updated date to use an index, because the rows being returned depend on values in the other viewpermission table.

Therefore MySQL would need to find all rows which match the criteria, then sort them all itself. If there are one million rows for a particular user, and we want to view, say, the latest 100 or rows 100-200 when ordered by last update, the DB would need to figure out which one million rows the user can see, sort this whole result set itself, before it can return those 100 rows, right?

Is there any creative way to get around this? I've been thinking along the lines of:

  • Somehow add dates into the viewpermission lookup table so that I can build an index containing the dates as well as the permissions. It's a possibility I guess.

Edit: Simplified question

Perhaps I can simplify the question by rewriting it like this:

Is there any way to rewrite this query or create an index for the following such that an index can be used to do the ordering (not just to select the rows)?

SELECT nodeid
FROM lookup
WHERE
  usergroup IN (2, 3)
GROUP BY
  nodeid

An index on (usergroup) allows the WHERE part to be satisfied by an index, but the GROUP BY forces a temporary table and filesort on those rows. An index on (nodeid) does nothing for me, because the WHERE clause needs an index with usergroup as its first column. An index on (usergroup, nodeid) forces a temporary table and filesort because the GROUP BY is not the first column of the index that can vary.

Any solutions?

A: 

Copy the value you are going to order by into to viewpermission table and add it to your index.

You could use a trigger to maintain that value from the other table.

bobwienholt
My research on the way ORDER BY is optimized tells me that because the usergroup selection is not a constant but an IN() range, it still won't be able to use the index for the ORDER BY.That is, "WHERE usergroup IN (...) ORDER BY sortorder" cannot use an index for sort. Is this true?
thomasrutter
A: 
select * from
(
select *
FROM  node  
INNER JOIN viewpermission 
ON    viewpermission_nodeID=node_ID    
AND viewpermission_usergroupID IN (<...usergroups of current user...>)
) a
order by a.node_lastupdated desc

The inner query gives you the filtered subset, which I understand is substantially smaller than the whole set. Only the smaller has to be sorted.

cdonner
This is a good solution, and roughly equivalent to what I am using now, but unfortunately I need this not to choke when the filtered subset is still very large. For this I believe the ORDER BY must use an index for ordering, not just for selecting the subset.
thomasrutter
A: 

MySQL has problems when you use GROUP BY and ORDER BY in the same query. That causes a filesort, and that's probably the biggest penalty for performance.

You can eliminate the need for a DISTINCT (or GROUP BY) by using a non-correlated subquery instead of a JOIN.

SELECT * FROM node
WHERE node_id IN (
  SELECT viewpermission_nodeID
  FROM viewpermission
  WHERE viewpermissiong_usergroupID IN ( <...usergroups...> )
)
ORDER BY node_lastupdated DESC
LIMIT 100;

There's no need to sort or do a DISTINCT on the subquery, since IN (1, 1, 2, 3) is the same as IN (1, 3, 2).

Note that MySQL can use only one index per table in a given query, so it'll try to make the best choice between an index on node_id and an index on node_lastupdated. It can't use both, and even if you made a compound index it wouldn't help in this case.

Remember to analyze different solutions with EXPLAIN.

Bill Karwin
I understand the ORDER BY in the outer select clause would still not be able to use an index here, because the WHERE clause uses an IN() range rather than a constant. If there are lots of ids returned by the inner select, that would still be slow. Will try more experimenting w/ EXPLAIN though.
thomasrutter
+1  A: 

Can I answer my own question?

I believe I have found that the only way to do what I describe is for my lookup table to have rows for every possible combination of usergroups a person may want to be a member of.

To pick a simplified example, instead of doing this:

SELECT id FROM ids WHERE groups IN(1,2) ORDER BY id

If you need to use the index both to select rows and to order them, you have to abstract that IN(1,2) so that it is constant rather than a range, ie:

SELECT id FROM ids WHERE grouplist='1,2' ORDER BY id

Of course instead of using the string '1,2' you could have a foreign key there, etc. The point being that you'd have to have a row not just for each group but for each combination of multiple groups.

So, there is my answer.

Anyway, for my application, I feel that maintaining a lookup for all possible combinations of usergroups for each node is not worth it. For my purposes, I predict that most nodes are visible to most users, so I feel that it is acceptable to simply to make the GROUP BY use the index, as the filtering doesn't need it so badly.

In other words, the approach I'll take for my original query may be something like:

SELECT
    <fields>
FROM
  node
  INNER JOIN viewpermission ON
    viewpermission_nodeID=node_ID
    AND viewpermission_usergroupID IN (<...usergroups of current user...>)
  FORCE INDEX(node_created_and_node_ID)
GROUP BY
  node_created, node_ID

GROUP BY can use an index if it starts at the left most column of the index and it is in the first non-const non-system table to be processed. The join then deals with the entire list (which is already ordered), and only those not visible to the current user (which will be a small proportion) are removed by the INNER JOIN.

thomasrutter
Yes you can answer your own question. You can even accept it after two days.
MatthieuF
@thomasrutter: I tested what you suggest in MySQL 5.0.75. It refuses to use the index unless I say "FORCE INDEX" instead of "USE INDEX". But then it does get rid of the filesort.
Bill Karwin
Thanks Bill, yes it looks like FORCE INDEX is necessary sometimes, it's a bit hackish and I was hoping that MySQL would see the benefit of using the index for sorting once tables grew large, but seems it needs us to show it the righteous path.
thomasrutter