views:

150

answers:

1

Hi,

I'm implementing the following access policy: a User can access a Resource if he created it, belongs to the Resource's group members or if the resource is publicly available.

Here is my DB structure (tables are MyISAM):

User (1K-10K Users)
  id
  nickame
  …
  index user_name(id, nickname)
Group (1K)
  id
  …
Resource (10K-100K)
  id
  user_id
  access_type (int)
  group_id
  created (int/timestamp)
  …
  indexes by_user(user_id, created), by_group(access_type, group_id, created), public(access_type, created)
User_Group (5K-20K)
  user_id
  group_id
  membership_type (int)
  index user_group(user_id, membership_type, group_id)

The conditions to grant access are implemented this way: * The User is the creator of the Resource (user_id = <his id>, whatever access_type is) * Or the ressource is publicly available: Resource.access_id = 3 * Or the ressource is shared in a group and the User is accepted as "member" of this group: ** Resource.access_id = 2 (shared in a group) ** Ressource.group_id matches a User_Group's group_id ** this User_Group's user_id matches the User's id ** this User_Group's membership_type is 1, 2 or 3 (accepted member or group moderator/admin)

My question is: what is the most efficient way to list the Resources (and the Resource creator's nickname) that are accessible to the User when we have his ID. And ordered by the Resource's created timestamp.

My best attempt so far is to use an UNION that allows using both by_user and by_group indexes, but might struggle in the sorting afterwards:

SELECT SQL_NO_CACHE
  a.*, user.nickname
FROM (
  ( SELECT resource.* FROM resource WHERE user_id = '000000-0000-0000-0000-000000000000' )
UNION
  ( SELECT resource.* FROM resource WHERE access_type = 3 )
UNION
  ( SELECT resource.* FROM resource
   INNER JOIN user_group ON (access_type = 2 AND user_group.group_id = resource.group_id)
   WHERE user_group.user_id = '000000-0000-0000-0000-000000000000'
    AND user_group.type IN (1, 2, 3)
  )
) a
LEFT JOIN user ON (user.id = a.user_id)
ORDER BY created DESC;

The EXPLAIN output tells me it uses the indexes for both SELECTs and finishes with a type: ALL / Using filesort for the ORDER BY on UNION-ed rows. But if I want to order and limit, this can get heavy, I guess, since there is no index on UNION-ed tuples.

The other possibility is a simpler query with a subquery for the Groups the User is in:

SELECT SQL_NO_CACHE
  resource.*, user.nickname
FROM resource
LEFT JOIN user ON (user.id = resource.user_id) 
WHERE user_id = '000000-0000-0000-0000-000000000000'
  OR access_type = 3
  OR (access_type = 2 AND group_id IN
   ( SELECT group_id FROM user_group USE INDEX (`user_group`)
   WHERE user_id = '000000-0000-0000-0000-000000000000' AND type IN (1, 2, 3)
   )
  )
ORDER BY created DESC;

But here the EXPLAIN tells me it won't use the index and performs a selection type: ALL / Using where; using filesort in the Resource table, witch is what I'm trying to avoid. If I try to FORCE INDEX(by_user, by_group), it is starts by merging both indexes type: index_merge / Using sort_union(by_user,by_group); Using where; Using filesort, which can be costly too.

Any better idea ? This request may be very frequently called…

+1  A: 

I think a second join will better utilize indexes here versus the subquery you suggested (just need to add a DISTINCT to avoid duplicates).

  SELECT SQL_NO_CACHE DISTINCT
         resource.*, user.nickname
    FROM resource
LEFT JOIN user ON user.id = resource.user_id
LEFT JOIN user_group ON user_group.user_id = user.id AND user_group.type IN (1, 2, 3)
    WHERE user_id = '000000-0000-0000-0000-000000000000'
       OR access_type = 3
      OR (access_type = 2 AND group_id IS NOT NULL)
ORDER BY created DESC;

Any time your primary WHERE clause is a sequence of OR's, the ideal EXPLAIN result will be some type of index_merge. However, you are correct that sort_union is the slowest of these. If you read up on index_merge, particularly the sub parts about each type of merge, it becomes clear that the OR pieces of your WHERE clause need to reference all parts of a key, otherwise you'll get a sort_union. So you should get a better EXPLAIN result if you had resource indexes:

Resource (10K-100K)
id
user_id
access_type (int)
group_id
created (int/timestamp)
…
indexes 
    just_user(user_id),
    by_user(user_id, created),
    just_access(access_type),
    just_access_group(access_type, group_id),
    by_group(access_type, group_id, created),
    public(access_type, created)

I recognize that this seems to contradict that you get left prefixes as indexes for free when you do multi-part keys but it seems the index_merge algorithm can't (yet?) utilize those partial prefix indexes.

Rob Van Dam
streetpc