In my application I have a table of application events that are used to generate a user-specific feed of application events. Because it is generated using an OR query, I'm concerned about performance of this heavily used query and am wondering if I'm approaching this wrong.

In the application, users can follow both other users and groups. When an action is performed (e.g., a new post is created), a feed_item record is created with the actor_id set to the user's id and the subject_id set to the id of the group in which the action was performed; actor_type and subject_type are set to the class names of the models. Since users can follow both groups and users, I need a query that checks both the actor_id and the subject_id, and it needs to select distinct records to avoid duplicates. Since it's an OR query, I can't use a normal index. And since a record is created every time an action is performed, I expect this table to accumulate a lot of records rather quickly.

Here's the current query (the followings table joins users to feeders, i.e., users and groups):

SELECT DISTINCT feed_items.* FROM "feed_items" 
 INNER JOIN "followings" 
 ON (
 (followings.feeder_id = feed_items.subject_id 
 AND followings.feeder_type = feed_items.subject_type)
 OR
 (followings.feeder_id = feed_items.actor_id 
 AND followings.feeder_type = feed_items.actor_type)
 )
 WHERE (followings.follower_id = 42)
 ORDER BY feed_items.created_at DESC
 LIMIT 30 OFFSET 0
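
For concreteness, the query implies tables shaped roughly like the sketch below. Only the column names come from the query and the description; the column types, the `id` primary keys, and the `created_at` default are assumptions made for illustration.

```sql
-- Hypothetical schema inferred from the query; all types are assumed.
CREATE TABLE feed_items (
    id           bigserial PRIMARY KEY,
    actor_id     bigint    NOT NULL,  -- user who performed the action
    actor_type   varchar   NOT NULL,  -- model class name of the actor
    subject_id   bigint    NOT NULL,  -- e.g. the group in which the action happened
    subject_type varchar   NOT NULL,  -- model class name of the subject
    created_at   timestamp NOT NULL DEFAULT now()
);

CREATE TABLE followings (
    id          bigserial PRIMARY KEY,
    follower_id bigint  NOT NULL,  -- the user who follows
    feeder_id   bigint  NOT NULL,  -- the followed user or group
    feeder_type varchar NOT NULL   -- model class name of the followed record
);
```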

So my questions:

  • Since this is a heavily used query, is there a performance problem here?

  • Is there any obvious way to simplify or optimize this that I'm missing?

A: 

To find out if there is a performance problem, measure it. PostgreSQL can show you the query plan with EXPLAIN.

I don't think the query needs simplifying. If you identify a performance problem, you may need to revise your indexes.
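
As a sketch of what measuring looks like, prefixing the query with `EXPLAIN ANALYZE` makes PostgreSQL run the statement and report the plan it chose along with actual row counts and timings. The query is the one from the question, unchanged apart from layout.

```sql
-- EXPLAIN ANALYZE executes the query and prints the actual plan and timings.
EXPLAIN ANALYZE
SELECT DISTINCT feed_items.*
FROM feed_items
INNER JOIN followings
  ON (followings.feeder_id = feed_items.subject_id
      AND followings.feeder_type = feed_items.subject_type)
  OR (followings.feeder_id = feed_items.actor_id
      AND followings.feeder_type = feed_items.actor_type)
WHERE followings.follower_id = 42
ORDER BY feed_items.created_at DESC
LIMIT 30 OFFSET 0;
```

A sequential scan on feed_items in the output, or estimated row counts that are far from the actual ones, is often a sign that an index is missing or not being used. Run it against a realistically sized table, since plans on a nearly empty table say little.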

Richard Harrison
A: 

What you have is called an exclusive arc and you're seeing exactly why it's a bad idea. The best approach for this kind of problem is to make the feed item type dynamic:

  • Feed Items: id, type (A or S for Actor or Subject), subtype (replaces actor_type and subject_type)

and then your query becomes

SELECT DISTINCT fi.*
FROM feed_items fi
JOIN followings f ON f.feeder_id = fi.id AND f.feeder_type = fi.type AND f.feeder_subtype = fi.subtype

or similar.

This may not completely or exactly represent what you need to do, but the principle is sound: change your data model so that the OR condition is no longer needed, and efficient queries can then be written against it.

cletus
It seems like using this method I'd need to create two feed items for each action, one each for actor and subject, and then filter out duplicates, since that would return duplicates when the actor and subject are both being followed. Unless I'm totally off base, I would presumably need to also have an `action_id` and do something like `SELECT feed_items.* FROM feed_items WHERE feed_items.action_id IN (SELECT DISTINCT fi.action_id FROM feed_items fi JOIN followings f ON f.feeder_id = fi.id AND f.feeder_type = fi.type AND f.feeder_subtype = fi.subtype)` Is that correct? And is it better than OR?
A: 

Run EXPLAIN ANALYZE and time the query to see if there is a problem.

Also, you could try expressing the query as a union:

SELECT x.* FROM
(
SELECT feed_items.* FROM feed_items
INNER JOIN followings 
ON  followings.feeder_id = feed_items.subject_id 
    AND followings.feeder_type = feed_items.subject_type
WHERE (followings.follower_id = 42)
-- UNION (rather than UNION ALL) removes duplicate rows, so DISTINCT is not needed
UNION
SELECT feed_items.* FROM feed_items
INNER JOIN followings
ON  followings.feeder_id = feed_items.actor_id
    AND followings.feeder_type = feed_items.actor_type
WHERE (followings.follower_id = 42)
) AS x
ORDER BY x.created_at DESC 
LIMIT 30

But again, run EXPLAIN ANALYZE and benchmark.
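
If the plans show sequential scans, one thing worth trying is an index for each branch of the union plus one for the follower lookup. This is only a sketch: the index names are made up, the column order is a guess, and whether the planner actually uses these indexes depends on your data, so verify with EXPLAIN ANALYZE.

```sql
-- Hypothetical index names; column names are taken from the question.
-- One index per branch of the UNION:
CREATE INDEX index_feed_items_on_subject
    ON feed_items (subject_type, subject_id, created_at);
CREATE INDEX index_feed_items_on_actor
    ON feed_items (actor_type, actor_id, created_at);

-- Supports finding everything a given follower follows:
CREATE INDEX index_followings_on_follower
    ON followings (follower_id, feeder_type, feeder_id);
```

Each extra index adds some cost to every insert into feed_items, so keep only the ones the plans show being used.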

Gavin