views:

76

answers:

2

This post is taking a substantial amount of time to type because I'm trying to be as clear as possible, so please bear with me if it is still unclear.

Basically, what I have are a table of posts in the database which users can add privacy settings to.

ID | owner_id | post | other_info | privacy_level (int value)

From there, users can add their privacy details, allowing it to be viewable by all [privacy_level = 0), friends (privacy_level = 1), no one (privacy_level = 3), or specific people or filters (privacy_level = 4). For privacy levels specifying specific people (4), the query will reference the table "post_privacy_includes_for" in a subquery to see if the user (or a filter the user belongs to) exists in a row in the table.

ID | post_id | user_id | list_id

Also, the user has the ability to prevent some people from viewing their post in within a larger group by excluding them (e.g., Having it set for everyone to view but hiding it from a stalker user). For this, another reference table is added, "post_privacy_exclude_from" - it looks identical to the setup as "post_privacy_includes_for".

My problem is that this does not scale. At all. At the moment, there are about 1-2 million posts, the majority of them set to be viewable by everyone. For each post on the page it must check to see if there is a row that is excluding the post from being shown to the user - this moves really slow on a page that can be filled with 100-200 posts. It can take up to 2-4 seconds, especially when additional constraints are added to the query.

This also creates extremely large and complex queries that are just... awkward.

SELECT t.*
FROM posts t
WHERE ( (t.privacy_level = 3
         AND t.owner_id = ?)
       OR (t.privacy_level = 4
           AND EXISTS
             ( SELECT i.id
              FROM PostPrivacyIncludeFor i
              WHERE i.user_id = ?
                AND i.thought_id = t.id)
           OR t.privacy_level = 4
           AND t.owner_id = ?)
       OR (t.privacy_level = 4
           AND EXISTS
             (SELECT i2.id
              FROM PostPrivacyIncludeFor i2
              WHERE i2.thought_id = t.id
                AND EXISTS
                  (SELECT r.id
                   FROM FriendFilterIds r
                   WHERE r.list_id = i2.list_id
                     AND r.friend_id = ?))
           OR t.privacy_level = 4
           AND t.owner_id = ?)
       OR (t.privacy_level = 1
           AND EXISTS
             (SELECT G.id
              FROM Following G
              WHERE follower_id = t.owner_id
                AND following_id = ?
                AND friend = 1)
           OR t.privacy_level = 1
           AND t.owner_id = ?)
       OR (NOT EXISTS
             (SELECT e.id
              FROM PostPrivacyExcludeFrom e
              WHERE e.thought_id = t.id
                AND e.user_id = ?
                AND NOT EXISTS
                  (SELECT e2.id
                   FROM PostPrivacyExcludeFrom e2
                   WHERE e2.thought_id = t.id
                     AND EXISTS
                       (SELECT l.id
                        FROM FriendFilterIds l
                        WHERE l.list_id = e2.list_id
                          AND l.friend_id = ?)))
           AND t.privacy_level IN (0, 1, 4))
  AND t.owner_id = ?
ORDER BY t.created_at LIMIT 100

(mock up query, similar to the query I use now in Doctrine ORM. It's a mess, but you get what I am saying.)

I guess my question is, how would you approach this situation to optimize it? Is there a better way to set up my database? I'm willing to completely scrap the method I have currently built up, but I wouldn't know what to move onto.

Thanks guys.

Updated: Fix the query to reflect the values I defined for privacy level above (I forgot to update it because I simplified the values)

+1  A: 

Your query is too long to give a definitive solution for, but the approach I would follow is to simply the data lookups by converting the sub-queries into joins, and then build the logic into the where clause and column list of the select statement:

select t.*, i.*, r.*, G.*, e.* from posts t
left join PostPrivacyIncludeFor i on i.user_id = ? and i.thought_id = t.id
left join FriendFilterIds r on r.list_id = i.list_id and r.friend_id = ?
left join Following G on follower_id = t.owner_id and G.following_id = ? and G.friend=1
left join PostPrivacyExcludeFrom e on e.thought_id = t.id and e.user_id = ? 

(This might need expanding: I couldn't follow the logic of the final clause.)

If you can get the simple select working fast AND including all the information needed, then all you need to do is build up the logic in the select list and where clause.

Martin
A: 

Had a quick stab at simplifying this without re-working your original design too much.

Using this solution your web page can now simply call the following stored procedure to get a list of filtered posts for a given user within a specified period.

call list_user_filtered_posts( <user_id>, <day_interval> );

The whole script can be found here : http://pastie.org/1212812

I haven't fully tested all of this and you may find this solution isn't performant enough for your needs but it may help you in fine tuning/modifying your existing design.

Tables

Dropped your post_privacy_exclude_from table and added a user_stalkers table which works pretty much like the inverse of user_friends. Kept the original post_privacy_includes_for table as per your design as this allows a user restrict a specific post to a subset of people.

drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varbinary(32) unique not null
)
engine=innodb;


drop table if exists user_friends;
create table user_friends
(
user_id int unsigned not null,
friend_user_id int unsigned not null,
primary key (user_id, friend_user_id)
)
engine=innodb;


drop table if exists user_stalkers;
create table user_stalkers
(
user_id int unsigned not null,
stalker_user_id int unsigned not null,
primary key (user_id, stalker_user_id)
)
engine=innodb;


drop table if exists posts;
create table posts
(
post_id int unsigned not null auto_increment primary key,
user_id int unsigned not null,
privacy_level tinyint unsigned not null default 0,
post_date datetime not null,
key user_idx(user_id),
key post_date_user_idx(post_date, user_id)
)
engine=innodb;


drop table if exists post_privacy_includes_for;
create table post_privacy_includes_for
(
post_id int unsigned not null,
user_id int unsigned not null,
primary key (post_id, user_id)
)
engine=innodb;

Stored Procedures

The stored procedure is relatively simple - it initially selects ALL posts within the specified period and then filters out posts as per your original requirements. I have not performance tested this sproc with large volumes but as the initial selection is relatively small it should be performant enough as well as simplifying your application/middle tier code.

drop procedure if exists list_user_filtered_posts;

delimiter #

create procedure list_user_filtered_posts
(
in p_user_id int unsigned,
in p_day_interval tinyint unsigned
)
proc_main:begin

 drop temporary table if exists tmp_posts;
 drop temporary table if exists tmp_priv_posts;

 -- select ALL posts in the required date range (or whatever selection criteria you require)

 create temporary table tmp_posts engine=memory 
 select 
  p.post_id, p.user_id, p.privacy_level, 0 as deleted 
 from 
  posts p
 where
  p.post_date between now() - interval p_day_interval day and now()  
 order by 
  p.user_id;

 -- purge stalker posts (0,1,3,4)

 update tmp_posts 
 inner join user_stalkers us on us.user_id = tmp_posts.user_id and us.stalker_user_id = p_user_id
 set
  tmp_posts.deleted = 1
 where
  tmp_posts.user_id != p_user_id;

 -- purge other users private posts (3)

 update tmp_posts set deleted = 1 where user_id != p_user_id and privacy_level = 3;

 -- purge friend only posts (1) i.e where p_user_id is not a friend of the poster

 /*
  requires another temp table due to mysql temp table problem/bug
  http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html
 */

 -- the private posts (1) this user can see

 create temporary table tmp_priv_posts engine=memory 
 select
  tp.post_id
 from
  tmp_posts tp
 inner join user_friends uf on uf.user_id = tp.user_id and uf.friend_user_id = p_user_id
 where
  tp.user_id != p_user_id and tp.privacy_level = 1;

 -- remove private posts this user cant see

 update tmp_posts 
 left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id 
 set 
  tmp_posts.deleted = 1
 where 
  tpp.post_id is null and tmp_posts.privacy_level = 1;

 -- purge filtered (4)

 truncate table tmp_priv_posts; -- reuse tmp table

 insert into tmp_priv_posts
 select
  tp.post_id
 from
  tmp_posts tp
 inner join post_privacy_includes_for ppif on tp.post_id = ppif.post_id and ppif.user_id = p_user_id
 where
  tp.user_id != p_user_id and tp.privacy_level = 4;

 -- remove private posts this user cant see

 update tmp_posts 
 left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id 
 set 
  tmp_posts.deleted = 1
 where 
  tpp.post_id is null and tmp_posts.privacy_level = 4;

 drop temporary table if exists tmp_priv_posts;

 -- output filtered posts (display ALL of these on web page)

 select 
  p.* 
 from 
  posts p
 inner join tmp_posts tp on p.post_id = tp.post_id
 where
  tp.deleted = 0
 order by
  p.post_id desc;

 -- clean up

 drop temporary table if exists tmp_posts;

end proc_main #

delimiter ;

Test Data

Some basic test data.

insert into users (username) values ('f00'),('bar'),('alpha'),('beta'),('gamma'),('omega');

insert into user_friends values 
(1,2),(1,3),(1,5),
(2,1),(2,3),(2,4),
(3,1),(3,2),
(4,5),
(5,1),(5,4);

insert into user_stalkers values (4,1);

insert into posts (user_id, privacy_level, post_date) values

-- public (0)

(1,0,now() - interval 8 day),
(1,0,now() - interval 8 day),
(2,0,now() - interval 7 day),
(2,0,now() - interval 7 day),
(3,0,now() - interval 6 day),
(4,0,now() - interval 6 day),
(5,0,now() - interval 5 day),

-- friends only (1)

(1,1,now() - interval 5 day),
(2,1,now() - interval 4 day),
(4,1,now() - interval 4 day),
(5,1,now() - interval 3 day),

-- private (3)

(1,3,now() - interval 3 day),
(2,3,now() - interval 2 day),
(4,3,now() - interval 2 day),

-- filtered (4)

(1,4,now() - interval 1 day),
(4,4,now() - interval 1 day),
(5,4,now());

insert into post_privacy_includes_for values (15,4), (16,1), (17,6);

Testing

As I mentioned before I've not fully tested this but on the surface it seems to be working.

select * from posts;

call list_user_filtered_posts(1,14);
call list_user_filtered_posts(6,14);

call list_user_filtered_posts(1,7);
call list_user_filtered_posts(6,7);

Hope you find some of this of use.

f00