views:

58

answers:

1

What's the standard relational database idiom for setting permissions for items?

Answers should be general; however, they should be able to be applied to example below. Anything flies: adding columns, adding another table—whatever as long as it works well.

Application / Example

Assume the Twitter database is extremely simple: we have one User table, which contains a login and user id; we have a Tweet table, which contains a tweet id, tweet text, and creator id; and we have a Follower table, which contains the id of the person being followed and the follower.

Now, assume Twitter wants to enable advanced privacy settings (viewing permissions), so that users can pick exactly which followers can view tweets. The settings can be:

  • Everyone on Twitter
  • Only current followers (which would of course have to be approved by the user, this doesn't really matter though) EDIT: Current as in, I get a new follower, he sees it; I remove a follower, he stops seeing it.
  • Specific followers (e.g., user id 5, 10, 234, and 1)
  • Only the owner

Under these circumstances, what's the best way to represent viewing permissions? The priorities, in order, are speed of lookup (you want to be able to figure out what tweets to display to a user quickly), speed of creation (you don't want to take forever to post a tweet), and efficient use of space (every time I post a tweet to everyone on my followers' list, I shouldn't have to add a row for each and every follower I have to some table.)

+1  A: 

Looks like a typical many-to-many relationship -- I don't see any restrictions on what you desire that would allow space savings wrt the typical relational DB idiom for those, i.e., a table with two columns (both foreign keys, one into users and one into tweets)... since the current followers can and do change all the time, posting a tweet to all the followers that are current at the instant of posting (I assume that's what you mean?) does mean adding that many (extremely short) rows to that relationship table (the alternative of keeping a timestamped history of follower sets so you can reconstruct who was a follower at any given tweet-posting time appears definitely worse in time and not substantially better in space).

If, on the other hand, you want to check followers at the time of viewing (rather than at the time of posting), then you could make a special userid artificially meaning "all followers of the current user" (just like you'll have one meaning "all users on Twitter"); the needed SQL to make the lookup fast, in that case, looks hairy but feasible (a UNION or OR with "all tweets for which I'm a follower of the author and the tweet is readable by [the artificial userid representing] all followers"). I'm not getting deep into that maze of SQL until and unless you confirm that it is this peculiar meaning that you have in mind (rather than the simple one which seems more natural to me but doesn't allow any space savings on the relationship table for the action of "post tweet to all followers").

Edit: the OP has clarified they mean the approach I mention in the second paragraph.

Then, assume userid is the primary key of the Users table, the Tweets table has a primary key tweetid and a foreign key author for the userid of each tweet's author, the Followers table is a typical many-to-many relationship table with the two columns (both foreign keys into Users) follower and followee, and the Canread table a not-so-typical many-to-many relationship table, still with two column -- foreign key into Users is column reader, foreign key into Tweets is column tweet (phew;-). Two special users @everybody and @allfollowers are defined with the above meanings (so that posting to everybody, all followers, or "just myself", all add only one row to Canread -- only selective posting to a specific list of N people adds N rows).

So the SQL for the set of tweet IDs a user @me can read is, I think, something like:

SELECT Tweets.tweetid 
  FROM Tweets
  JOIN Canread ON(Tweets.tweetid=Canread.tweet)
 WHERE Canread.reader IN (@me, @everybody)

UNION

SELECT Tweets.tweetid 
  FROM Tweets
  JOIN Canread ON(Tweets.tweetid=Canread.tweet)
  JOIN Followers ON(Tweets.author=Followers.followee)
 WHERE Canread.reader=@allfollowers
   AND Followers.follower=@me
Alex Martelli
Sorry, my bad for ambiguity—I mean current as in changing. See updated question.
aharon
I just think adding so many rows per post can't be the optimal way of doing things; Facebook has a similar (real) ability, I doubt that they do it that way—if only that it seems a lot of work.
aharon
@aharon, does Facebook use a relational DB? I thought, like all very-high-volume sites (inc. Twitter), they had jumped on the NoSQL bandwagon (exactly for reasons of scalability).
Alex Martelli
Facebook supposedly uses a huge MySQL cluster—as of 2008 at least. Are there any non-sql databases that can deal with this kind of query/relation better that you know of?
aharon
@aharon, yes there are (where "better" means "far more scalable") -- of course NoSQL DBs always have to be carefully de-normalized for maximum performance and scalability, in ways that vary entirely among them, but Google's Bigtable (not the much simpler facade exposed in App Engine) would make mincemeat of this, and I'd be astonished if, say, Cassandra couldn't (but as a Googler my work's on bigtable, so that's the one I'm really familiar with). Anyway, see my SQL attempt above, doesn't look too bad to me (with all the needed indices of course).
Alex Martelli
Thanks so much Alex; the MySQL seems to work, I sort of understand how it works :) and I'll have a look at Bigtable and Cassandra.
aharon
@aharon, about Bigtable you can read whitepapers such as http://labs.google.com/papers/bigtable.html , but to _use_ it you should apply to work at Google (hint: we're hiring, email me if interested;-), as I don't think we make it available outside (except through the nice simple facade in App Engine). Cassandra, MongoDB, Hypertable, CouchDB (etc), open source, or Amazon's Dynamo (commercial closed source), may be more useful to you otherwise, as you get to _use_ them, not just _study_ them;-). Glad to hear the SQL I proposed works for you, anyway!
Alex Martelli