tags:

views:

238

answers:

3

How do I best gather messages that have not been read by a given user?

Existing tables

Message table
----------------------------------
id    title    body    sentAt

User table
----------------------------------
id    username

Read Messages table
----------------------------------
user_id    message_id

I'm thinking something like

select 
  m.id, m.title, m.sentAt, u.username
from 
  message m,
  [user] u
where 
  u.id = 1 and -- @userId parameter
  m.id not in 
    (select r.message_id from read_messages r where r.user_id = u.id)

Unfortunately for me I dont understand much looking at the execution plan. /Adam

+3  A: 

NOT IN is very expensive. Instead, you could do something like:

SELECT
  m.id, m.title, m.sentAt
FROM
  message m
  LEFT JOIN [Read Messages] rm
    ON m.message_id = rm.message_id AND rm.user_id = @userID
WHERE
  rm.user_id IS NULL

Provided you have the right indexes, this should be much faster.

You're fetching all messages and LEFT JOINing read messages for that user. Then, in the WHERE clause you're asking for that user_id to be NULL for that message, meaning the user has not read it yet.

Seb
+1. beat me to it!
Mitch Wheat
Is it that much a difference between this and my approach? I think NOT IN is much more readable than a JOIN but I know SQL isn't my strongest suit.
Adam Asham
Yes, there is. NOT IN will get a [possible huge] list of IDs from and then compare each message row with that (similar to WHERE id = X OR id = Y OR OR ...), which is too expensive. My alternative uses indexes so it should be way faster. Anyway, I believe Henrik Paul's solution is better than mine :)
Seb
+5  A: 

Suggesting an alternative approach:

I was faced with the exact same problem at work earlier. I wasted a good week in trying to figure the best way to do this. I ended up with creating a join table, as you have done, but the table contains only unread messages, instead of keeping track of read messages.

Because

  1. The status quo is "everyone has read all their messages".
  2. Getting the unread messages (or their count) should be as fast as possible.
  3. The status quo should be the least straining status on the system.

Now, if I would've kept track of all the messages everyone has read, the clutter in the database grows pretty rapidly (users*messages rows), easily leading to thousands of rows of 'dead weight' in even smaller applications. This problem is exaggerated if the lifetime of messages are indefinite - you could be keeping track of message statuses that are many years old.

If keeping track of the inverse, your "unread messages" table contains only a handful of rows, and they diminish for each message that a user reads. Also, getting the amount of unread messages is as simple as "SELECT COUNT(*) FROM unread WHERE user = foo".

But

As everything, this is a trade-off. While reading is pretty much as fast as computationally possible, writing is a chore. For each written message, you need to insert an entry to this join table. Additionally, if multiple people can read the same message, you need to insert one row for each recipient. If the recipients are implicit (e.g. only a user group's name is given, or even with the criteria such as "anyone who has access to this thing"), creating new messages becomes even more complicated.

But I feel this is a fair compromise.

YMMV, HTH.

Henrik Paul
Excellent presentation - good job.
Jonathan Leffler
I've been reading your post a few times and the more I think about it... it is a better fit for my scenario. Thanks for a comprehensive post!
Adam Asham
+1  A: 

HP's alternative suggestion is most likely suiting you well. However, if it is not, Id suggest using NOT EXISTS rather than LEFT JOIN if it's possible in your environment.

At least on MS SQL it would give a slightly cheaper query plan, since it would not need the last filtering (user_id IS NULL)

SELECT  ...
FROM    message m
WHERE   NOT EXISTS (
    SELECT 1
    FROM read_messages rm
    WHERE rm.usr_id = ...
    AND rm.msg_id = m.msg_id
)
Brimstedt