views:

84

answers:

4

Hi, I have got 3 tables with those columns below:

Topics:
[TopicID] [TopicName]
Messages:
[MessageID] [MessageText]
MessageTopicRelations
[EntryID] [MessageID] [TopicID]

messages can be about more than one topic. question is: given couple of topics, I need to get messages which are about ALL these topics and not the less, but they can be about some other topic too. a message which is about SOME of these given topics won't be included. I hope I explained my request well. otherwise I can provide sample data. thanks

+1  A: 

Here's a profoundly inelegant solution

SELECT
     m.MessageID
    ,m.MessageText
FROM
    Messages m
WHERE
    m.MessageID IN (
    SELECT
        mt.MessageID
    FROM
        MessageTopicRelations mt
    WHERE
        TopicID IN (1,4,5)// List of topic IDS
    GROUP BY
        mt.MessageID
    HAVING
        count(*) = 3 //Number of topics
    )
Paul Creasey
'Profoundly inelegant', funny :-) I don't believe this will work, however, as the `count(*)` could equal three in many different ways aside from the one way that the OP requires.
Adam Bernier
@Adam: Your concern is valid - the solution is dependent on the data model. See my answer for details, but +1 from me.
OMG Ponies
I make the assumption that the many to many table uses a compound key, this is what i would consider typical and best practice. If that is not the case then the query is invalid.
Paul Creasey
@Paul: point well taken. I guess that makes you an optimist, and me a pessimist ;-)
Adam Bernier
+1  A: 

Edit: thanks to @Paul Creasey and @OMG Ponies for finding the flaws in my approach.
The correct way to do this is with a self-join for each topic; as shown in the leading answer.


Another profoundly inelegant entry:

select m.MessageText
       , t.TopicName
  from Messages m
       inner join MessageTopicRelations mtr
       on mtr.MessageID = m.MessageID
       inner join Topics t
       on t.TopicID = mtr.TopicID
   and
       t.TopicName = 'topic1'

UNION 

select m.MessageText
       , t.TopicName
  from Messages m
       inner join MessageTopicRelations mtr
       on mtr.MessageID = m.MessageID
       inner join Topics t
       on t.TopicID = mtr.TopicID
   and
       t.TopicName = 'topic2'
...
Adam Bernier
Tangentially: this kind of query is profoundly and elegantly simple when querying on a `ListProperty` in Google App Engine's datastore.
Adam Bernier
+1: This would work too, though requires more typing than the JOINs or GROUP BY/HAVING COUNT methods.
OMG Ponies
I don't get this, am i missing something? This doesn't solve the problem, or make sense to me. It would surely return any message with any topic and why union when you could use OR or IN
Paul Creasey
@Paul: Now that I re-read it, yeah - it would return results equivalent to using `IN (1, 2, 3)` because the UNION doesn't ensure that the message returned belongs to both topics.
OMG Ponies
+5  A: 

The following use x, y, and z to stand in for topic ids, being that none were provided for examples.

Using JOINs:

SELECT m.*
  FROM MESSAGES m
  JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
  JOIN TOPICS tx ON tx.topicid = mtr.topicid
                AND tx.topicid = x
  JOIN TOPICS ty ON ty.topicid = mtr.topicid
                AND ty.topicid = y
  JOIN TOPICS tz ON tz.topicid = mtr.topicid
                AND tz.topicid = z

Using GROUP BY/HAVING COUNT(*):

  SELECT m.*
    FROM MESSAGES m
    JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
    JOIN TOPICS t ON t.topicid = mtr.topicid
   WHERE t.topicid IN (x, y, z)
GROUP BY m.messageid, m.messagetext
  HAVING COUNT(*) = 3

Of the two, the JOIN approach is safer.

The GROUP BY/HAVING relies on the MESSAGETOPICRELATIONS.TOPICID being either part of the primary key, or having a unique key constraint to ensure there aren't duplicates. Otherwise, you could have 2+ instances of the same topic associated to a message - which would be a false positive. Using HAVING COUNT(DISTINCT ... would clear up any false positives, but support depends on the database - MySQL supports it at 5.1+, but not on 4.1. Oracle might, have to wait till Monday to test on SQL Server...

I looked into Bill's comment about not needing the join to the TOPICS table:

SELECT m.*
  FROM MESSAGES m
  JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
                                AND mtr.topicid IN (x, y, z)

...will return false positives - rows that match at least one of the values defined in the IN clause. And:

SELECT m.*
  FROM MESSAGES m
  JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
                                AND mtr.topicid = x
                                AND mtr.topicid = y
                                AND mtr.topicid = z

...won't return anything at all, because the topicid can never be all of the values at once.

OMG Ponies
+1 I am enlightened again by the master.
Adam Bernier
You don't need a join to the topics table, only to the MessageTopicRelations table. And I'd suggest `COUNT(DISTINCT topicid) = 3` unless you can rely on a unique constraint in the MessageTopicRelations table over (messageid, topicid).
Bill Karwin
@Bill: Last time I checked, `HAVING COUNT(DISTINCT ...` didn't work on MySQL. Dunno what db this is for on that note
OMG Ponies
Every many to many table i've ever implemented or encountered has had a compound primary key, so i think the warning is moot for most schemas
Paul Creasey
@Paul: ORMs don't like compound keys IIRC, and there are enough questions on SO to make me *never* assume a unique key is in place, much less a primary key :/
OMG Ponies
Hmmm fair enough, i suppose that to assume makes an ass out of u and me, especially on SO. AFAIK most ORM's handle compounds keys well these days, at least the ones i've used have! :)
Paul Creasey
@OMG Ponies: See my answer.
Bill Karwin
+1  A: 

Re: the answer by OMG Ponies, you don't need to join to the TOPICS table. And the HAVING COUNT(DISTINCT) clause works fine in MySQL 5.1. I just tested it.

This is what I mean:

Using GROUP BY/HAVING COUNT(*):

  SELECT m.*
    FROM MESSAGES m
    JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
   WHERE mtr.topicid IN (x, y, z)
GROUP BY m.messageid
  HAVING COUNT(DISTINCT mtr.topicid) = 3

The reason that I suggest COUNT(DISTINCT) is that if the columns (messageid,topicid) don't have a unique constraint, you could get duplicates, which would result in a count of 3 in the group, even with fewer than three distinct values.

Bill Karwin
@Bill: That's the only way I could figure you were talking about - it had to be with `HAVING COUNT(DISTINCT...` (which didn't work on 4.1 for me). Thanks - appreciate the info!
OMG Ponies
MySQL 4.1 is more than five years old and it has officially reached the end of extended support. I think it's reasonable to answer StackOverflow questions based on the actively supported version 5.1.
Bill Karwin
I support an app using 4.1 - I'm all about the old school =)
OMG Ponies