ansaurus

Question

Answer 1

+7 A:

Actually you're incorrect: with ANSI SQL this isn't possible. Certain databases with vendor extensions (eg Oracle's CONNECT BY) may be able to do what you want but not plain old SQL.

My advice? Change your data so enable an easier solution.

In this case, give each message a conversation_id. If the user posts a new message, give that a new (currently unused) value. If they reply, keep the conversation_id of the message being replied to.

Then querying the data becomes trivial.

cletus 2009-12-22 15:36:09

+1 To write one select to do this, you would have to define a max depth, and manually write out that many joins. But if the conversation goes one more message beyond how many joins you wrote, your query will fail to return all of them.

David Oneill 2009-12-22 15:42:24

I don't think that's right David; can you explain further? Cletus's design makes sense to me.

Craig Walker 2009-12-22 15:50:39

Note tha this is sort-of breaking normalization (though for a worthy cause), as the conversation_id could potentially get out of sync with the reply_to hierarchy. It's still a great idea though since it solves the problem without too much hassle. Just make sure that your application code reliably sets the conversation_id. It's also a good idea to have a checking/cleanup process that periodically checks all reply hierarchies for the correct conversation_id.

Craig Walker 2009-12-22 15:53:19

`@Craig Walker`: this design allows selecting only the whole conversations (you cannot easily filter all replies to something that is not the message that started the conversation). However, the original question didn't ask for that, so this design does answer the question.

Quassnoi 2009-12-22 15:53:21

This IS completely possible using a few nested select statements, albeit not a good idea. The `conversation_id` would CERTAINLY simplify it.

md5sum 2009-12-22 15:54:52

@Quassnoi: That's true, though once you have the whole conversation you can do a relatively simple second-pass on the hierarchy in the application code (which probably supports recursion better). You'll want to do that regardless just to rebuild the hierarchy structure.

Craig Walker 2009-12-22 16:10:00

@md5sum: the problem with nested selects is that you have to know how many selects you need ahead of time (ie: before you build your query). You can't do an arbitrary-depth recursion in plain SQL. For "reply-to" structures like this one that's almost certainly what you need.

Craig Walker 2009-12-22 16:11:21

`@Craig Walker`: sure you can. However, questions tagged `SQL`, in my opinion, ask for the database-side answers, and most systems (including `MySQL` with the hack mentioned in my answer) support filtering on the parent message and building the hierarchical structures (preserving the order) on the database side.

Quassnoi 2009-12-22 16:13:53

@Quassnoi: good point. As Cletus originally said: if you want straight SQL you're out of luck. If you want recursiveness, you'll have to go somewhere else, be it DB extensions and/or application code.

Craig Walker 2009-12-22 18:16:06

Answer 2

+2 A:

This is the adjacency list model.

MySQL has no native way to query it, but you can use a certain hack: create a function like this:

CREATE FUNCTION hierarchy_connect_by_parent_eq_prior_id(value INT) RETURNS INT
NOT DETERMINISTIC
READS SQL DATA
BEGIN
        DECLARE _id INT;
        DECLARE _parent INT;
        DECLARE _next INT;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET @id = NULL;

        SET _parent = @id;
        SET _id = -1;

        IF @id IS NULL THEN
                RETURN NULL;
        END IF;

        LOOP
                SELECT  MIN(id)
                INTO    @id
                FROM    messages
                WHERE   reply_to = _parent
                        AND id > _id;
                IF @id IS NOT NULL OR _parent = @start_with THEN
                        SET @level = @level + 1;
                        RETURN @id;
                END IF;
                SET @level := @level - 1;
                SELECT  id, reply_to
                INTO    _id, _parent
                FROM    messages
                WHERE   id = _parent;
        END LOOP;
END

and use it in a query:

SELECT  CONCAT(REPEAT('    ', level - 1), CAST(hi.id AS CHAR)) AS treeitem, parent, level
FROM    (
        SELECT  hierarchy_connect_by_parent_eq_prior_id(id) AS id, @level AS level
        FROM    (
                SELECT  @start_with := 0,
                        @id := @start_with,
                        @level := 0
                ) vars, messages
        WHERE   @id IS NOT NULL
        ) ho
JOIN    messages hi
ON      hi.id = ho.id

See this article in my blog for more detailed explanations on how it works:

Hierarchical queries in MySQL

Only the children of the original message (whose id should be used to initialize @start_with) will be selected.

This query additionally can be filtered for the values of sender_id and receiver_id to make sure only the messages between the users are selected.

Quassnoi 2009-12-22 15:38:15

I've done similar things in SQL Server 2005+ by using CLR (ie: C#) functions.

Craig Walker 2009-12-22 15:55:45

`@Craig Walker`: `SQL Server 2005` supports recursive `CTE`'s, there is no need to use `CLR` for that.

Quassnoi 2009-12-22 15:59:05

@Quassnoi: Yes, recursive CTEs would be an even better (in fact ideal) solution to this problem. The problem I solved with CLR functions had some additional application-level complexity that warranted the jump.

Craig Walker 2009-12-22 16:13:44

This method also assumes that during the course of a conversation with multiple users, no participants are also carrying on their own conversations externally.

md5sum 2009-12-22 16:24:37

`md5sum`: this method assumes nothing like that. It builds the complete hierarchy starting from the message with the given `id` which later can be filtered on the values of `sender_id` and `receiver_id`. The side conversations are not a part of hierarchy and won't ever be returned by this query.

Quassnoi 2009-12-22 16:29:35

@Quassnoi - You're right, missed the purpose of reply_to.

md5sum 2009-12-22 16:34:30

Answer 3

+6 A:

I would suggest adding a conversation_id field to the messages table. Each new non-reply message would have a generated conversation_id, and then each reply based on that message would use the same id. Then your query would be simple:

select * from messages where conversation_id = ? order by id asc

Kaleb Brasee 2009-12-22 15:39:05

Answer 4

+2 A:

I use a technique I found in Joe Celko's SQL for Smarties book - chapter 29 on the nested set model for trees. Maintaining the data is a little ugly (insert, update, delete), but selects are very fast. The code in the book is very thorough and well explained. There is also some info on how to convert the data you have into this new model.

Ray 2009-12-22 15:44:05

Answer 5

+1 A:

Using something like:

SELECT *
FROM [Messages] 
WHERE 
    (
        [Sender] = @Sender 
            AND [Reciever] = @Reciever
    ) OR (
        [Sender] = @Reciever 
            AND [Reciever] = @Sender)

Will get you an entire conversation history. As far as the reply_to field, I wouldn't use that because:

A) It will make it very complex to retrieve the first message of a conversation.

B) You can use other filters such as date or limit the history length to prevent a full output of every conversation the users have had.

Instead, I would add something along the lines of a ConversationId instead that would increment if there was no messages sent between users for a pre-specified length of time.

In specific answer to your question though, the following query will work, provided you can allow selection of the first message in a conversation:

SELECT * 
FROM [Messages] 
WHERE 
    (
        (
            [Sender] = @Sender 
                AND [Reciever] = @Reciever
        ) OR (
            [Sender] = @Reciever 
                AND [Reciever] = @Sender)
    )
    AND id >= @FirstMessageId
    AND id < 
        (
            SELECT TOP 1 [id] 
            FROM [Messages] 
            WHERE [id] > @FirstMessageId
                AND [reply_to] IS NULL
                AND  
                    (       
                        (
                            [Sender] = @Sender 
                                AND [Reciever] = @Reciever
                        ) OR (
                            [Sender] = @Reciever 
                                AND [Reciever] = @Sender
                        )
                    )
    )

md5sum 2009-12-22 15:47:37

This only works if the conversation has exactly two conversants AND they're both known ahead of time. That happens to match the poster's example but probably wouldn't work in the real application.

Craig Walker 2009-12-22 15:58:02

The poster's example data schema only allows for conversations between two people, and as I stated, it's better to use a `ConversationId`.

md5sum 2009-12-22 16:04:58

`@md5sum`: no, it does not. You can put arbitrary values into `sender_id` and `receiver_id`. In fact, this schema is denormalized, so that the `receiver_id` does not necessarily match the `sender_id` in the message referenced by `reply_to`. You can put `4` different people in two messages.

Quassnoi 2009-12-22 16:06:58

@Quassnoi - You're assuming that the OP is allowing multiple ids to be placed in the recipient field, which, in accordance to his question, I would assume he isn't. If he IS however, this query could be modified using an `IN` instead, which would further slow down execution. Once again, `ConversationId` will fix this.

md5sum 2009-12-22 16:12:47

`@md5sum`: I assume nothing like that. There can be two replies to one message (in two separate records) and these replies can be posted by one or multiple persons.

Quassnoi 2009-12-22 16:17:40

@Quassnoi - At that point he's just screwed without `ConversationId` LoL... :D

md5sum 2009-12-22 16:19:39

`@md5sum`: btw, downvoting without explaining the reason is considered a bad tone.

Quassnoi 2009-12-22 16:27:02

@Quassnoi - I haven't downvoted anything where it either wasn't already explained, or without explaining. In your case, if you'll edit your answer, the downvote will be removed, that was my honest mistake.

md5sum 2009-12-22 16:38:25

`md5sum`: updated the answer.

Quassnoi 2009-12-22 16:41:21

Done, I'd give you a + too, but it won't let me :( may try editing again, but I dunno...

md5sum 2009-12-22 16:43:52

ansaurus

tags:

views:

answers:

Help with recursive SELECT

related questions