Is it possible to return querysets that return only one object per foreign key?

For instance, I want to get the latest comments from django_comments, but I only want one comment (the latest) per object, i.e., return only the latest comment on each object and exclude all the earlier comments on that object. I guess this would be similar to a SQL GROUP BY on django_comments.content_type and django_comments.object_pk.

++ADDED INFO++

The end goal is to create a list of active comment "threads" displayed/ordered by which thread has the most recent comment, just like your standard discussion board whose topics are listed by recent activity.

I figure the best way to do this would be grabbing the latest comments, and then sorting or grouping them by content_type and object_pk so that only one comment (the latest) is returned per related content object. I can then use that comment to get all the info I need, so the word "thread" is used loosely, since I'm really just grabbing a comment and following its pks.

The MODEL is django_threadedcomments which extends django_comments with some added fields for trees, children, and parents.

VIEW:

...this returns all comments including all instances of parent

comments = ThreadedComment.objects.exclude(is_public=False).order_by("-submit_date")

...and this is ideal

comments = ThreadedComment.objects.exclude(is_public=False).order_by("-submit_date").[plus sorting logic to exclude multiple instances of the same object_pk and content_type]

TEMPLATE:

{% for comment in comments %}

TITLE: {{comment.content_object.title}}

STARTED BY : {{comment.content_object.user}}

MOST RECENT REPLY : {{comment.user}} on {{comment.submit_date}}

{% endfor %}

Thanks again!

A: 

This is a fairly difficult thing to do in SQL at all; you probably won't be able to do it through the ORM.

You can't use GROUP BY for this. It tells SQL how to group rows for aggregation, which isn't what you're doing here. "SELECT x, y FROM table GROUP BY x" is invalid in standard SQL unless y is functionally dependent on x, because otherwise each group has many candidate values for y and no rule for picking one.

Let's look at this with a clear schema in mind:

CREATE TABLE objects ( id INTEGER PRIMARY KEY, name VARCHAR );
CREATE TABLE comments ( object_id INTEGER REFERENCES objects (id), text VARCHAR NOT NULL, date TIMESTAMP NOT NULL );

INSERT INTO objects (id, name) VALUES (1, 'object 1'), (2, 'object 2');
INSERT INTO comments (object_id, text, date) VALUES
   (1, 'object 1 comment 1', '2010-01-02'),
   (1, 'object 1 comment 2', '2010-01-05'),
   (2, 'object 2 comment 1', '2010-01-08'),
   (2, 'object 2 comment 2', '2010-01-09');

SELECT * FROM objects o JOIN comments c ON (o.id = c.object_id);

The most elegant way I've seen for doing this is PostgreSQL 8.4's window functions:

SELECT * FROM (
    SELECT
        o.*, c.*,
        rank() OVER (PARTITION BY object_id ORDER BY date DESC) AS r
    FROM objects o JOIN comments c ON (o.id = c.object_id)
) AS s
WHERE r = 1;

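That selects the newest comment for each object. If you don't see what this is doing, execute the inner SELECT on its own and look at the values rank() generates, which makes it pretty straightforward. Note that rank() gives tied rows the same rank, so two comments on one object with identical dates would both be returned; using row_number() in its place returns exactly one.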

I know other ways of doing this with PostgreSQL, but I don't know how to do this in other databases.
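For databases without window functions, one common portable alternative is to join each comment against its object's MAX(date). This is only a sketch and not one of the PostgreSQL-specific approaches mentioned above; it reuses the example schema and runs on SQLite through Python's sqlite3 module purely for illustration. The function name latest_comment_per_object is made up for this example.

```python
import sqlite3


def latest_comment_per_object(conn):
    """Return (object name, comment text, date) rows, one per object, newest first."""
    return conn.execute(
        """
        SELECT o.name, c.text, c.date
        FROM objects o
        JOIN comments c ON o.id = c.object_id
        JOIN (
            -- newest comment date for every object
            SELECT object_id, MAX(date) AS max_date
            FROM comments
            GROUP BY object_id
        ) latest ON latest.object_id = c.object_id
                AND latest.max_date = c.date
        ORDER BY c.date DESC
        """
    ).fetchall()


# Same schema and sample rows as in the answer above, loaded into an in-memory DB.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE objects (id INTEGER PRIMARY KEY, name VARCHAR);
    CREATE TABLE comments (
        object_id INTEGER REFERENCES objects (id),
        text VARCHAR NOT NULL,
        date TIMESTAMP NOT NULL
    );
    INSERT INTO objects (id, name) VALUES (1, 'object 1'), (2, 'object 2');
    INSERT INTO comments (object_id, text, date) VALUES
        (1, 'object 1 comment 1', '2010-01-02'),
        (1, 'object 1 comment 2', '2010-01-05'),
        (2, 'object 2 comment 1', '2010-01-08'),
        (2, 'object 2 comment 2', '2010-01-09');
    """
)

rows = latest_comment_per_object(conn)
for name, text, date in rows:
    print(name, text, date)
```

Like rank(), this returns more than one row for an object whose newest comments share the same date, so a unique tie-breaker column would be needed to guarantee exactly one row.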

Trying to compute this dynamically is likely to give you serious headaches, and it takes more work to make these complex queries perform well, too. Chances are you're better off doing this the simple way: store a last_comment_id field for each object and update it when a comment is added or deleted, so you can just join and sort. You could probably use SQL triggers to handle this updating automatically.
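A rough sketch of the last_comment_id idea with triggers might look like the following. It assumes two things the example schema above does not have: an id column on comments and a last_comment_id column on objects. The trigger names and the active_threads helper are invented for illustration, and SQLite (via Python's sqlite3 module) stands in for whatever database is actually used; trigger syntax differs between databases.

```python
import sqlite3

SCHEMA = """
CREATE TABLE objects (
    id INTEGER PRIMARY KEY,
    name VARCHAR,
    last_comment_id INTEGER  -- assumed denormalized column
);
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,  -- assumed; the schema above has no comment id
    object_id INTEGER REFERENCES objects (id),
    text VARCHAR NOT NULL,
    date TIMESTAMP NOT NULL
);

-- Recompute the pointer whenever a comment is added...
CREATE TRIGGER comments_after_insert AFTER INSERT ON comments
BEGIN
    UPDATE objects SET last_comment_id = (
        SELECT id FROM comments
        WHERE object_id = NEW.object_id
        ORDER BY date DESC, id DESC LIMIT 1
    ) WHERE id = NEW.object_id;
END;

-- ...or deleted (it becomes NULL when no comments remain).
CREATE TRIGGER comments_after_delete AFTER DELETE ON comments
BEGIN
    UPDATE objects SET last_comment_id = (
        SELECT id FROM comments
        WHERE object_id = OLD.object_id
        ORDER BY date DESC, id DESC LIMIT 1
    ) WHERE id = OLD.object_id;
END;
"""


def active_threads(conn):
    """One row per commented object, ordered by its latest comment."""
    return conn.execute(
        """
        SELECT o.name, c.text, c.date
        FROM objects o
        JOIN comments c ON c.id = o.last_comment_id
        ORDER BY c.date DESC
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO objects (id, name) VALUES (1, 'object 1'), (2, 'object 2')")
conn.executemany(
    "INSERT INTO comments (object_id, text, date) VALUES (?, ?, ?)",
    [
        (1, "object 1 comment 1", "2010-01-02"),
        (1, "object 1 comment 2", "2010-01-05"),
        (2, "object 2 comment 1", "2010-01-08"),
        (2, "object 2 comment 2", "2010-01-09"),
    ],
)
threads_before = active_threads(conn)

# Deleting the newest comment makes the trigger fall back to the previous one.
conn.execute("DELETE FROM comments WHERE text = 'object 2 comment 2'")
threads_after = active_threads(conn)
```

The listing query is now a plain join and sort; the cost of finding the newest comment is paid once per write instead of on every page view.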

Glenn Maynard
A: 

Consider storing the last post as a foreign key somewhere (e.g. in the parent object table). Each time a message is posted or deleted, update this key.

Yes, it's duplication, but worth considering. Having to run complex queries for each request (especially the index page) could take your application performance down. This is the pragmatic way to get the desired effect without losing performance.

vdboor
A: 

Thanks Glenn and vdboor. Agreed, the idea I proposed creates way too much SQL complexity and would seriously impact performance.

The last_comment_id suggestion is very good, but I believe that for my particular situation the best thing to do is create a separate "THREAD" model that stores the content_type and object_pk of the original object commented upon as well as the id and timestamp of the object's last comment, among a few other things. This will allow simple content object lookups and chronologically filtered querysets, and will make what's happening under the hood more closely mirror the front-end presentation, which is probably a good idea for posterity. :)
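To make the idea concrete, here is a minimal sketch of such a thread table, written against SQLite with Python's sqlite3 module rather than as a Django model so that it runs on its own. The table and column names, the record_comment and threads_by_activity helpers, and the "blog.post" content type label are all assumptions made up for illustration.

```python
import sqlite3


def record_comment(conn, content_type, object_pk, comment_id, submit_date):
    """Point the thread for (content_type, object_pk) at its newest comment."""
    cur = conn.execute(
        "UPDATE threads SET last_comment_id = ?, last_comment_date = ? "
        "WHERE content_type = ? AND object_pk = ?",
        (comment_id, submit_date, content_type, object_pk),
    )
    if cur.rowcount == 0:
        # First comment on this object: start a new thread row.
        conn.execute(
            "INSERT INTO threads "
            "(content_type, object_pk, last_comment_id, last_comment_date) "
            "VALUES (?, ?, ?, ?)",
            (content_type, object_pk, comment_id, submit_date),
        )


def threads_by_activity(conn):
    """Threads ordered by most recent comment, like a discussion board index."""
    return conn.execute(
        "SELECT content_type, object_pk, last_comment_id "
        "FROM threads ORDER BY last_comment_date DESC"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE threads (
        content_type VARCHAR NOT NULL,
        object_pk VARCHAR NOT NULL,
        last_comment_id INTEGER NOT NULL,
        last_comment_date TIMESTAMP NOT NULL,
        UNIQUE (content_type, object_pk)
    )
    """
)

# Hypothetical comments arriving in submission order.
record_comment(conn, "blog.post", "1", 10, "2010-01-02")
record_comment(conn, "blog.post", "2", 11, "2010-01-08")
record_comment(conn, "blog.post", "1", 12, "2010-01-09")

threads = threads_by_activity(conn)
```

In a Django project the same update would presumably be driven from wherever comments get saved; this sketch assumes comments are recorded in submission order and does not handle deletions.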

Cheers,

jnh