The real question is not the query but the schema, especially the clustered indexes. The comment ordering requirements are ambiguous as you defined them (is it only 5 per answer or not?). I interpreted the requirements as 'pull 5 comments per post (answer or question), giving preference to upvoted ones, then to newer ones'. I know this is not how SO comments are shown, but you have to define your requirements more precisely.
Here is my query:
declare @postId int;
set @postId = ?;
with cteQuestionAndReponses as (
select post_id
from Posts
where post_id = @postId
union all
select post_id
from Posts
where parent_id = @postId)
select * from
cteQuestionAndReponses p
outer apply (
select count(*) as CommentsCount
from Comments c
where is_deleted = 0
and c.post_id = p.post_id) as cc
outer apply (
select top(5) *
from Comments c
where is_deleted = 0
and p.post_id = c.post_id
order by upvotes desc, date desc
) as c
I have some 14k posts and 67k comments in my test tables, and the query gets the posts in 7ms:
Table 'Comments'. Scan count 12, logical reads 50, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Posts'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 7 ms.
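For anyone who wants to reproduce figures like these: output in this shape is what SQL Server prints when `SET STATISTICS IO` and `SET STATISTICS TIME` are turned on for the session. A minimal sketch follows, assuming the test schema from this answer; the probe query and the post id `1` are only placeholders, so substitute the real query and a real id.

```sql
-- Report per-table reads and CPU/elapsed time for every statement in this session.
set statistics io on;
set statistics time on;
go

-- Placeholder probe (hypothetical post id 1); run the actual query here instead.
select count(*) as CommentsCount
from Comments
where is_deleted = 0
and post_id = 1;
go

-- Switch the reporting back off when done.
set statistics io off;
set statistics time off;
go
```

The numbers show up in the Messages tab in Management Studio, not in the result grid. Logical reads are the more stable figure to compare between runs, since elapsed time depends on whatever else the server is doing.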
Here is the schema I tested with:
create table Posts (
post_id int identity (1,1) not null
, content varchar(max) not null
, parent_id int null -- (null for questions, question_id for answer)
, constraint fkPostsParent_id
foreign key (parent_id)
references Posts(post_id)
, constraint pkPostsId primary key nonclustered (post_id)
);
create clustered index cdxPosts on
Posts(parent_id, post_id);
go
create table Comments (
comment_id int identity(1,1) not null
, body varchar(max) not null
, is_deleted bit not null default 0
, post_id int not null
, upvotes int not null default 0
, date datetime not null default getutcdate()
, constraint pkComments primary key nonclustered (comment_id)
, constraint fkCommentsPostId
foreign key (post_id)
references Posts(post_id)
);
create clustered index cdxComments on
Comments (is_deleted, post_id, upvotes, date, comment_id);
go
And here is my test data generation:
insert into Posts (content)
select 'Lorem Ipsum'
from master..spt_values;
insert into Posts (content, parent_id)
select 'Ipsum Lorem', post_id
from Posts p
cross apply (
-- abs() keeps the TOP argument non-negative: 0 to 9 answers per question
select top(abs(checksum(newid(), p.post_id)) % 10) Number
from master..spt_values) as r
where parent_id is NULL;
insert into Comments (body, is_deleted, post_id, upvotes, date)
select 'Sit Amet'
-- 5% deleted comments (values 95..99 out of 0..99)
, case when abs(checksum(newid(), p.post_id, r.Number)) % 100 >= 95 then 1 else 0 end
, p.post_id
-- 0 to 9 upvotes
, abs(checksum(newid(), p.post_id, r.Number)) % 10
-- comments up to 1 year old
, dateadd(minute, -abs(checksum(newid(), p.post_id, r.Number) % 525600), getutcdate())
from Posts p
cross apply (
select top(abs(checksum(newid(), p.post_id)) % 10) Number
from master..spt_values) as r
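Since the generation is random, and the number of rows in `master..spt_values` differs between SQL Server versions, it is worth checking what actually got loaded before trusting any timings. A rough sanity check, assuming the tables above (the column aliases are just illustrative names):

```sql
-- How many questions vs. answers were generated.
select
  sum(case when parent_id is null then 1 else 0 end) as Questions
, sum(case when parent_id is not null then 1 else 0 end) as Answers
from Posts;

-- Total comments and how many are flagged deleted (bit must be cast before summing).
select
  count(*) as CommentsCount
, sum(cast(is_deleted as int)) as DeletedCount
from Comments;

-- The ten most-commented posts, to confirm some posts have more than 5 comments.
select top(10) post_id, count(*) as CommentsPerPost
from Comments
group by post_id
order by count(*) desc;
```

If no post has more than 5 live comments, the `top(5)` in the query is never really exercised and the timing says little about the ordering part.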