This is actually a pretty interesting question.
HA HA DISREGARD THIS, I SUCK
On edit: this answer works, but on MySQL it becomes tediously slow when the number of parent rows is as few as 100. However, see below for a performant fix.
Obviously, you can run this query once per post: select * from comments where id = $id limit 3
That creates a lot of overhead, as you end up doing one database query per post, the dreaded N+1 queries.
If you want to get all posts at once (or some subset with a where) the following will surprisingly work. It assumes that comments have a monotonically increasing id (as a datetime is not guaranteed to be unique), but allows for comment ids to be interleaved among posts.
Since an auto_increment id column is monotonically increasing, if comment has an id, you're all set.
First, create this view. In the view, I call post parent
and comment child
:
create view parent_top_3_children as
select a.*,
(select max(id) from child where parent_id = a.id) as maxid,
(select max(id) from child where id < maxid
and parent_id = a.id) as maxidm1,
(select max(id) from child where id < maxidm1
and parent_id = a.id) as maxidm2
from parent a;
maxidm1
is just "max id minus 1"; maxidm2
, "max id minus 2" -- that is, the second and third greatest child ids within a particular parent id.
Then join the view to whatever you need from the comment (I'll call that text
):
select a.*,
b.text as latest_comment,
c.text as second_latest_comment,
d.text as third_latest_comment
from parent_top_3_children a
left outer join child b on (b.id = a.maxid)
left outer join child c on (c.id = a.maxidm1)
left outer join child d on (c.id = a.maxidm2);
Naturally, you can add whatever where clause you want to that, to limit the posts: where a.category = 'foo'
or whatever.
Here's what my tables look like:
mysql> select * from parent;
+----+------+------+------+
| id | a | b | c |
+----+------+------+------+
| 1 | 1 | 1 | NULL |
| 2 | 2 | 2 | NULL |
| 3 | 3 | 3 | NULL |
+----+------+------+------+
3 rows in set (0.00 sec)
And a portion of child. Parent 1 has noo children:
mysql> select * from child;
+----+-----------+------+------+------+------+
| id | parent_id | a | b | c | d |
+----+-----------+------+------+------+------+
. . . .
| 18 | 3 | NULL | NULL | NULL | NULL |
| 19 | 2 | NULL | NULL | NULL | NULL |
| 20 | 2 | NULL | NULL | NULL | NULL |
| 21 | 3 | NULL | NULL | NULL | NULL |
| 22 | 2 | NULL | NULL | NULL | NULL |
| 23 | 2 | NULL | NULL | NULL | NULL |
| 24 | 3 | NULL | NULL | NULL | NULL |
| 25 | 2 | NULL | NULL | NULL | NULL |
+----+-----------+------+------+------+------+
24 rows in set (0.00 sec)
And the view gives us this:
mysql> select * from parent_top_3;
+----+------+------+------+-------+---------+---------+
| id | a | b | c | maxid | maxidm1 | maxidm2 |
+----+------+------+------+-------+---------+---------+
| 1 | 1 | 1 | NULL | NULL | NULL | NULL |
| 2 | 2 | 2 | NULL | 25 | 23 | 22 |
| 3 | 3 | 3 | NULL | 24 | 21 | 18 |
+----+------+------+------+-------+---------+---------+
3 rows in set (0.21 sec)
The explain plan for the view is only slightly hairy:
mysql> explain select * from parent_top_3;
+----+--------------------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+------+---------------+------+---------+------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 2 | DERIVED | a | ALL | NULL | NULL | NULL | NULL | 3 | |
| 5 | DEPENDENT SUBQUERY | child | ALL | PRIMARY | NULL | NULL | NULL | 24 | Using where |
| 4 | DEPENDENT SUBQUERY | child | ALL | PRIMARY | NULL | NULL | NULL | 24 | Using where |
| 3 | DEPENDENT SUBQUERY | child | ALL | NULL | NULL | NULL | NULL | 24 | Using where |
+----+--------------------+------------+------+---------------+------+---------+------+------+-------------+
However, if we add an index for parent_fks,it gets a better:
mysql> create index pid on child(parent_id);
mysql> explain select * from parent_top_3;
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 2 | DERIVED | a | ALL | NULL | NULL | NULL | NULL | 3 | |
| 5 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | util.a.id | 2 | Using where |
| 4 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | util.a.id | 2 | Using where |
| 3 | DEPENDENT SUBQUERY | child | ref | pid | pid | 5 | util.a.id | 2 | Using where |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
5 rows in set (0.04 sec)
As noted above, this begins to fall apart when the number of parent rows is few as 100, even if we index into parent using its primary key:
mysql> select * from parent_top_3 where id < 10;
+----+------+------+------+-------+---------+---------+
| id | a | b | c | maxid | maxidm1 | maxidm2 |
+----+------+------+------+-------+---------+---------+
| 1 | 1 | 1 | NULL | NULL | NULL | NULL |
| 2 | 2 | 2 | NULL | 25 | 23 | 22 |
| 3 | 3 | 3 | NULL | 24 | 21 | 18 |
| 4 | NULL | 1 | NULL | 65 | 64 | 63 |
| 5 | NULL | 2 | NULL | 73 | 72 | 71 |
| 6 | NULL | 3 | NULL | 113 | 112 | 111 |
| 7 | NULL | 1 | NULL | 209 | 208 | 207 |
| 8 | NULL | 2 | NULL | 401 | 400 | 399 |
| 9 | NULL | 3 | NULL | 785 | 784 | 783 |
+----+------+------+------+-------+---------+---------+
9 rows in set (1 min 3.11 sec)
(Note that I intentionally test on a slow machine, with data saved on a slow flash disk.)
Here's the explain, looking for exactly one id (and the first one, at that):
mysql> explain select * from parent_top_3 where id = 1;
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
| 2 | DERIVED | a | ALL | NULL | NULL | NULL | NULL | 1000 | |
| 5 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | util.a.id | 179 | Using where |
| 4 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | util.a.id | 179 | Using where |
| 3 | DEPENDENT SUBQUERY | child | ref | pid | pid | 5 | util.a.id | 179 | Using where |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
5 rows in set (56.01 sec)
Over 56 seconds for one row, even on my slow machine, is two orders of magnitude unacceptable.
So can we save this query? It works, it's just too slow.
Here's the explain plan for the modified query. It looks as bad or worse:
mysql> explain select * from parent_top_3a where id = 1;
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 100 | Using where |
| 2 | DERIVED | <derived4> | ALL | NULL | NULL | NULL | NULL | 100 | |
| 4 | DERIVED | <derived6> | ALL | NULL | NULL | NULL | NULL | 100 | |
| 6 | DERIVED | a | ALL | NULL | NULL | NULL | NULL | 100 | |
| 7 | DEPENDENT SUBQUERY | child | ref | pid | pid | 5 | util.a.id | 179 | Using where |
| 5 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | a.id | 179 | Using where |
| 3 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | a.id | 179 | Using where |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
7 rows in set (0.05 sec)
But it completes three orders of magnitude faster, in 1/20th of a second!
How do we get to the much speedier parent_top_3a? We create three views, each one dependent on the previous one:
create view parent_top_1 as
select a.*,
(select max(id) from child where parent_id = a.id)
as maxid
from parent a;
create view parent_top_2 as
select a.*,
(select max(id) from child where parent_id = a.id and id < a.maxid)
as maxidm1
from parent_top_1 a;
create view parent_top_3a as
select a.*,
(select max(id) from child where parent_id = a.id and id < a.maxidm1)
as maxidm2
from parent_top_2 a;
Not only does this work much more quickly, it's legal on RDBMSes other than MySQL.
Let's increase the number of parent rows to 12800, the number of child rows to 1536 (most blog posts don't get comments, right? ;) )
mysql> select * from parent_top_3a where id >= 20 and id < 40;
+----+------+------+------+-------+---------+---------+
| id | a | b | c | maxid | maxidm1 | maxidm2 |
+----+------+------+------+-------+---------+---------+
| 39 | NULL | 2 | NULL | NULL | NULL | NULL |
| 38 | NULL | 1 | NULL | NULL | NULL | NULL |
| 37 | NULL | 3 | NULL | NULL | NULL | NULL |
| 36 | NULL | 2 | NULL | NULL | NULL | NULL |
| 35 | NULL | 1 | NULL | NULL | NULL | NULL |
| 34 | NULL | 3 | NULL | NULL | NULL | NULL |
| 33 | NULL | 2 | NULL | NULL | NULL | NULL |
| 32 | NULL | 1 | NULL | NULL | NULL | NULL |
| 31 | NULL | 3 | NULL | NULL | NULL | NULL |
| 30 | NULL | 2 | NULL | 1537 | 1536 | 1535 |
| 29 | NULL | 1 | NULL | 1529 | 1528 | 1527 |
| 28 | NULL | 3 | NULL | 1513 | 1512 | 1511 |
| 27 | NULL | 2 | NULL | 1505 | 1504 | 1503 |
| 26 | NULL | 1 | NULL | 1481 | 1480 | 1479 |
| 25 | NULL | 3 | NULL | 1457 | 1456 | 1455 |
| 24 | NULL | 2 | NULL | 1425 | 1424 | 1423 |
| 23 | NULL | 1 | NULL | 1377 | 1376 | 1375 |
| 22 | NULL | 3 | NULL | 1329 | 1328 | 1327 |
| 21 | NULL | 2 | NULL | 1281 | 1280 | 1279 |
| 20 | NULL | 1 | NULL | 1225 | 1224 | 1223 |
+----+------+------+------+-------+---------+---------+
20 rows in set (1.01 sec)
Note that these timings are for MyIsam tables; I'll leave it to someone else to do timings on Innodb.
But using Postgresql, on a similar but not identical data set, we get similar timings on where
predicates involving parent
's columns:
postgres=# select (select count(*) from parent) as parent_count, (select count(*)
from child) as child_count;
parent_count | child_count
--------------+-------------
12289 | 1536
postgres=# select * from parent_top_3a where id >= 20 and id < 40;
id | a | b | c | maxid | maxidm1 | maxidm2
----+---+----+---+-------+---------+---------
20 | | 18 | | 1464 | 1462 | 1461
21 | | 88 | | 1463 | 1460 | 1457
22 | | 72 | | 1488 | 1486 | 1485
23 | | 13 | | 1512 | 1510 | 1509
24 | | 49 | | 1560 | 1558 | 1557
25 | | 92 | | 1559 | 1556 | 1553
26 | | 45 | | 1584 | 1582 | 1581
27 | | 37 | | 1608 | 1606 | 1605
28 | | 96 | | 1607 | 1604 | 1601
29 | | 90 | | 1632 | 1630 | 1629
30 | | 53 | | 1631 | 1628 | 1625
31 | | 57 | | | |
32 | | 64 | | | |
33 | | 79 | | | |
34 | | 37 | | | |
35 | | 60 | | | |
36 | | 75 | | | |
37 | | 34 | | | |
38 | | 87 | | | |
39 | | 43 | | | |
(20 rows)
Time: 91.139 ms