tags:

views:

47

answers:

3

I have the following query:

SELECT m.*, COUNT(c.chapter_nr) as chapters, MAX(c.chapter_nr) as latest_chapter
FROM comic_series AS m
JOIN chapters as c on c.comic_id = m.id
WHERE m.title_en = 'test'

This allows to me to find the chapters from a comic book by just giving the name of the comic. The query selects all the columns from the comic_series table and adds the chapters (number of chapters for that comic) and latest_chapter (newest chapter) columns.

Now, the problem that's bugging me for quite a while:

I have a third column which contains the comic IDs linked to categories IDs (and the category details are in a fourth table but this is irrelative here), and I'd like to select the comic books and their chapters by searching for category IDs.

This is what I've come up with so far:

SELECT `m`.*, COUNT(c.chapter_nr) as chapters, MAX(c.chapter_nr) as latest_chapter
FROM `comic_series` AS `m`
JOIN `comic_categories` AS `mc` ON mc.comic_id = m.id 
JOIN chapters as c on c.comic_id = m.id
where mc.category_id = 5
GROUP BY `m`.`id`

And this returns the correct COUNT(c.chapter_nr) but when I add more category IDs it returns an incorrect amount:

SELECT `m`.*, COUNT(c.chapter_nr) as chapters, MAX(c.chapter_nr) as latest_chapter
FROM `comic_series` AS `m`
JOIN `comic_categories` AS `mc` ON mc.comic_id = m.id 
JOIN `chapters` as c on c.comic_id = m.id
where mc.category_id = 5 OR mc.category_id = 1
GROUP BY `m`.`id`

The query above should return 1 for COUNT(c.chapter_nr) but it returns 2. This could maybe be because there are 2 records in the comic_categories table for that comic and only 1 record in the chapters table.

+1  A: 

With your query, you're selecting all comic books that have a category_id of 5 AND 1. As this is not possible here, you're getting no results. Change it to an OR, or better, to an IN clause:

SELECT `m`.*, COUNT(c.chapter_nr) as chapters, MAX(c.chapter_nr) as latest_chapter
FROM `comic_series` AS `m`
JOIN `comic_categories` AS `mc` ON mc.comic_id = m.id 
JOIN chapters as c on c.comic_id = m.id
where mc.category_id IN (1, 5) -- <<<<< HERE
GROUP BY `m`.`id`

EDIT (after the revised question):

By using the JOIN, your are exploding the result. I assume, that the comic is in multiple categories, therefore you're getting a result for every category.

Try the statement without grouping to get a grip on what is actually used before summarizing:

SELECT `m`.*
FROM `comic_series` AS `m`
JOIN `comic_categories` AS `mc` ON mc.comic_id = m.id
JOIN chapters as c on c.comic_id = m.id
where mc.category_id IN (1, 5);

This will give you two rows, but with very similar results (most probably the exact same two rows).

You can work around this by using subselects, as in

SELECT `m`.*, COUNT(c.chapter_nr) as chapters, MAX(c.chapter_nr) as latest_chapter
FROM `comic_series` AS `m`
JOIN chapters as c on c.comic_id = m.id
where m.id IN (SELECT comic_id FROM comic_categories WHERE category_id IN (1, 5))
GROUP BY `m`.`id`;

This will return what you're expecting. This has one pitfall though: Subselect paired with the IN syntax can be quite slow once the database has grown significantly. So make sure you keep this in memory as soon as your site visited frequently!

Cassy
OR is what I initially used, but this returns an incorrect COUNT(c.chapter_nr). I've corrected my first post.
EarthMind
Kind thanks for your revision. This will be useful for benchmarking against the COUNT(DISTINCT ) alternative.
EarthMind
+1  A: 

You should use OR instead of AND:

SELECT `m`.*, COUNT(c.chapter_nr) as chapters, MAX(c.chapter_nr) as latest_chapter
FROM `comic_series` AS `m`
JOIN `comic_categories` AS `mc` ON mc.comic_id = m.id 
JOIN chapters as c on c.comic_id = m.id
where mc.category_id IN (1, 5)
GROUP BY `m`.`id`

Edit: The subquery below will solve the incorrect counts. They're caused when a comic_series is in more than one category. The join will then result in duplicate rows. The subquery doesn't have this problem.

SELECT m.*, COUNT(c.chapter_nr) as chapters, MAX(c.chapter_nr) as latest_chapter
FROM comic_series AS m
JOIN chapters as c on c.comic_id = m.id
WHERE m.id IN 
(
  SELECT comic_id 
  FROM comic_categories 
  WHERE category_id IN (1, 5)
)
Sander Rijken
Your second example works (except for change `SELECT id` into `SELECT comic_id` but I was trying to avoid subqueries.
EarthMind
I'm not sure what's faster on mysql. Gathering doubles first, and performing a distinct operation on that might be more costly than a subquery?
Sander Rijken
I've have no idea myself. I was just thinking the logical. Since my tables are going to have a lot of records I will benchmark your query against the one I'm using now. Thank you for your help.
EarthMind
Good idea to do benchmarking, that's always better than guessing :-)
Sander Rijken
+1  A: 

When you join to the categories table, you get one row per matching category. If the comic is in two categories, you will get duplicate rows for the count.

Try using COUNT(DISTINCT c.chapter_nr) instead:

SELECT `m`.*, COUNT(DISTINCT c.chapter_nr) as chapters, MAX(c.chapter_nr) as latest_chapter
FROM `comic_series` AS `m`
JOIN `comic_categories` AS `mc` ON mc.comic_id = m.id 
JOIN `chapters` as c on c.comic_id = m.id
where mc.category_id = 5 OR mc.category_id = 1
GROUP BY `m`.`id`
Whatsit
Perfect, that is what I was looking for! I had no idea I could use distinct with count(). I've tried `SELECT DISTINCT` before but it didn't work
EarthMind