In one statement I'm trying to group rows of one table by joining to another table. I only want the groups whose result is not empty.

Example: Items and Categories

SELECT Category.id
FROM Item, Category
WHERE Category.id = Item.categoryId
GROUP BY Category.id
HAVING COUNT(Item.id) > 0

The above query gives me the results that I want but this is slow, since it has to count all the rows grouped by Category.id.

What's a more efficient way?

I was trying to do a Group By LIMIT to only retrieve one row per group. But my attempts failed horribly. Any idea how I can do this?

Thanks

A: 

Try this:

SELECT  item.categoryid
FROM    Item
JOIN    Category
ON      Category.id = Item.categoryId
GROUP BY
        item.categoryid
HAVING  COUNT(*) > 0

This is similar to your original query, but won't do what you want.

If you want to select non-empty categories, do this:

SELECT  category.id
FROM    category
WHERE   id IN
        (
        SELECT  categoryId
        FROM    item
        )

For this to work fast, create an index on item (categoryId).

Quassnoi
I'm well aware of the various ways of selecting non-empty categories, but I need something efficient. Item could potentially have thousands of rows.
Ryan Badour
You could add the DISTINCT keyword to avoid all those category_id duplicates in the item table.
SorcyCat
@Ryan: what makes you think this would be inefficient?
Quassnoi
Well, you're still selecting every single row of Item. Isn't there a way to only select one row for each category? @SorcyCat Does adding DISTINCT speed up the query? Thanks
Ryan Badour
@Ryan: if you create an index on `item (categoryId)`, the engine will select only one item using that index and will return on the first match. `DISTINCT` is redundant here.
Quassnoi
Really? Sounds promising thank you. I'll test this in a few hours.
Ryan Badour
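To make the IN-subquery answer above concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The thread never names the database engine, so SQLite is only a stand-in; the sample rows and the index name `idx_item_category` are made up for illustration, and the column is spelled `categoryId` as in the question.

```python
import sqlite3

# Hypothetical miniature schema; only the table and column names come from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Category (id INTEGER PRIMARY KEY);
    CREATE TABLE Item (id INTEGER PRIMARY KEY, categoryId INTEGER);
    -- The index suggested in the answer, so the subquery can probe by category.
    CREATE INDEX idx_item_category ON Item (categoryId);
    INSERT INTO Category (id) VALUES (1), (2), (3);
    INSERT INTO Item (id, categoryId) VALUES (10, 1), (11, 1), (12, 2);
""")

# Categories that have at least one item; nothing is counted.
non_empty = [row[0] for row in conn.execute("""
    SELECT id
    FROM   Category
    WHERE  id IN (SELECT categoryId FROM Item)
    ORDER BY id
""")]
print(non_empty)
```

Category 3 has no items in this sample data, so it is left out of the result. How the subquery is actually executed depends on the engine's planner, so it is worth checking the query plan on your own database.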
A: 

What about eliminating the Category table if you don't need it?

SELECT Item.categoryId 
FROM Item
GROUP BY Item.categoryId

I'm not sure you even need the HAVING clause since if there is no item in a category it won't create a group.

SorcyCat
This will also return the categoryId of items that have no matching record in Category.
Quassnoi
Your original query didn't indicate a need for the whole category record, just the id already provided in the Item table. Take a look at Quassnoi's answer if you do.
SorcyCat
Oh it's needed, this is just an example to demonstrate what I'm trying to do.
Ryan Badour
Okay thanks guys, your solutions both worked. (Quassnoi and Donnie)
Ryan Badour
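Quassnoi's objection in the comments on this answer can be illustrated with a small sketch, again assuming SQLite through Python's `sqlite3` as a stand-in engine and made-up sample rows. It assumes no foreign key is enforced, so an item can point at a category id that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Category (id INTEGER PRIMARY KEY);
    CREATE TABLE Item (id INTEGER PRIMARY KEY, categoryId INTEGER);
    INSERT INTO Category (id) VALUES (1), (2);
    -- Item 12 points at category 99, which has no Category row.
    INSERT INTO Item (id, categoryId) VALUES (10, 1), (11, 2), (12, 99);
""")

# Grouping Item alone: every categoryId that appears, orphaned or not.
item_only = [r[0] for r in conn.execute(
    "SELECT categoryId FROM Item GROUP BY categoryId ORDER BY categoryId")]

# Joining to Category first: only ids that really exist in Category.
joined = [r[0] for r in conn.execute("""
    SELECT Item.categoryId
    FROM   Item
    JOIN   Category ON Category.id = Item.categoryId
    GROUP BY Item.categoryId
    ORDER BY Item.categoryId
""")]
print(item_only)
print(joined)
```

If the schema does enforce a foreign key from Item to Category, the two result sets coincide and dropping the join is safe.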
A: 

I think this is functionally equivalent (returns every category that has at least one item), and should be much faster.

SELECT 
  c.id
FROM 
  Category c
WHERE
  EXISTS (
    select 1 from Item i where i.categoryid = c.id
  )
Donnie
This looks very close to what I was looking for. I was going to use EXISTS, but I couldn't remember the name, thanks. If you put "LIMIT 1" at the end of that inner query, would that help in any way to speed it up? Because you're still selecting all items.
Ryan Badour
This does not select all items. EXISTS returns as soon as a match is found; it does not find all matching rows.
Donnie
Are you sure? I thought LIMIT worked the same way before. But then I read that LIMIT only filters the final result. How sure are you that EXISTS stops with one result? Either way, I'll test it in about an hour.
Ryan Badour
I'm positive. That's why `exists` can be such a powerful optimization. (Note that you still need good indexes to get really good performance)
Donnie
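One way to test the claims in this exchange is to run the EXISTS form next to the original GROUP BY / HAVING form and compare the results. The sketch below assumes SQLite via Python's `sqlite3` as a stand-in for whatever engine is actually in use; the sample rows and the index name `idx_item_category` are hypothetical. It joins the inner query on `c.id`, since the question's Category table exposes `id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Category (id INTEGER PRIMARY KEY);
    CREATE TABLE Item (id INTEGER PRIMARY KEY, categoryId INTEGER);
    CREATE INDEX idx_item_category ON Item (categoryId);
    INSERT INTO Category (id) VALUES (1), (2), (3);
    INSERT INTO Item (id, categoryId) VALUES (10, 1), (11, 1), (12, 1), (13, 3);
""")

exists_sql = """
    SELECT c.id
    FROM   Category c
    WHERE  EXISTS (SELECT 1 FROM Item i WHERE i.categoryId = c.id)
    ORDER BY c.id
"""
having_sql = """
    SELECT Category.id
    FROM   Item, Category
    WHERE  Category.id = Item.categoryId
    GROUP BY Category.id
    HAVING COUNT(Item.id) > 0
    ORDER BY Category.id
"""
with_exists = [r[0] for r in conn.execute(exists_sql)]
with_having = [r[0] for r in conn.execute(having_sql)]
print(with_exists == with_having, with_exists)

# Plan text is engine- and version-specific, so print it rather than rely on its wording.
for row in conn.execute("EXPLAIN QUERY PLAN " + exists_sql):
    print(row[-1])
```

Both queries return the same categories here. The printed plan is the place to confirm, on your own engine, that the inner lookup goes through the index instead of scanning Item.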
A: 

I think, and this is just my opinion, that the correct approach IS counting all the stuff. Maybe the problem is in another place.

This is what I use for counting and it works pretty fast, even with a lot of data.

SELECT categoryid, COUNT(*) FROM Item GROUP BY categoryid

It will give you one row per category with its item count. But it will NOT include empty categories.

Then, to retrieve the category information, do this:

SELECT category.* FROM category
INNER JOIN (SELECT categoryid, COUNT(*) AS n FROM Item GROUP BY categoryid) AS item
ON category.id = item.categoryid
Erik Escobedo
By the way, this query will give you the exact number of items per category. If you don't need this, you can skip it by deleting the "COUNT(*) AS n" part.
Erik Escobedo
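The two steps in this answer can be sketched as follows, assuming SQLite through Python's `sqlite3` as a stand-in engine. The `name` column, the sample rows, and the alias `item_counts` (used instead of `item` to keep it distinct from the table) are illustrative choices, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Category (id INTEGER PRIMARY KEY, name TEXT);  -- name is a made-up column
    CREATE TABLE Item (id INTEGER PRIMARY KEY, categoryId INTEGER);
    INSERT INTO Category (id, name) VALUES (1, 'books'), (2, 'games'), (3, 'empty');
    INSERT INTO Item (id, categoryId) VALUES (10, 1), (11, 1), (12, 2);
""")

# Step 1: item count per category; categories without items never form a group.
counts = dict(conn.execute(
    "SELECT categoryId, COUNT(*) FROM Item GROUP BY categoryId"))

# Step 2: join the grouped counts back to Category to get the full records.
rows = conn.execute("""
    SELECT Category.*, item_counts.n
    FROM   Category
    INNER JOIN (SELECT categoryId, COUNT(*) AS n
                FROM Item GROUP BY categoryId) AS item_counts
            ON Category.id = item_counts.categoryId
    ORDER BY Category.id
""").fetchall()
print(counts)
print(rows)
```

The inner join drops category 3 because the grouped subquery has no row for it. This form still aggregates every row of Item, so it fits best when the counts themselves are needed.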