tags:

views:

1920

answers:

4

Let's say I have a Product, Category, and Product_To_Category table. A Product can be in multiple categories.

    Product                     Category        Product_to_category
    ID   |   NAME               ID  | Name      Prod_id | Cat_id
    =====================       ============    ===================
        1| Rose                    1| Flowers          1| 1
        2| Chocolate Bar           2| Food             2| 2
        3| Chocolate Flower                            3| 1
                                                       3| 2

I would like an SQL query which gives me a result such as

    ProductName      | Category_1 | Category_2 | Category_3
    =======================================================
    Rose             | Flowers    |            |
    Chocolate Flower | Flowers    | Food       |

etc.

The best way I've been able to get this is to union a bunch of queries together; one query for every expected number of categories for a given product.

select p.name, cat1.name, cat2.name
from
  product p, 
  (select * from category c, producttocategory pc where pc.category_id = c.id) cat1,
  (select * from category c, producttocategory pc where pc.category_id = c.id) cat2
where p.id = cat1.id 
  and p.id = cat2.id
  and cat1.id != cat2.id
union all
select p.name, cat1.name, null
from
  product p, 
  (select * from category c, producttocategory pc where pc.category_id = c.id) cat1
where p.id = cat1.id 
  and not exists (select 1 from producttocategory pc where pc.product_id = p.id and pc.category_id != cat1.id)

There are several problems with this.

  • First, I have to repeat this union for each expected category; if a product can be in 8 categories I'd need 8 queries.
  • Second, the categories are not uniformly put into the same columns. For example, sometimes a product might have 'Food, Flowers' and another time 'Flowers, Food'.

Does anyone know of a better way to do this? Also, does this technique have a technical name?

+2  A: 

I don't know what RDBMS you're using, but in MySQL you can use GROUP_CONCAT:

SELECT
  p.name,
  GROUP_CONCAT(c.name SEPARATOR ', ') AS categories
FROM
  product p
  JOIN product_to_category pc ON p.id = pc.product_id
  JOIN category c ON c.id = pc.category_id
ORDER BY
  p.name,
  c.name
Seb
I think you also need to add GROUP BY, right?
meleyal
Absolutely! Forgot about that, thanks!
Seb
+1  A: 

You can't create these results with a strict SQL query. What you're trying to produce is called a pivot table. Many reporting tools support this sort of behavior, where you would select your product and category, then turn the category into the pivot column.

I believe SQL Server Analysis Services supports functionality like this, too, but I don't have any experience with SSAS.

Adam Robinson
+1  A: 
SELECT p.name, cat_food.name, cat_flowers.name
FROM
  product p
  left outer join  Product_to_category pc_food 
    on p.id = pc_food.Prod_id
  left outer join Category cat_food
    on pc_food.Cat_id = cat_food.id
    AND cat_food.name = 'Food'
  left outer join  Product_to_category pc_flowers
    on p.id = pc_flowers.Prod_id
  left outer join Category cat_flowers
    on pc_flowers.Cat_id = cat_flowers.id
    AND cat_flowers.Name = 'Flowers'

It only works if you know the number of possible categories, to put them into columns. That's how (standard) SQL works, the number of columns is not dynamic.

Stefan Steinegger
A: 

Seb's answer put me onto the right track for a workaround. I am using Oracle and it has functions which emulate MYSQL's group_concat. Here is an example. This does not generate columns, and thus isn't as good as a pure SQL solution, but it is suitable for my current purposes.

with data as
( 
  select 
    pc.id cat,
    p.id prod, 
    row_number() over( partition by p.id order by pc.id) rn,
    count(*) over (partition by p.id) cnt
  from product_to_category pc, product p
  where pc.product_id = p.id
)
select prod, ltrim(sys_connect_by_path(cat, ','), ',') cats
  from data
 where rn = cnt
 start with rn = 1 connect by prior prod = prod and prior rn = rn - 1
 order by prod

This generates data such as

PROD | CATS
===========
284  |   12
285  |   12
286  | 9,12

I can edit the ltrim(sys_connect_by_path()) column as needed to generate whatever data I need.

Mr. Shiny and New