I have two columns in my Items table, 'Category' and 'Category2', that contain essentially the same information. If I had designed the database, I would have created a separate table for categories and assigned items to categories based on that table. Unfortunately, I didn't create the database and I can't change it now, but I think there is still a way to do what I want.

An example of the table is shown below

Category             Category2
------------------   -----------------
truck                full size - pickup
full size - pickup   truck
Sedan                Import - Sedan
Convertible          Domestic - Coupe

I want to run a query that counts the total number of trucks, sedans, full size - pickups, etc. across both columns. I tried the query below, but it grouped the two columns separately:

SELECT Category, Count(*) as Count
FROM Items
GROUP BY Category, Category2
A: 

I'm sure there's a better way to do it, but here you go:

-- Column types are assumed; adjust varchar(100) to match Items.
declare @group1 table (Category1 varchar(100), Count1 int)
declare @group2 table (Category2 varchar(100), Count2 int)

-- Count each column separately.
insert into @group1 (Category1, Count1)
select Category, count(Category)
from Items
group by Category

insert into @group2 (Category2, Count2)
select Category2, count(Category2)
from Items
group by Category2

-- Combine the two sets of counts, matching on the category name.
select
coalesce(Category1, Category2) as Category,
coalesce(Count1,0) + coalesce(Count2,0) as CountAll
from @group1 a
    full outer join @group2 b
     on a.Category1=b.Category2
DForck42
Suppose 'truck' is in the first column twice and in the second column 20 times. This answer shows the count for 'truck' as 2 instead of 22.
David B
Yup, you're right. I fixed it.
DForck42
+8  A: 

Just dump both categories into a single column before grouping.

SELECT Category, Count(*) as TheCount
FROM
(
  SELECT Category
  FROM Items
  UNION ALL
  SELECT Category2
  FROM Items
) sub
GROUP BY Category
David B
Make sure to use UNION ALL as shown here; plain UNION removes duplicate rows and would give the wrong counts.
HLGEM
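To illustrate the UNION versus UNION ALL point, here is a minimal sketch that runs the query above both ways. It assumes Python's built-in sqlite3 module and an in-memory copy of the question's sample rows (the Items schema with TEXT columns is an assumption); other databases should behave the same way for this query, but that is not tested here.

```python
import sqlite3

# Hypothetical in-memory copy of the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (Category TEXT, Category2 TEXT)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?)",
    [
        ("truck", "full size - pickup"),
        ("full size - pickup", "truck"),
        ("Sedan", "Import - Sedan"),
        ("Convertible", "Domestic - Coupe"),
    ],
)

# Same shape as the answer's query; only the set operator varies.
QUERY = """
SELECT Category, COUNT(*) AS TheCount
FROM (
    SELECT Category FROM Items
    {op}
    SELECT Category2 FROM Items
) sub
GROUP BY Category
"""

def counts(op):
    """Run the query with the given set operator and return {category: count}."""
    return dict(conn.execute(QUERY.format(op=op)).fetchall())

with_all = counts("UNION ALL")   # keeps every occurrence
with_union = counts("UNION")     # collapses duplicates before counting

print(with_all["truck"], with_union["truck"])  # prints "2 1"
```

With plain UNION every category ends up with a count of 1, because duplicates are removed before the GROUP BY ever sees them.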
+3  A: 

Imagine that a row with "category, category2" can be transformed to two rows (one with "category", one with "category2") to get what you want. You'd do that like this:

SELECT items.category /* , other columns... */
FROM items
UNION ALL
SELECT items.category2 /* , other columns... */
FROM items

So all you then need to do is aggregate across these:

SELECT category, count(*) FROM (
    SELECT items.category FROM items
    UNION ALL
    SELECT items.category2 FROM items
    ) expanded
GROUP BY category

You can also do the aggregate by stages like this if your database supports it:

with subcounts as (
  select items.category, items.category2, count(*) as subcount
  from items
  group by category, category2)
select category, sum(subagg) as finalcount from (
  select subcounts.category, sum(subcount) as subagg from subcounts group by category
  union all
  select subcounts.category2, sum(subcount) as subagg from subcounts group by category2
) combination
group by category

This should limit the query to just one scan of the main items table, which is good if you only have a small number of category combinations. You can emulate the same thing with temp tables in databases that don't support "WITH...".
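As a sanity check that the staged version returns the same totals as the direct UNION ALL version, here is a small sketch. It assumes Python's built-in sqlite3 module (whose bundled SQLite supports WITH) and made-up rows with repeated category pairs; it says nothing about how many scans any particular planner performs.

```python
import sqlite3

# Hypothetical data: repeated pairs so the subcounts are greater than 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (category TEXT, category2 TEXT)")
rows = (
    [("truck", "full size - pickup")] * 3
    + [("full size - pickup", "truck")] * 2
    + [("Sedan", "Import - Sedan")]
)
conn.executemany("INSERT INTO items VALUES (?, ?)", rows)

# Direct version: expand both columns into one, then count.
direct = dict(conn.execute("""
    SELECT category, COUNT(*) FROM (
        SELECT category FROM items
        UNION ALL
        SELECT category2 FROM items
    ) expanded
    GROUP BY category
"""))

# Staged version: count each (category, category2) pair once,
# then sum those subcounts per column and combine.
staged = dict(conn.execute("""
    WITH subcounts AS (
        SELECT category, category2, COUNT(*) AS subcount
        FROM items
        GROUP BY category, category2
    )
    SELECT category, SUM(subagg) AS finalcount FROM (
        SELECT category, SUM(subcount) AS subagg FROM subcounts GROUP BY category
        UNION ALL
        SELECT category2, SUM(subcount) AS subagg FROM subcounts GROUP BY category2
    ) combination
    GROUP BY category
"""))

print(direct == staged, staged["truck"])  # prints "True 5"
```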

EDIT:

I was sure there had to be another way to do it without scanning Items twice, and there is. Well, this is the PostgreSQL version:

SELECT category, count(*) FROM (
  SELECT CASE selector WHEN 1 THEN category WHEN 2 THEN category2 END AS category
  FROM Items, generate_series(1,2) selector
) items_fixed GROUP BY category

The only PostgreSQL-specific bit here is "generate_series(1,2)", which produces a "table" containing two rows, one with "1" and one with "2"; IMHO it is one of the handiest features in PostgreSQL. You can implement similar things in the likes of SQL Server as well, of course. Or you could say "(select 1 as selector union all select 2)". Another alternative is "(values(1),(2)) series(selector)", although I'm not sure how much of that syntax is standard and how much is Postgres-specific. Both of these approaches have the advantage of telling the planner that there will only be two rows.

Cross-joining this series table with Items lets us generate two output rows for each row of Items. You can even take that "items_fixed" subquery and make it a view, which, by the way, is the reverse of the process I tend to use when trying to solve this kind of problem.
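For databases without generate_series, the "(select 1 as selector union all select 2)" alternative can be sketched as follows. This assumes Python's built-in sqlite3 module and an in-memory copy of the question's sample rows; the `series` alias and the explicit CROSS JOIN spelling are choices made for this illustration.

```python
import sqlite3

# Hypothetical in-memory copy of the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (Category TEXT, Category2 TEXT)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?)",
    [
        ("truck", "full size - pickup"),
        ("full size - pickup", "truck"),
        ("Sedan", "Import - Sedan"),
        ("Convertible", "Domestic - Coupe"),
    ],
)

# Each Items row is paired with selector 1 and selector 2;
# the CASE picks Category for the first copy and Category2 for the second.
query = """
SELECT category, COUNT(*) FROM (
    SELECT CASE selector WHEN 1 THEN Category WHEN 2 THEN Category2 END AS category
    FROM Items
    CROSS JOIN (SELECT 1 AS selector UNION ALL SELECT 2) series
) items_fixed
GROUP BY category
"""
counts = dict(conn.execute(query).fetchall())

print(counts["truck"], sum(counts.values()))  # prints "2 8"
```

The total of 8 is twice the number of rows in Items, which is what you would expect when every row contributes one value from each column.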

araqnid
+1 Interesting how the default name of a union comes from the first table. Although the WITH variant still results in two table scans.
Andomar
Which database did you try the WITH variant with? (ahem) I tested it with the beta PostgreSQL 8.4, where I understand it is basically implemented as a temporary table, so involves a seq scan of "Items" and two CTE scans, which I assume are seq scans of the temp result. This isn't necessarily a win; only if the number of category combinations is significantly smaller than the number of items; but that seems likely.
araqnid
Thanks!! Very informative. I gave the answer to David for answering a minute earlier, but thank you for explaining it!
Patcouch22
@araqnid: I ran it on SQL Server 2005. I've been looking for a way to do queries like this in a single table scan for a while.
Andomar
Ironically, I tried both the union and the single-scan method in postgres with a larger dummy data set, and the union actually performs better for larger values; the database goes for a sort/groupaggregate with the single-scan, and a hashaggregate for the union. Just goes to show, you should always test, not guess.
araqnid
A: 

Try:

select category, sum(CategoryCount) as CategoryCount
from (
    -- count each column on its own, then add the per-column counts together
    select Category as category, count(Category) as CategoryCount
    from Items
    group by Category
    union all
    select Category2 as category, count(Category2) as CategoryCount
    from Items
    group by Category2) x
group by category
SQLMenace
Needs a groupby clause outside the subquery.
David B
A: 

Try this:

select 1 as type, count(*) as count from Items where category like '%full size - pickup%'

union

select 2 as type, count(*) as count from Items where category like '%truck%'

union

select 3 as type, count(*) as count from Items where category like '%sedan%'
and so on...

Type 1 will be your full-size count, type 2 your truck count, and so on.

Hope this helps.

Eric
Not very useful: whenever a new category is added, he'd have to edit the query for it to work properly.
DForck42
He can create a stored procedure with the category as the parameter.
Eric
And call it once for each category? A bit wasteful, no?
Aaron Alton
This answer WILL work even though it has many problems, so +1 to compensate the downvotes! :P
Andomar
So would a cursor.... :P
Aaron Alton