I need to get summary data from many many rows. The summary fields are counts of how many entries have each value for different fields. For example, a table with people's age, city, job, etc, the summary data would include fields like "countManager", "countCodeMonkey" for each job, and then "countChicago", "countNewYork" etc for cities.
I know that a simple way of getting everything is:
select count(*) from table
group by age, city, job
But this is vertical counting - a different row for each value I need. Instead of rows with the counts, I need fields, because I have other fields to group by, for example state. So I want my results to look like this:
| State | countManager | countMonkey |
| IL | 3 | 25 |
| NY | 5 | 40 |
I'm looking at two ways to do this. We already have one implemented, and it takes 20 minutes to execute. I'm wondering if a different way would be faster.
The current way looks like this:
create view managers as
select state, count(*) as theCount from table
where job = 'Manager'
group by state;
create view monkeys as
select state, count(*) as theCount from table
where job = 'Monkey'
group by state;
select managers.theCount as managers, monkeys.theCount as monkeys
from managers left join monkeys
on managers.state = monkeys.state;
In the real case, there's about 20 more views, and hence 20 more joins.
I'm considering the following horizontal count method instead:
select state,
sum(case when job='Manager' then 1 else 0 end) as managers,
sum(case when job='Monkey' then 1 else 0 end) as monkeys
from table
group by state;
This eliminates the joins. But I have no clue about the performance of the "sum case" statement. Will this be any faster, about the same, or much slower? Does the engine have to go through the rows multiple times, once for each such counted field? Or is it smart enough to count all the fields in one run through, checking each value and incrementing the appropriate sum?
I may spend part of a day writing a script to generate a giant amount of garbage data to test this, but I'd still like to know how the db engine behaves, which I couldn't find elsewhere online.