If I have a table like this:

pkey   age
----   ---
   1     8
   2     5
   3    12
   4    12
   5    22

I can "group by" to get a count of each age.

select age,count(*) n from tbl group by age;
age  n
---  -
  5  1
  8  1
 12  2
 22  1

What query can I use to group by age ranges?

  age  n
-----  -
 1-10  2
11-20  2
20+    1
+5  A: 

Try:

-- buckets are 0-9, 10-19, 20-29, ...; both bounds derive from the grouped expression
select to_char(floor(age/10) * 10) || '-'
|| to_char(floor(age/10) * 10 + 9) as age,
count(*) as n from tbl group by floor(age/10);
Matthew Flaschen
clever usage of floor/division!
Mark
A: 

Add an age_range table and an age_range_id field to your table, and group by that instead.

// excuse the DDL but you should get the idea

create table age_range(
age_range_id tinyint unsigned not null primary key,
name varchar(255) not null);

insert into age_range values 
(1, '18-24'),(2, '25-34'),(3, '35-44'),(4, '45-54'),(5, '55-64');

// again excuse the DML but you should get the idea

select
 count(*) as counter, p.age_range_id, ar.name
from
  person p
inner join age_range ar on p.age_range_id = ar.age_range_id
group by
  p.age_range_id, ar.name order by counter desc;

You can refine this idea if you like (for example, add from_age and to_age columns to the age_range table), but I'll leave that to you.
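As a rough sketch of the from_age/to_age refinement, the lookup table can carry the bounds itself, so the join works out the range rather than relying on a stored age_range_id. This is an illustration only: it uses Python's built-in sqlite3 module as a stand-in database so it runs anywhere, the person table is filled with the ages from the question, and the column names from_age and to_age (with a NULL to_age meaning "no upper limit") are assumptions.

```python
import sqlite3

# Hypothetical schema: person(age) plus an age_range lookup table whose
# from_age/to_age columns hold the bounds (to_age NULL = open-ended).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (pkey INTEGER PRIMARY KEY, age INTEGER NOT NULL);
    INSERT INTO person VALUES (1, 8), (2, 5), (3, 12), (4, 12), (5, 22);

    CREATE TABLE age_range (
        age_range_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        from_age     INTEGER NOT NULL,
        to_age       INTEGER
    );
    INSERT INTO age_range VALUES
        (1, '1-10', 1, 10), (2, '11-20', 11, 20), (3, '21+', 21, NULL);
""")

# Join each person to the single range whose bounds contain their age,
# then count per range. Changing the ranges means changing rows, not code.
rows = conn.execute("""
    SELECT ar.name, COUNT(*) AS n
    FROM person p
    INNER JOIN age_range ar
        ON p.age >= ar.from_age
       AND (ar.to_age IS NULL OR p.age <= ar.to_age)
    GROUP BY ar.age_range_id, ar.name, ar.from_age
    ORDER BY ar.from_age
""").fetchall()

print(rows)  # [('1-10', 2), ('11-20', 2), ('21+', 1)]
```

The join condition should carry over to other databases largely unchanged; only the DDL types are SQLite-specific here.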

hope this helps :)

f00
Judging by the other responses, performance and flexibility aren't important criteria. The explain plans for all the dynamic queries listed would be horrendous, and you'd have to amend code if your age ranges changed. Each to their own, I guess :P
f00
One full scan will always be faster than two full scans. Also, people who ask for age range statistics have probably used the same ranges for the last 20+ years and have no intention of changing them.
jva
I'm pretty sure the physical column will outperform a derived/calculated one. In fact, it's probably an ideal candidate for a bitmap index. I'd still prefer to use a lookup table than to hardcode values into my applications. To add a new age range, say 14-16 yrs, I'm inserting a new row vs. raising a change request, spending time coding and testing the changes, and releasing into prod.
f00
+1  A: 

Here is a solution which creates a "range" table in a sub-query and then uses this to partition the data from the main table:

SELECT DISTINCT descr
  , COUNT(*) OVER (PARTITION BY descr) n
FROM age_table INNER JOIN (
  select '1-10' descr, 1 rng_start, 10 rng_stop from dual
  union (
  select '11-20', 11, 20 from dual
  ) union (
  select '20+', 21, null from dual
)) ON age BETWEEN nvl(rng_start, age) AND nvl(rng_stop, age)
ORDER BY descr;
Dan
+5  A: 
SELECT CASE 
         WHEN age <= 10 THEN '1-10' 
         WHEN age <= 20 THEN '11-20' 
         ELSE '21+' 
       END AS age, 
       COUNT(*) AS n
FROM age
GROUP BY CASE 
           WHEN age <= 10 THEN '1-10' 
           WHEN age <= 20 THEN '11-20' 
           ELSE '21+' 
         END
Einstein
This should be the first and only answer to this question. Could use a little more formatting though.
jva
No, CASE statements use short-circuit evaluation.
Einstein
How would short-circuit evaluation cause a problem in this query? Because the cases are ordered and use <=, the correct group is always picked, isn't it?
Adrian
Adrian, you're correct; it was in reply to a previous comment that has since been removed.
Einstein
+1  A: 

If using Oracle 9i+, you might be able to use the NTILE analytic function:

WITH tiles AS (
  SELECT t.age,
         NTILE(3) OVER (ORDER BY t.age) AS tile
    FROM tbl t)
  SELECT MIN(t.age) AS min_age,
         MAX(t.age) AS max_age,
         COUNT(t.tile) AS n
    FROM tiles t
GROUP BY t.tile

The caveat to NTILE is that you can only specify the number of partitions, not the break points themselves, so you need to pick an appropriate number. For example, with 100 rows, NTILE(4) will allot 25 rows to each of the four buckets/partitions. You cannot nest analytic functions, so you'd have to layer them using subqueries/subquery factoring to get the desired granularity. Otherwise, use:

  SELECT CASE
           WHEN t.age BETWEEN 1 AND 10 THEN '1-10' 
           WHEN t.age BETWEEN 11 AND 20 THEN '11-20' 
           ELSE '21+' 
         END AS age, 
         COUNT(*) AS n
    FROM tbl t
GROUP BY CASE
           WHEN t.age BETWEEN 1 AND 10 THEN '1-10' 
           WHEN t.age BETWEEN 11 AND 20 THEN '11-20' 
           ELSE '21+' 
         END
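
To make the NTILE caveat concrete, here is a small sketch run against the five ages from the question. It is not Oracle: it uses Python's built-in sqlite3 module as a stand-in and assumes the bundled SQLite is 3.25 or newer (the first version with window functions). The helper name ntile_summary is made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (pkey INTEGER PRIMARY KEY, age INTEGER NOT NULL);
    INSERT INTO tbl VALUES (1, 8), (2, 5), (3, 12), (4, 12), (5, 22);
""")

def ntile_summary(buckets):
    """Return (tile, min_age, max_age, n) per NTILE bucket (hypothetical helper)."""
    # int() keeps the interpolated bucket count a plain integer literal.
    sql = f"""
        WITH tiles AS (
            SELECT age, NTILE({int(buckets)}) OVER (ORDER BY age) AS tile
            FROM tbl
        )
        SELECT tile, MIN(age), MAX(age), COUNT(*)
        FROM tiles
        GROUP BY tile
        ORDER BY tile
    """
    return conn.execute(sql).fetchall()

print(ntile_summary(3))  # [(1, 5, 8, 2), (2, 12, 12, 2), (3, 22, 22, 1)]
print(ntile_summary(2))  # [(1, 5, 12, 3), (2, 12, 22, 2)]
```

With two buckets, age 12 ends up in both of them: NTILE balances row counts and takes no notice of where the values fall, which is why the CASE version is the better fit when the break points are fixed.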
OMG Ponies