views:

66

answers:

3

Suppose I have three tables: user, group and xref, a table that gives them many-to-many RI.

I might want to see how groups each user belongs to:

select
    user.user_id,
    user.user_name,
    count(*) as group_count
from
    user
        inner join xref on user.user_id = xref.user_id
        inner join group on group.group_id = xref.group_id
group by user.user_id, user.user_name

Everything's OK so far. But, what if I want some extra information? I'm reporting, and I want to know if each user is a Developer or a Content Manager. Now, an anti-pattern emerges:

select
    user.user_id,
    user.user_name,
    count(*) as group_count,
    max( case group.group_name when 'Developers' then 'Y' else null end )
        as is_dev
    max( case group.group_name when 'Content Management' then 'Y' else null end )
        as is_cm
from
    user
        inner join xref on user.user_id = xref.user_id
        inner join group on group.group_id = xref.group_id
group by user.user_id, user.user_name

This works, and produces expected results, but it feels very wrong. What I want to ask Oracle is this:

"For each user, show me how many groups they're in. Also, for all group names per user, show me if 'Developers' is one of the values."

What I'm actually asking is this:

"For each user, show me how many groups they're in. Also, for all group names per user, show me the highest value produced by this case expression."

The reason that this is an anti-pattern is that I'm basically relying on the fact that Y happens to "bubble up" above null when evaluated with max(). If someone wanted to copy or augment this query, they could easily forget about the anti-pattern and accidentally change the return values to something that doesn't use the same unintuitive coincidence.

Basically, the query I wish I could write is this:

select
    user.user_id,
    user.user_name,
    count(*) as group_count,
    any(group.group_name, 'Developers', 'Y', null) as is_dev,
    any(group.group_name, 'Content Management', 'Y', null) as is_cm
from
    user
        inner join xref on user.user_id = xref.user_id
        inner join group on group.group_id = xref.group_id
group by user.user_id, user.user_name

I've been sifting around for options, and it seems like there are a few potentials:

  • first_value could work, but I can't figure out how to limit the corresponding partition window to the right rows.
  • Analytic functions with an over clause might work, but I do want to collapse the columns I'm grouping by, so it doesn't seem to be a perfect fit.
  • Infuriatingly, there seems to be an any function documented here, but it only exists in a mysterious dialect called the Oracle OLAP DML, which I don't think I can access using only SQL on 10g. But, it seems to do exactly what I want.

That's all I got. Any ideas?

I recognize that there are two very simple ideas, "Do it in code" or "Do it in PL/SQL," but that's cheating. :-)

+3  A: 

I'd switch from MAX to SUM (with 1 rather than Y) so you are saying "Count the number of groups this person is in where the group name is Developers".

Then the pattern is similar to a "count the number of sales where the purchase value was more than $30".

You can, if desired, then add another expression to say "If the count is greater than zero then 'yes' this person is a developer". Very explicit and probably unnecessary though.

Gary
Upvoted because `SUM()` seems like less of a workaround. But, I am holding out on accepting any answers for now. Thanks!
Max
+2  A: 
SELECT  user.user_id,
        user.user_name,
        COUNT(*) group_count,
        COUNT(DISTINCT DECODE(group_name, 'Developers', 'Y', NULL)) AS is_developer
        COUNT(DISTINCT DECODE(group_name, 'Content Management', 'Y', NULL)) AS is_content_manager
FROM    the_query

As for the ANY, it's a predicate similar to IN, not a function:

SELECT  *
FROM    dual
WHERE   'baz' = ANY('foo', 'bar', 'baz')
Quassnoi
A: 

I prefer Gary's answer, but if you want to stick with a boolean return you could make the ordering more explicit by returning 'N' instead of null.

select
    user.user_id,
    user.user_name,
    count(*) as group_count,
    max( case group.group_name when 'Developers' then 'Y' else 'N' end )
        as is_dev
    max( case group.group_name when 'Content Management' then 'Y' else 'N' end )
        as is_cm
from
    user
        inner join xref on user.user_id = xref.user_id
        inner join group on group.group_id = xref.group_id
group by user.user_id, user.user_name

(+1 for nicely written question)

Nick Pierpoint
Thanks for the +1 and answer. Unfortunately, the 'N' falls victim to the same problem I mentioned in my question, which is that you're relying on the fact that 'N' just happens to sort higher than 'Y'. For example, if instead of 'Y' and 'N', you were using 'Always' and 'Never', then every cell would be 'Never', because `max` will always pick it up. It's an unintuitive "gotcha" which, I believe, is part of what makes this a true anti-pattern.
Max
As I said, I prefer the counting, but I think you're pretty safe using Y and N as a flag.
Nick Pierpoint