I am learning SQL for a personal project, and it seems that I don't quite get the COUNT function.

I have a "sample" table with this sample data:

  NAME   COLOR
  Tom    red
  Tom    blue
  Jerry  yellow
  Keri   yellow
  Paul   red
  Bob    yellow
  Bob    red
  Mary   green

What I am attempting to do is print only those NAME values that have exactly one COLOR value, and that value is yellow.

Here is the query I wrote, but Bob appears in the result, which is wrong.

SELECT COUNT(NAME),NAME
FROM SAMPLE
WHERE (COLOR = 'yellow')
HAVING COUNT(*) = 1
GROUP BY NAME;

May someone tell me what I am doing incorrectly?

Thanks.

+3  A: 

It's because your WHERE clause limits the result set before the HAVING clause is checked.

Hence you are stripping out Bob's red row, so the only Bob row left is the yellow one, and that group has a count of 1.
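
A minimal sketch of this effect, assuming the question's rows are loaded into an in-memory SQLite database through Python's sqlite3 module (SQLite and the lowercase table name are illustrative choices, not part of the question). It runs the filter-then-count query and compares it with the unfiltered row count per name:

```python
import sqlite3

# Sample data copied from the question.
rows = [
    ("Tom", "red"), ("Tom", "blue"), ("Jerry", "yellow"), ("Keri", "yellow"),
    ("Paul", "red"), ("Bob", "yellow"), ("Bob", "red"), ("Mary", "green"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (name TEXT, color TEXT)")
conn.executemany("INSERT INTO sample VALUES (?, ?)", rows)

# WHERE removes every non-yellow row first, so each surviving group has one row.
filtered_first = conn.execute(
    """
    SELECT name, COUNT(*)
    FROM sample
    WHERE color = 'yellow'
    GROUP BY name
    HAVING COUNT(*) = 1
    ORDER BY name
    """
).fetchall()
print(filtered_first)  # [('Bob', 1), ('Jerry', 1), ('Keri', 1)]

# Without the WHERE clause, the true number of rows per name is visible.
all_counts = dict(conn.execute("SELECT name, COUNT(*) FROM sample GROUP BY name"))
print(all_counts["Bob"])  # 2
```

Bob passes the HAVING test only because his red row was discarded before the groups were counted.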

This one works for me (albeit in DB2 but since I tend to prefer standard SQL, it should work on any DBMS):

select count(a.name), a.name
from sample a,
     (select name from sample where color = 'yellow') b
where a.name = b.name
group by a.name
having count(a.name) = 1;

Yellow returns (no Bob):

--------
   NAME 
--------
1  Jerry
1  Keri

while red returns (no Tom or Bob):

-------
   NAME
-------
1  Paul

The way this works is as follows:

  • A subquery is run to get a list of all names that have the color yellow. They can also have other colors at this point. This restricts the names to Jerry, Keri and Bob.
  • Then the "real" query is run getting the list of all names but only when they match one of the names in the subquery (so limiting it to names that have yellow).
  • This is grouped by the name and we use the count aggregate function to combine rows with the same name and give us a count of the colors for each name.
  • Lastly we throw away those that have more than one color.
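
The steps above can be traced one at a time. This is a sketch under the assumption that the sample rows live in an in-memory SQLite database driven from Python's sqlite3 module; the variable names (step1, step3, step4) are hypothetical labels for the intermediate results, not part of the original query:

```python
import sqlite3

# Sample data copied from the question.
rows = [
    ("Tom", "red"), ("Tom", "blue"), ("Jerry", "yellow"), ("Keri", "yellow"),
    ("Paul", "red"), ("Bob", "yellow"), ("Bob", "red"), ("Mary", "green"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (name TEXT, color TEXT)")
conn.executemany("INSERT INTO sample VALUES (?, ?)", rows)

# Step 1: the subquery on its own lists every name that has yellow.
step1 = conn.execute(
    "SELECT name FROM sample WHERE color = 'yellow' ORDER BY name"
).fetchall()
print(step1)  # [('Bob',), ('Jerry',), ('Keri',)]

# Steps 2 and 3: join back to the full table and count colors per name.
joined = """
    SELECT a.name, COUNT(a.name)
    FROM sample a,
         (SELECT name FROM sample WHERE color = 'yellow') b
    WHERE a.name = b.name
    GROUP BY a.name
"""
step3 = conn.execute(joined + " ORDER BY a.name").fetchall()
print(step3)  # [('Bob', 2), ('Jerry', 1), ('Keri', 1)]

# Step 4: HAVING discards names with more than one color.
step4 = conn.execute(
    joined + " HAVING COUNT(a.name) = 1 ORDER BY a.name"
).fetchall()
print(step4)  # [('Jerry', 1), ('Keri', 1)]
```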

I'm assuming here that you won't have a row in the table with a duplicate name and color; in other words, you should have a primary key or other constraint on (name, color). If you did have duplicates, the join would result in more rows and you would have to use DISTINCT or GROUP BY in the subquery.
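
As a rough illustration of why the constraint matters, the sketch below (SQLite via Python's sqlite3, an assumed setup; the connection names loose and strict are hypothetical) first shows a duplicated row inflating the joined count, then shows a primary key on (name, color) rejecting the duplicate outright:

```python
import sqlite3

query = """
    SELECT a.name, COUNT(a.name)
    FROM sample a,
         (SELECT name FROM sample WHERE color = 'yellow') b
    WHERE a.name = b.name
    GROUP BY a.name
"""

# No constraint: the duplicate row is accepted, and each of Jerry's two rows
# matches both subquery rows, giving 2 x 2 = 4 joined rows.
loose = sqlite3.connect(":memory:")
loose.execute("CREATE TABLE sample (name TEXT, color TEXT)")
loose.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [("Jerry", "yellow"), ("Jerry", "yellow")],
)
inflated = loose.execute(query).fetchall()
print(inflated)  # [('Jerry', 4)]

# Primary key on (name, color): the second insert fails.
strict = sqlite3.connect(":memory:")
strict.execute(
    "CREATE TABLE sample (name TEXT, color TEXT, PRIMARY KEY (name, color))"
)
strict.execute("INSERT INTO sample VALUES ('Jerry', 'yellow')")
try:
    strict.execute("INSERT INTO sample VALUES ('Jerry', 'yellow')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```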

paxdiablo
+4  A: 

Try this:

SELECT COUNT(NAME),NAME
FROM SAMPLE
GROUP BY NAME
HAVING COUNT(*) = 1 AND MAX(COLOR) = 'yellow';

As @paxdiablo said, you need to leave the rows in the group until after you do the group by, so the count will be accurate. Then you can test for 'yellow' in the HAVING clause.

Even though it may seem redundant to use MAX() like I did in the above example, it's good form because any expression in the HAVING clause should use group-oriented functions. HAVING restricts groups whereas WHERE restricts rows.
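
A small check of this query against the question's data, assuming an in-memory SQLite database driven from Python's sqlite3 module. The helper name sole_color is hypothetical, and the color is passed as a bound parameter so the same HAVING clause can be tried with several colors:

```python
import sqlite3

# Sample data copied from the question.
rows = [
    ("Tom", "red"), ("Tom", "blue"), ("Jerry", "yellow"), ("Keri", "yellow"),
    ("Paul", "red"), ("Bob", "yellow"), ("Bob", "red"), ("Mary", "green"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (name TEXT, color TEXT)")
conn.executemany("INSERT INTO sample VALUES (?, ?)", rows)


def sole_color(color):
    """Return the names whose only row has the given color."""
    found = conn.execute(
        """
        SELECT name
        FROM sample
        GROUP BY name
        HAVING COUNT(*) = 1 AND MAX(color) = ?
        ORDER BY name
        """,
        (color,),
    ).fetchall()
    return [name for (name,) in found]


print(sole_color("yellow"))  # ['Jerry', 'Keri']
print(sole_color("red"))     # ['Paul']
print(sole_color("blue"))    # [] because Tom also has red
```

No rows are filtered before grouping, so COUNT(*) sees every color a name has, and the color test is applied only to groups of exactly one row.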

Bill Karwin
Will that work for red? It looks like it'll only work for yellow since that's the 'maximum' color.
paxdiablo
It will work, but I think it would be a bit clearer to use COUNT(color) instead of COUNT(name); just me, though. It works because COUNT(color) = 1 limits the result to names with only one color.
David
Another suitable query could be:
SELECT NAME
FROM SAMPLE
GROUP BY NAME
HAVING COUNT(*) = 1
AND COUNT(CASE WHEN COLOR = 'yellow' THEN 1 END) = 1
This allows for variations of the query (e.g. to pick out people who have yellow and another colour).
Gary
@paxdiablo: If the group has a `COUNT(*)` of only one row, then `MAX(color) = MIN(color)`, so it works for any color. The suggestions by @David and @Gary are also unnecessary. Either the group's sole color is 'yellow' or another color or else NULL. The condition `MAX(color) = 'yellow'` works in any of these cases.
Bill Karwin
A: 

Oh, so the order of the commands does matter after all.

@Paxdiablo: thanks for the crystal-clear explanation. I need to learn about subqueries.

@Bill: thanks for your help too. I really appreciate it and obviously I need to read and practice more.

different
... and learn how to use Stack Overflow ... you ought to just comment on the answers, not post another answer yourself ;)
David Aldridge
A: 

Another method using an analytic function:

SELECT NAME
FROM   (
       SELECT NAME,
              COLOR,
              COUNT(*) OVER (PARTITION BY NAME) ROWS_PER_NAME
       FROM   SAMPLE ) T
WHERE  COLOR = 'yellow' AND
       ROWS_PER_NAME = 1
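
A sketch of the analytic approach run against the question's data. It assumes SQLite 3.25 or newer (the first release with window functions) behind Python's sqlite3 module, and it partitions the count by name so that rows_per_name really is a per-name count rather than a count of the whole table:

```python
import sqlite3

# Sample data copied from the question.
rows = [
    ("Tom", "red"), ("Tom", "blue"), ("Jerry", "yellow"), ("Keri", "yellow"),
    ("Paul", "red"), ("Bob", "yellow"), ("Bob", "red"), ("Mary", "green"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (name TEXT, color TEXT)")
conn.executemany("INSERT INTO sample VALUES (?, ?)", rows)

# The inner query keeps every row and attaches the row count for its name;
# the outer query then filters on both the color and that count.
single_yellow = conn.execute(
    """
    SELECT name
    FROM (
        SELECT name,
               color,
               COUNT(*) OVER (PARTITION BY name) AS rows_per_name
        FROM sample
    ) t
    WHERE color = 'yellow' AND rows_per_name = 1
    ORDER BY name
    """
).fetchall()
print(single_yellow)  # [('Jerry',), ('Keri',)]
```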

Also, if very few NAMEs had the color yellow, I would try:

SELECT NAME,
       COLOR
FROM   SAMPLE P
WHERE  COLOR = 'yellow' AND
       NOT EXISTS (
          SELECT null
          FROM   SAMPLE C
          WHERE  C.NAME = P.NAME
          AND    COLOR != 'yellow')
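
The NOT EXISTS form can be checked the same way. This is a minimal sketch assuming an in-memory SQLite database behind Python's sqlite3 module; it uses `<>` as the inequality operator and qualifies every column with its table alias, which are stylistic choices rather than requirements:

```python
import sqlite3

# Sample data copied from the question.
rows = [
    ("Tom", "red"), ("Tom", "blue"), ("Jerry", "yellow"), ("Keri", "yellow"),
    ("Paul", "red"), ("Bob", "yellow"), ("Bob", "red"), ("Mary", "green"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (name TEXT, color TEXT)")
conn.executemany("INSERT INTO sample VALUES (?, ?)", rows)

# Keep a yellow row only if the same name has no row of any other color.
only_yellow = conn.execute(
    """
    SELECT p.name, p.color
    FROM sample p
    WHERE p.color = 'yellow'
      AND NOT EXISTS (
          SELECT NULL
          FROM sample c
          WHERE c.name = p.name
            AND c.color <> 'yellow')
    ORDER BY p.name
    """
).fetchall()
print(only_yellow)  # [('Jerry', 'yellow'), ('Keri', 'yellow')]
```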
David Aldridge