views:

159

answers:

3

Imagine I have a db table of Customers containing {id,username,firstname,lastname}

If I want to find how many instances there are of different firstnames I can do:

select firstname, count(*) from Customers group by 2 order by 1;

   firstname | count(*)
   ====================
   bob       |   1
   jeff      |   2
   adam      |   5

How do I count the number of firstnames that occur more than once? In pseudo-sql it would be something like:

select
    COUNT(
        firstname,
        count(*) as num_occurrences
    )
from 
    Customers 
group by 2 
having num_occurrences > 1;
+1  A: 

This would do it:

select count(username)
  from (select username
          from Customers
         group by username
        having count(*) > 1);
Asaph
NB: question was confused about username vs firstname - this answer used 'username' where the revised question goes with 'firstname'.
Jonathan Leffler
+3  A: 

You have the right idea:

SELECT COUNT(*)
FROM (
  SELECT firstname
  FROM Customers
  GROUP BY firstname
  HAVING COUNT(*) >= 2
)

The subquery counts the first names that have occurred more than once. Then you count all of those first names. The HAVING clause allows you to filter by aggregates. It's like a WHERE clause, except you can use aggregate functions.

Welbog
That's overcomplicated - IniTech's answer is better.
OMG Ponies
I prefer the idea of this solution because the database is only returning what I'm interested in. Unfortunately this doesn't seem to work in Informix. :(
Mr Gravity
OMG Ponies
This will work in IDS 11.50; it might work in IDS 11.10; it will not work in earlier versions of IDS - the support for 'sub-queries in the FROM clause' was added quite recently.
Jonathan Leffler
@rexem: @IniTech's answer is not answering the question.
Jonathan Leffler
@Jonathan: This answer isn't referencing the correct column either, and my subquery comment stands (support aside).
OMG Ponies
@rexem: agreed that the question is confused about firstname vs username.
Jonathan Leffler
+1  A: 

There is no need for a subquery.

Try:

SELECT firstname, COUNT(*)
  FROM Customers
 GROUP BY firstname
HAVING COUNT(*) > 1
 ORDER BY firstname

Or, order by the most represented name:

SELECT firstname, COUNT(*) AS custcount
  FROM Customers
 GROUP BY firstname
HAVING COUNT(*) > 1
 ORDER BY custcount DESC;
IniTech
Should be referencing `username`, not `firstname`. It also isn't providing the requested answer - the count of the number of usernames which have more than one row. If an INTO TEMP followed by SELECT from the temp table doesn't count as a sub-query, you can do that - but otherwise...
Jonathan Leffler
@Jonathon: The output isn't going to make sense without showing the username the count applies to.
OMG Ponies
@rexem: the question asks for the number of customers with more than one entry in the table with the same username. It does not ask for the usernames, per se; it asks for the number of such usernames. It is a classic aggregate-of-aggregates query, and the sub-query-in-the-FROM-clause is the best way to answer it.
Jonathan Leffler
@IniTech: looking at the question again, it seems to be a bit confused between username and firstname. The sample query selects on firstname, but the putative query result lists username.
Jonathan Leffler