views:

490

answers:

8

I used the following query to find duplicates:

SELECT userID,
COUNT(userID) AS NumOccurrences
FROM userDepartments
GROUP BY userID
HAVING ( COUNT(userID) > 1 )

I then tried adding an inner join so I could see the user names that match, which are stored in a different table.

SELECT userDepartments.userID, users.firstname, users.lastname,
COUNT(userID) AS NumOccurrences
FROM userDepartments INNER JOIN users ON userDepartments.userID = users.userID
GROUP BY userID
HAVING ( COUNT(userID) > 1 )

But it gave me an error saying that users.firstname was not part of some aggregate function or something...

Does anyone know how I can get the count, only show users with more than 1 department, and also get the first and last name out of the other table so I can get a list of users names who have more than one department assigned?

EDIT: THIS IS THE QUERY THAT ENDED UP WORKING FOR ME...

SELECT     firstname, lastname
FROM         tbl_users
WHERE     (userID IN
                          (SELECT     userID
                            FROM          tbl_usersDepts
                            GROUP BY userID
                            HAVING      (COUNT(userID) > 1)))
+1  A: 

Group by all three: the userDepartments.userID, users.firstname, and users.lastname

jimmyorr
You beat me to it... +1
WildJoe
+2  A: 

The SQL engine doesn't know that you only have one username per userid, so you have to group by firstname and lastname as well as by user id.

SELECT userDepartments.userID, users.firstname, users.lastname,
COUNT(userID) AS NumOccurrences
FROM userDepartments INNER JOIN users ON userDepartments.userID = users.userID
GROUP BY userID, users.firstname, users.lastname
HAVING ( COUNT(userID) > 1 )

If you don't group by firstname and lastname, the engine doesn't know what it's supposed to do if it gets more than one value of firstname for a given userid. By telling it to group by all three values, it knows that if there is more than one row per userid, it should return all those rows. Even though this shouldn't happen, the engine isn't smart enough in this case to decide that on its own.

You could also do it this way:

SELECT users.userId, users.firstname, users.lastname, departments.NumOccurrences
FROM users INNER JOIN (
     SELECT userId, count(userId) as NumOccurrences 
     FROM userDepartments 
     GROUP BY userID 
     HAVING ( COUNT(userID) > 1 )
) departments ON departments.userID = users.userID
Eclipse
A: 

You need to include user.firstname and users.lastname in your GROUP BY clause - as they are not aggregate values (note that MySQL does actually support the syntax you've used in your query, but it is not standard).

BrynJ
A: 

If you do a "group by" then everything in the "select" portion either needs to be:

  1. Mentioned in the "group by" clause or

  2. The result of an aggregate function (like count())

clintp
+5  A: 

I would rearrange the query a little bit....

SELECT
    duplicates.NumOccurrences,
    duplicates.userID,
    users.firstname,
    users.lastname
FROM (
    SELECT
        userID,
        COUNT(userID) AS NumOccurrences
    FROM userDepartments
    GROUP BY userID
    HAVING COUNT(userID) > 1
) duplicates
INNER JOIN users ON duplicates.userID = users.userID
Justice
Beat me by a couple of seconds :)
Dave Costa
Ah - you beat me too..
Eclipse
Me as well, though I was able to go back and distinguish mine a bit.
Joel Coehoorn
Assuming a userID only has one firstname/lastname, can someone explain the advantage to doing this in a subquery and joining the result to users, vs. joining to users first and then grouping on all three fields?
jimmyorr
Because you cleanly separate the source data from the pretty-ification of the source data.
Justice
I don't buy that. If you need to add another field to the select, add it to the group by as well, nothing ugly about that. Once you've group'ed by userID, you're not paying much more to group by additional fields.
jimmyorr
Also worth noting, Oracle's CBO will likely merge your "duplicates" view and will result in the same cost/plan as my version, but in other dbs your pretty-ification may affect performance.
jimmyorr
It's easier to understand this version. Doubly so, when you have to transform the data at multiple levels and in multiple ways. That's when it's nice to be able to look at a chain, where each link is a bite-sized subquery.
Justice
"easier to understand" is a bit subjective. Objectively, on the other hand, this solution will read the entire userDepartments table into memory to create the "duplicates" in-line view, even if you can filter down the result set using a predicate on the users table. Again, this is w/o CBO's help.
jimmyorr
Think of it like Unix pipes. When the queries become large and complex and stop making sense, that's when it's really time to turn your queries into chains of bite-sized subqueries.
Justice
Your subquery isn't necessarily bite-sized. Without CBO's complex view merging, you're always reading in the entire userDepts table into an in-line view. What if userDepts is huge, and you could pare it down with predicates on another table (e.g. users). I only use Oracle, do other dbs merge views?
jimmyorr
Yes, if you need to pare down the "source" subquery, by all means do so ... the point is that since the given example is about tacking on additional info and "pretty-ify" the result-set, the "pretty-ification" aspect should be separated out. My answer separates the source query from the prettiness.
Justice
A: 

I would do it this way (in Oracle, just in case this doesn't work in your system):

SELECT users.userID, users.firstname, users.lastname, NumOccurrences
  FROM users
       INNER JOIN (
         SELECT userID, COUNT(userID) AS NumOccurrences
           FROM userDepartments
           GROUP BY userID
           HAVING ( COUNT(userID) > 1 )
       ) d
       ON d.userID = users.userID
Dave Costa
A: 

I see a lot good notes about adding your name fields to the group by. I think I'd do it like this, though:

SELECT Users.*, dups.NumOccurances, ud.DepartmentName
FROM Users
INNER JOIN
  (
    SELECT userID, COUNT(userID) AS NumOccurrences
    FROM userDepartments
    GROUP BY userID
    HAVING ( COUNT(userID) > 1 )
  ) dups ON dups.userID = Users.UserID
INNER JOIN userDepartments ud ON ud.UserID=Users.UserID
ORDER BY Users.LastName, Users.FirstName, Users.UserID

One reason for this approach is that it makes it easier to then go back and get any other information you might want.

Joel Coehoorn
Two others beat me by less than a minute. Deleting my duplicate.
Joel Coehoorn
Actually, I think I can add value after all. I'll have an update in a few minutes.
Joel Coehoorn
A: 

Add your user.Firstname and User.lastname to your group by clause

Eppz