tags:

views:

1039

answers:

3

The following query will display all Dewey Decimal numbers that have been duplicated in the "book" table:

SELECT dewey_number, 
 COUNT(dewey_number) AS NumOccurrences
FROM book
GROUP BY dewey_number
HAVING ( COUNT(dewey_number) > 1 )

However, what I'd like to do is have my query display the name of the authors associated with the duplicated entry (the "book" table and "author" table are connected by "author_id"). In other words, the query above would yield the following:

dewey_number | NumOccurrences
------------------------------
5000         | 2
9090         | 3

What I'd like the results to display is something similar to the following:

author_last_name | dewey_number | NumOccurrences
-------------------------------------------------
Smith            | 5000         | 2
Jones            | 5000         | 2
Jackson          | 9090         | 3
Johnson          | 9090         | 3
Jeffers          | 9090         | 3

Any help you can provide is greatly appreciated. And, in case it comes into play, I'm using a Postgresql DB.

UPDATE: Please note that "author_last_name" is not in the "book" table.

+1  A: 

You probably want this

SELECT dewey_number, author_last_name,
 COUNT(dewey_number) AS NumOccurrences
FROM book
GROUP BY dewey_number,author_last_name
HAVING ( COUNT(dewey_number) > 1 )
Kibbee
"author_last_name" is not in the "book" table. The "book" table and "author" table are connected by a FKed "author_id" in the "book" table.
Huuuze
+2  A: 

A nested query can do the job.

SELECT author_last_name, dewey_number, NumOccurrences
FROM author INNER JOIN
     ( SELECT author_id, dewey_number,  COUNT(dewey_number) AS NumOccurrences
        FROM book
        GROUP BY author_id, dewey_number
        HAVING ( COUNT(dewey_number) > 1 ) ) AS duplicates
ON author.id = duplicates.author_id

(I don't know if this is the fastest way to achieve what you want.)

Update: Here is my data

SELECT * FROM author;
 id | author_last_name 
----+------------------
  1 | Fowler
  2 | Knuth
  3 | Lang

SELECT * FROM book;
 id | author_id | dewey_number |         title          
----+-----------+--------------+------------------------
  1 |         1 |          600 | Refactoring
  2 |         1 |          600 | Refactoring
  3 |         1 |          600 | Analysis Patterns
  4 |         2 |          600 | TAOCP vol. 1
  5 |         2 |          600 | TAOCP vol. 1
  6 |         2 |          600 | TAOCP vol. 2
  7 |         3 |          500 | Algebra
  8 |         3 |          500 | Undergraduate Analysis
  9 |         1 |          600 | Refactoring
 10 |         2 |          500 | Concrete Mathematics
 11 |         2 |          500 | Concrete Mathematics
 12 |         2 |          500 | Concrete Mathematics

And here is the result of the above query:

 author_last_name | dewey_number | numoccurrences 
------------------+--------------+----------------
 Fowler           |          600 |              4
 Knuth            |          600 |              3
 Knuth            |          500 |              3
 Lang             |          500 |              2
Federico Ramponi
Ok, probably a nested query is overkill, but now I'm sure it works. Tony Andrews' comment on the author_id column applies here too.
Federico Ramponi
By the way, why are you searching for duplicates based on author's name and Dewey code? Duplicates (multiple copies of the same book, I guess) should be based also on the book's title...
Federico Ramponi
I updated the code, but it returns 0 results. As for the use case question, I simply created a dummy scenario -- I'm not actually working on detecting duplicate books.
Huuuze
"I updated the code, but it returns 0 results." I executed this code in psql with two tables like yours and it returned the expected data. Check your data. Let me work out the details, then I'll update my post.
Federico Ramponi
OK, I found part of the problem on my end -- a couple fields were missing DD numbers. Unfortunately, the query doesn't yield the desired result -- it only displays the name of a single author with books that have duplicated DD numbers.
Huuuze
Update your data so author_id=1 only has one "600" and author_id=2 only has one 600. Do you still get a result back for "600"?
Huuuze
Of course not. And there should not be any, otherwise your question is wrong. You can't expect a query to choose arbitrarily an author_id from those (many) having an entry with dewey=600 and have that author_id joined with table author.
Federico Ramponi
+2  A: 
SELECT dewey_number, author_last_name,
       COUNT(dewey_number) AS NumOccurrences
FROM book
JOIN author USING (author_id)
GROUP BY dewey_number,author_last_name
HAVING COUNT(dewey_number) > 1

If book.author_id can be null then change the join to:

LEFT OUTER JOIN author USING (author_id)

If the author_id column has a different name in each table then you can't use USING, use ON instead:

JOIN author ON author.id = book.author_id

or

LEFT OUTER JOIN author ON author.id = book.author_id
Tony Andrews
No luck on this one. It returns 0 results.
Huuuze
Do both tables have a column AUTHOR_ID as you stated? My USING syntax assumes that. If not change to "JOIN author ON author.xxx = book.yyy". If the query returns 0 rows (without error) that suggests that there the authors table is empty?
Tony Andrews
I think we're getting closer. I did neglect to mention that it's "author.id" and "book.author_id", so good call on USING vs ON. I updated accordingly, but Postgresql didn't like the bracketed "[LEFT JOIN]". After removing the brackets, it yields 0 results.
Huuuze
And, no, neither "id" field can be null.
Huuuze
Sorry, the brackets around LEFT OUTER were my way of saying that these keywords were optional
Tony Andrews