ansaurus

Question

Showing all duplicates, side by side, in MySQL

Answer 1

A:

Would this work?

SELECT event_date, user
FROM eventlog
GROUP BY event_date, user
HAVING COUNT(*) > 1
ORDER BY event_date, user

What's throwing me off is the COUNT(user) clause you have.

David Andres 2009-09-08 04:52:58

I thought that I had to have something in that `COUNT()` to specify which column had the duplicate data (which one was duplicated in a bad way?), anyways, testing now...

Anthony 2009-09-08 05:49:03

Rats, same results. Still just getting one set of the duplicates, not both. I know it's some issue with the GROUP BY

Anthony 2009-09-08 05:50:52

Is it possible that your date field includes a timestamp value (e.g., 4:00 PM)? This may exclude what would otherwise look like a pair.

David Andres 2009-09-08 06:18:47

If it does, it's not showing up in phpmyadmin, which is what I'm using to do this. I'll try it again using a DATE() function to be sure.

Anthony 2009-09-08 06:29:21

No luck. Is it possible that the query returns only one row for each group whose count is higher than 1, as opposed to every row that belongs to such a group?

Anthony 2009-09-08 06:41:06

What do you see when you take away the HAVING clause and add COUNT(*) to the SELECT list?

David Andres 2009-09-08 06:45:54
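
The diagnostic suggested in the comment above could be written roughly like this. It is only a sketch, assuming the same eventlog table with event_date and user columns; the alias occurrences is made up for illustration.

```sql
-- Count rows per (event_date, user) pair without filtering,
-- so every group and its size is visible.
SELECT event_date, user, COUNT(*) AS occurrences
FROM eventlog
GROUP BY event_date, user
ORDER BY occurrences DESC, event_date, user;
```

GROUP BY produces exactly one output row per group, so a duplicated pair appears once with a count of 2, not as two separate rows.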

I get all of my non-dupes with a 1 in the count col, and one of my two dupes with a 2 in the count col. That explains why it's not returning both. How annoying.

Anthony 2009-09-08 06:54:17

Ok, either one or both of the columns differ in unexpected ways: (1) the event_date field may contain timestamps; (2) the user field may contain trailing spaces in some cases.

David Andres 2009-09-08 06:57:15

This would have to be true for each and every instance where it is doing this, though. It really seems more like it is grouping the duplicates together, so to speak. Like I said, I'm looking at it in phpmyadmin. If either the date field or the user field were not matching, then doing a COUNT(user) or a COUNT(event_date) wouldn't produce the same results.

Anthony 2009-09-08 07:14:34

Very odd. Try SELECT DATE(event_date), RTRIM(user) and GROUP BY the same.

David Andres 2009-09-08 07:19:06
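
A possible form of the normalized query suggested above, assuming event_date may carry a time-of-day part and user may carry trailing spaces. The aliases event_day, user_name and occurrences are illustrative, not from the thread.

```sql
-- Normalize both columns before grouping: DATE() drops any time-of-day
-- part and RTRIM() strips trailing spaces.
SELECT DATE(event_date) AS event_day,
       RTRIM(user)      AS user_name,
       COUNT(*)         AS occurrences
FROM eventlog
GROUP BY DATE(event_date), RTRIM(user)
HAVING COUNT(*) > 1
ORDER BY event_day, user_name;
```

If this returns groups that the unnormalized query missed, the stored values differ in one of those two ways.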

Check out my newer question for where I've decided to go with this: http://stackoverflow.com/questions/1392554/mysql-delete-all-results-of-sub-query

Anthony 2009-09-08 07:49:54

Answer 2

A:

You can list all the field values of the duplicates with GROUP_CONCAT function, but you still get one row for each set.

Pomyk 2009-09-08 06:45:49
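
As a rough illustration of the GROUP_CONCAT approach, the query below collapses each duplicated pair into one row and lists the rows it contains. The id column is hypothetical (the question does not name any column besides event_date and user); substitute whichever column tells the duplicate rows apart.

```sql
-- One row per duplicated (event_date, user) pair; the ids of all rows in
-- the pair are collapsed into a comma-separated string.
-- `id` is a hypothetical column, not taken from the original question.
SELECT event_date,
       user,
       COUNT(*) AS occurrences,
       GROUP_CONCAT(id ORDER BY id SEPARATOR ', ') AS row_ids
FROM eventlog
GROUP BY event_date, user
HAVING COUNT(*) > 1
ORDER BY event_date, user;
```

Note that MySQL truncates GROUP_CONCAT output at group_concat_max_len (1024 bytes by default), which may matter for very large duplicate sets.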

Answer 3

A:

I think this would work (untested)

SELECT  *
FROM    eventlog e1
WHERE   1 <
(
    SELECT  COUNT(*)
    FROM    eventlog e2
    WHERE   e1.event_date = e2.event_date
    AND     e1.user = e2.user
)
-- AND [maybe an additional constraint to find the bad duplicate]
ORDER BY event_date, user;

Bryan Menard 2009-09-08 08:08:47

Answer 4

A:

In scenarios like this, you need a primary key in the table to mark or delete duplicate records. Seems like you don't have one. Add one and you might see options that were not visible before:

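UPDATE eventlog
SET has_atleast_one_duplicate = 1
WHERE primary_key IN
(
    -- The extra derived table is needed because MySQL will not let a
    -- subquery read directly from the table that is being updated.
    SELECT pk FROM
    (
        -- MIN() picks one representative key per duplicated pair.
        SELECT MIN(primary_key) AS pk
        FROM eventlog
        GROUP BY event_date, user
        HAVING COUNT(*) > 1
    ) AS dup
);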

I understand that the answer is incomplete but see if you can find a way to proceed further.

Salman A 2009-09-09 05:37:21
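
Adding the key could look something like the sketch below. It assumes eventlog has no primary key yet, and it reuses the placeholder names primary_key and has_atleast_one_duplicate from the answer; the column types are a guess.

```sql
-- Add a surrogate key; MySQL numbers the existing rows automatically.
-- `primary_key` is the placeholder name used in the answer, not a real column.
ALTER TABLE eventlog
    ADD COLUMN primary_key INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- Add the flag column that the UPDATE writes to, defaulting to "not a duplicate".
ALTER TABLE eventlog
    ADD COLUMN has_atleast_one_duplicate TINYINT(1) NOT NULL DEFAULT 0;
```

With a unique key in place, individual rows within a duplicated pair can be told apart, which is what marking or deleting just one of them requires.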
