tags:

views:

82

answers:

2

Hi I have the following table T:

id    1   2    3    4
col   a   b    a    c

I want to do a select that returns the id,col when group by(col) having count(col)>1

One way of doing it is

SELECT id,col FROM T 
   WHERE col IN (SELECT col FROM T GROUP BY(col) HAVING COUNT(col)>1);

The intern select (from the right) returns 'a' and main one (left) will return 1,a and 3,a

The problem is that the where in statement seems to be extremely slow. In my real case, the results from the internal select has many 'col's, something about 70000 and it's taking hours.

Right now it's much faster to do the internal select and the main select getting all ids and upcs and do the intersection locally. MySQL should be able to handle this kind of query efficiently.

Can I substitute the where in for a join or something faster?

Thanks

+3  A: 

You could try if using an INNER JOIN speeds things up

  • Make sure you have an index on col
  • A covering index on col, id might gain you even better performance

SQL Statement

SELECT  T.id, T.col
FROM    T
        INNER JOIN (
          SELECT   col
          FROM     T
          GROUP BY col
          HAVING COUNT(*) > 1
        ) tcol ON tcol.col = T.col
Lieven
Thanks. That worked fine. I inserted before tcol the keyword 'as'. I don't even have an index on col, but it worked very fast.
duduklein
+1  A: 
SELECT  id, col
FROM    t t1
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    t t2
        WHERE   t2.col = t1.col
                AND t2.id <> t1.id
        )

Make sure you have an index on (col) (in InnoDB) or (col, id) (in MyISAM)

The inner query will stop evaluating as soon as it finds a first matching value. With an index, this will happen after a single index seek with at most two index scans.

Quassnoi
+1. Perhaps you could add some explanation as to *why* this is *much* faster than OP's or my solution. I assume it has to do with not having to scan an entire index and to be able to stop when a match has been found.
Lieven