views:

189

answers:

2

I have a query to get duplicate data with some extra condition but I feel that it is not fast enough. Any solution to make this query faster?

v_listing contains big information

SELECT DISTINCT  code, name, comm, address, area 
FROM v_listing t1
WHERE EXISTS (SELECT NULL
                FROM v_listing t2
             WHERE t1.comm = t2.comm
             AND t1.address = t2.address
             AND t1.area = t2.area
             AND (t1.code > t2.code OR t1.code < t2.code))
ORDER BY comm, address, area
A: 

Well this one change alone should help a lot:

SELECT DISTINCT code, name, comm, address, area
FROM v_listing t1
WHERE EXISTS ( SELECT NULL
        FROM v_listing t2
            WHERE t1.comm = t2.comm
            AND t1.address = t2.address
            AND t1.area = t2.area
            AND t1.code <> t2.code)
ORDER BY comm, address, area

Alternatively, you could do it like this:

SELECT comm, address, area, MIN(code), MAX(code), MIN(name), COUNT(*)
FROM v_listing t1
GROUP BY comm, address, area
HAVING COUNT(*) > 2
ORDER BY comm, address, area
RBarryYoung
I'd advise against the second way. `min(code)` and `min(name)` most likely don't always go together on the same row.
Eric
I never implied that they were, the original question was somewhat unspecific on it's requirements.
RBarryYoung
+2  A: 

An exists clause does a semi-join, which isn't the most optimal way to compare two very large tables. In this case, it's one table, but the point stands. What you want to do is an inner join:

SELECT DISTINCT  
    t1.code, 
    t1.name, 
    t1.comm, 
    t1.address, 
    t1.area 
FROM 
    v_listing t1
    inner join v_listing t2 on
        t1.comm = t2.comm
        AND t1.address = t2.address
        AND t1.area = t2.area
        AND t1.code <> t2.code
ORDER BY t1.comm, t1.address, t1.area

Also ensure that you have indexes on all of the join columns. That will speed things up tremendously, as well.

Eric