ansaurus

Question

Answer 1

A:

Well this one change alone should help a lot:

SELECT DISTINCT code, name, comm, address, area
FROM v_listing t1
WHERE EXISTS ( SELECT NULL
        FROM v_listing t2
            WHERE t1.comm = t2.comm
            AND t1.address = t2.address
            AND t1.area = t2.area
            AND t1.code <> t2.code)
ORDER BY comm, address, area

Alternatively, you could do it like this:

SELECT comm, address, area, MIN(code), MAX(code), MIN(name), COUNT(*)
FROM v_listing t1
GROUP BY comm, address, area
HAVING COUNT(*) > 2
ORDER BY comm, address, area

RBarryYoung 2009-08-17 05:04:14

I'd advise against the second way. `min(code)` and `min(name)` most likely don't always go together on the same row.

Eric 2009-08-17 05:10:56

I never implied that they were, the original question was somewhat unspecific on it's requirements.

RBarryYoung 2009-08-17 12:09:11

Answer 2

+2 A:

An exists clause does a semi-join, which isn't the most optimal way to compare two very large tables. In this case, it's one table, but the point stands. What you want to do is an inner join:

SELECT DISTINCT  
    t1.code, 
    t1.name, 
    t1.comm, 
    t1.address, 
    t1.area 
FROM 
    v_listing t1
    inner join v_listing t2 on
        t1.comm = t2.comm
        AND t1.address = t2.address
        AND t1.area = t2.area
        AND t1.code <> t2.code
ORDER BY t1.comm, t1.address, t1.area

Also ensure that you have indexes on all of the join columns. That will speed things up tremendously, as well.

Eric 2009-08-17 05:09:37

ansaurus

tags:

views:

answers:

Faster way to find duplicate SQL query

related questions