I have a table with 21 columns (all integers): id, c1, c2 ... c20, and I want to fetch the ids of rows that have the same values in the columns ...

So row 1 will match row 10 if row 1's c1 = row 10's c1 and row 1's c2 = row 10's c2 ... and so on.

This is what the query looks like:

 select r1.id, r2.id 
 from tbl r1, tbl r2  
 where 1=1 and r1.c1=r2.c1 and r1.c2=r2.c2 and 
         ..... r1.c20=r2.c20 and not r1.id=r2.id 

I am currently using a self-join, but it is far too slow (20 seconds for 10,000 rows). I have set up indexes on the columns (?). I have ~1 million rows. Thanks.

+1  A: 

Have you tried a single index on all 20 columns?

Marcelo Cantos
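A minimal sketch of the single-index suggestion above, not a definitive implementation. The thread does not name the database engine, so this assumes SQLite through Python's `sqlite3` module. The index name `idx_tbl_all` and the three sample rows are hypothetical. Some engines cap the number of columns per index, so check that limit before relying on a 20-column index.

```python
import sqlite3

# Assumption: SQLite stands in for the database engine, which the thread does not name.
cols = [f"c{i}" for i in range(1, 21)]  # c1 .. c20

conn = sqlite3.connect(":memory:")
col_defs = ", ".join(f"{c} INTEGER" for c in cols)
conn.execute(f"CREATE TABLE tbl (id INTEGER PRIMARY KEY, {col_defs})")

# One composite index covering all 20 columns (hypothetical index name).
conn.execute(f"CREATE INDEX idx_tbl_all ON tbl ({', '.join(cols)})")

# Hypothetical data: rows 1 and 3 agree on c1..c20, row 2 differs in c20.
rows = [
    (1, *([7] * 20)),
    (2, *([7] * 19), 8),
    (3, *([7] * 20)),
]
placeholders = ", ".join("?" * 21)
conn.executemany(f"INSERT INTO tbl VALUES ({placeholders})", rows)

# The self-join from the question, with the 20 equality tests generated.
on_clause = " AND ".join(f"r1.{c} = r2.{c}" for c in cols)
query = (
    f"SELECT r1.id, r2.id FROM tbl r1 JOIN tbl r2 ON {on_clause} "
    "WHERE r1.id <> r2.id ORDER BY r1.id, r2.id"
)
pairs = conn.execute(query).fetchall()
print(pairs)
```

Each matching pair is reported twice, once in each order. Replacing `r1.id <> r2.id` with `r1.id < r2.id` would keep one row per pair.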
+1  A: 

You may want to create an extra column, where you can store a hash of all the values in the row. Then simply index that column and filter the rows that match the hash of the 20 values you are searching for.

Daniel Vassallo
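The hash-column idea above could look roughly like the following sketch. It again assumes SQLite through Python's `sqlite3`. The column name `row_hash`, the helper `compute_row_hash`, the choice of MD5, and the sample rows are all assumptions for illustration, not something the answer specifies.

```python
import hashlib
import sqlite3

cols = [f"c{i}" for i in range(1, 21)]  # c1 .. c20


def compute_row_hash(values):
    """Hash the 20 values; the comma keeps (1, 23) distinct from (12, 3)."""
    joined = ",".join(str(v) for v in values)
    return hashlib.md5(joined.encode("ascii")).hexdigest()


conn = sqlite3.connect(":memory:")
col_defs = ", ".join(f"{c} INTEGER" for c in cols)
# Hypothetical extra column row_hash holds the digest of c1..c20.
conn.execute(f"CREATE TABLE tbl (id INTEGER PRIMARY KEY, {col_defs}, row_hash TEXT)")
conn.execute("CREATE INDEX idx_tbl_hash ON tbl (row_hash)")

# Hypothetical data: rows 1 and 3 are identical, row 2 differs in c20.
data = {1: [7] * 20, 2: [7] * 19 + [8], 3: [7] * 20}
placeholders = ", ".join("?" * 22)
for row_id, values in data.items():
    conn.execute(
        f"INSERT INTO tbl VALUES ({placeholders})",
        (row_id, *values, compute_row_hash(values)),
    )

# Look up every row whose 20 values hash to the same digest as the target.
target = [7] * 20
matches = [
    r[0]
    for r in conn.execute(
        "SELECT id FROM tbl WHERE row_hash = ? ORDER BY id",
        (compute_row_hash(target),),
    )
]
print(matches)
```

Two different rows can in principle share a digest, so a careful version would re-check the 20 columns on the rows the hash lookup returns. The hash must also be recomputed whenever any of c1..c20 changes.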
I will also need to be able to run other queries (apologies for not stating this in the post), such as r1.c1-r2.c1 > min and r1.c1-r2.c1 < max and r1.c2-r2.c2 > min and r1.c2-r2.c2 < max ...., for integer min and max = 1, 2, 3 ... I am not sure how this can be done with hashes?
paulN
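For reference, the range-style query described in the comment above might be written as in this sketch. It assumes SQLite through Python's `sqlite3`, assumes the elided conditions continue the same pattern for every column up to c20, and uses hypothetical sample rows and bounds. A hash of the row cannot answer this kind of query, because nearby values produce unrelated digests.

```python
import sqlite3

cols = [f"c{i}" for i in range(1, 21)]  # c1 .. c20

conn = sqlite3.connect(":memory:")
col_defs = ", ".join(f"{c} INTEGER" for c in cols)
conn.execute(f"CREATE TABLE tbl (id INTEGER PRIMARY KEY, {col_defs})")

# Hypothetical data: every column of row 2 is exactly 2 above row 1.
rows = [
    (1, *([10] * 20)),
    (2, *([12] * 20)),
    (3, *([20] * 20)),
]
placeholders = ", ".join("?" * 21)
conn.executemany(f"INSERT INTO tbl VALUES ({placeholders})", rows)

# Assumption: the same exclusive bounds apply to every column difference.
bounds = {"lo": 1, "hi": 3}
cond = " AND ".join(
    f"r1.{c} - r2.{c} > :lo AND r1.{c} - r2.{c} < :hi" for c in cols
)
near_pairs = conn.execute(
    f"SELECT r1.id, r2.id FROM tbl r1 JOIN tbl r2 ON {cond} "
    "ORDER BY r1.id, r2.id",
    bounds,
).fetchall()
print(near_pairs)
```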
@paulN: No, this won't help then. By the way, do you have a separate index on each field, as Marcelo suggested?
Daniel Vassallo
Thanks for that ... it is now much better, taking only 0.2 seconds for the query against ~1 million rows! I created a single index on all 20 columns as suggested.
paulN