I have found myself in quite a pickle. I have several single-column tables (suppression or inclusion lists) that are more or less varchar(25), but I won't have time to index them before using them in the main query and, depending on how important it is, I won't know how many rows are in each table. The base table at the heart of all this has some 1.4 million rows and some 50 columns.

My assumptions are as follows:

IN shouldn't be used when a lot of values (rows) are returned, because it looks through the values serially, right? (I mean IN on a subquery, not IN with the values passed directly.)

Joins (INNER for inclusion, and LEFT with a check for NULLs for suppression) are best for large sets of data (over 1k rows or so to match against).

EXISTS has always concerned me because it seems to be doing a subquery for every row (all 1.4 million? Yikes.)

My gut says, if feasible, get the count of the suppression table and use either IN (for under 1k rows) or an INNER/LEFT JOIN (for suppression tables above 1k rows). Note: any field I will be suppressing on will be indexed in the big base table, but the suppression table won't be. Thoughts?

Thanks in advance for any and all comments and/or advice.

A: 

Whatever technique you use, if there is no index on the table on which you apply a filter or join, the system will do a table scan.

RE: Exists

It is not necessarily the case that the system will run a subquery for each of the 1.4 million rows. SQL Server is smart enough to do the inner EXISTS query and then evaluate that against the main query. In some cases, EXISTS can perform as well as or better than a JOIN.

Thomas
+4  A: 

Assuming TSQL to mean SQL Server, have you seen this link regarding a comparison of NOT IN, NOT EXISTS, and LEFT JOIN IS NULL? In summary, as long as the columns being compared cannot be NULL, NOT IN and NOT EXISTS are more efficient than LEFT JOIN/IS NULL...
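
To illustrate why the "cannot be NULL" condition matters, here is a minimal sketch of the three suppression patterns. It uses Python's built-in sqlite3 as a stand-in for SQL Server, so it only shows which rows come back, not how fast; the table and column names (`base`, `suppress`, `email`) are made up for the example. The NULL handling of NOT IN shown here is standard SQL behaviour.

```python
import sqlite3

# Hypothetical tables: `base` plays the big table, `suppress` the
# unindexed, nullable single-column suppression list.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base (id INTEGER PRIMARY KEY, email VARCHAR(25) NOT NULL);
    CREATE TABLE suppress (email VARCHAR(25));
    INSERT INTO base (email) VALUES ('a@x.com'), ('b@x.com'), ('c@x.com');
    INSERT INTO suppress (email) VALUES ('b@x.com');
""")

QUERIES = {
    "not_in": """
        SELECT b.email FROM base b
        WHERE b.email NOT IN (SELECT s.email FROM suppress s)
        ORDER BY b.email""",
    "not_exists": """
        SELECT b.email FROM base b
        WHERE NOT EXISTS (SELECT 1 FROM suppress s WHERE s.email = b.email)
        ORDER BY b.email""",
    "left_join_is_null": """
        SELECT b.email FROM base b
        LEFT JOIN suppress s ON s.email = b.email
        WHERE s.email IS NULL
        ORDER BY b.email""",
}


def run_all(connection):
    """Run each suppression query and return {name: [emails kept]}."""
    return {
        name: [row[0] for row in connection.execute(sql)]
        for name, sql in QUERIES.items()
    }


# With no NULLs in the suppression list, all three patterns agree.
without_null = run_all(conn)

# A single NULL in the list makes NOT IN evaluate to UNKNOWN for every
# non-matching row, so NOT IN keeps nothing; the other two are unaffected.
conn.execute("INSERT INTO suppress (email) VALUES (NULL)")
with_null = run_all(conn)

print(without_null)
print(with_null)
```

If the suppression column is nullable and you cannot clean it first, NOT EXISTS or LEFT JOIN/IS NULL is the safer choice regardless of speed.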

Something to keep in mind about the difference between IN and EXISTS: EXISTS is a boolean operator, and returns true the first time the criteria are satisfied. Though you see a correlated subquery in the syntax, EXISTS has performed better than IN...

Also, IN and EXISTS only check for the existence of the value comparison. This means there's no duplication of records like you find when JOINing...
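
A small sketch of that duplication effect for an inclusion list, again using Python's sqlite3 as a stand-in for SQL Server, with made-up names (`base`, `include_list`, `email`). The list contains the same value twice, which an unindexed, unconstrained list table can easily do.

```python
import sqlite3

# Hypothetical tables: two base rows, and an inclusion list that
# happens to contain the same value twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base (id INTEGER PRIMARY KEY, email VARCHAR(25) NOT NULL);
    CREATE TABLE include_list (email VARCHAR(25));
    INSERT INTO base (email) VALUES ('a@x.com'), ('b@x.com');
    INSERT INTO include_list (email) VALUES ('a@x.com'), ('a@x.com');
""")


def count_rows(sql):
    """Return the single COUNT(*) value produced by `sql`."""
    return conn.execute(sql).fetchone()[0]


# INNER JOIN returns one output row per matching pair, so the
# duplicated list entry duplicates the base row.
join_rows = count_rows("""
    SELECT COUNT(*) FROM base b
    INNER JOIN include_list i ON i.email = b.email""")

# IN and EXISTS only test for existence, so each base row appears once.
in_rows = count_rows("""
    SELECT COUNT(*) FROM base
    WHERE email IN (SELECT email FROM include_list)""")

exists_rows = count_rows("""
    SELECT COUNT(*) FROM base b
    WHERE EXISTS (SELECT 1 FROM include_list i WHERE i.email = b.email)""")

print(join_rows, in_rows, exists_rows)
```

With a JOIN you would need DISTINCT (or a de-duplicated list) to get back to one row per base record.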

It really depends, so if you're really out to find what performs best you'll have to test & compare what the query plans are doing...

OMG Ponies
+1 Nice link with good metrics to back the conclusion.
Thomas
@Thomas: thx, reminds me to nudge Quassnoi to address which is best when columns can be null...
OMG Ponies