views:

94

answers:

6

I have an SQL query that executes a LEFT JOIN on another table, then outputs all results that could be coupled into a designated table. I then have a second SQL query that executes the LEFT JOIN again, then outputs the results that could not be coupled to a designated table. In code, this is something like:

INSERT INTO coupledrecords
SELECT b.col1, b.col2... s.col1, s.col2... FROM bigtable AS b
LEFT JOIN smallertable AS s
ON criterium
WHERE s.col1 IS NOT NULL

INSERT INTO notcoupledrecords
SELECT b.col1, b.col2... bigtable AS b
LEFT JOIN smallertable AS s
ON criterium
WHERE s.col1 IS NULL

My question: I now have to execute the JOIN two times, in order to achieve what I want. I have a feeling that this is twice as slow as it could be. Is this true, and if yes, is there a way to do it more efficiently?

+3  A: 

the bottom line is, you need 2 queries because you're inserting into 2 different tables. if you only had one table instead of coupledrecords and notcoupledrecords then you could do it in 1 query. :)

oedo
I think you could insert into a view though as per my answer. Unfortunately I haven't got time to test it and see whether (a) it works and (b) what the Query plan looks like and whether there would be any benefit.
Martin Smith
A: 

You're basically choping the data in two. Once You've done it once you have a set of keys in coupledrecords and should just do a NOT IN from that

INSERT INTO notcoupledrecords
SELECT b.col1, b.col2... bigtable AS b
WHERE some_col not in (select some_col from coupledrecords) 
Preet Sangha
+4  A: 

If you're inserting the different results into 2 different tables, you will need the 2 different queries.

The only thing I'd suggest, is that the "coupledrecords" query can just be an INNER JOIN:

INSERT INTO coupledrecords
SELECT b.col1, b.col2... s.col1, s.col2... FROM bigtable AS b
INNER JOIN smallertable AS s 
ON criterium

If you were inserting into the SAME table, with a field to indicate whether or not it was a matched record or not, then yes you could do it as one query.

AdaTheDev
littlegreen
+3  A: 

One way I think you could do it would be to create a Partitioned View with the check constraint on a column indicating coupled/not coupled. Then insert into the view and let SQL Server figure out the destination table. Not suggesting that you do do it just thought I'd mention it as a possibility!

INSERT INTO coupledrecordsView
SELECT case WHEN s.col1 IS NULL THEN 1 ELSE 0 END AS IsCoupled,
b.col1, b.col2... s.col1, s.col2... FROM bigtable AS b
LEFT JOIN smallertable AS s
ON criterium
Martin Smith
hah, I just posted a partitioned view answer too-- right before I posted mine I saw yours, and referred to it in my answer. +1 !
Justin Grant
Your answer seems much more comprehensive than mine though!
Martin Smith
Cool. It seems like this would improve performance without any caveats?
littlegreen
Hm, I can't get it to work, because updatable views don't work on tables with IDENTITY constraints...
littlegreen
Looks like you found a caveat straight away then :-). Also I'm not sure how SQL Server would actually process this without looking at an execution plan.
Martin Smith
I can also not use natural keys, because most of my natural key columns would be the ones that are coupled and turn out NULL. But I'm glad I tried it, it's interesting thinking.
littlegreen
+2  A: 

If you can change the clustered indexes of coupledrecords and notcoupledrecords to contain one column from smalltable (including tacking a computed bit column on the end of each clustered index, solely for this purpose-- see @Martin Smith's answer above for details) then you could use a Partitioned View for the insert. That's the easy possibility.

If that's not possible, then you can also try a non-partitioned view solution. See below-- it's more involved.

Without knowing how your data is distributed (e.g. row size, number of nullable vs. non-nullable columns, ratio of coupled to uncoupled) it's hard to recommend a general solution, but one solution which might work well in most cases is to use views to simulate coupled and uncoupled tables on top of a single "maybecoupled" table. Using views means your existing query code (other than the insert) won't have to change.

This seem horribly inefficient at first glance, but remember that nulls take up zero storage space and, with proper indexes, SQL will not waste much filtering out the "other view's" rows.

Here's how it would work:

  • INSERT all records into a the base table (e.g. maybecoupled) in a single pass
  • Ensure there's an index (ideally a clustered index, but nonclustered is OK too) on one of the columns you got from from smalltable. Let's assume this is indexedcol1
  • Create two views on top: coupledrecords and notcoupledrecords, whose definitions are SELECT col1, ... FROM maybecoupled WHERE indexedcol1 IS NULL and SELECT col1, ... FROM maybecoupled WHERE indexedcol1 IS NOT NULL.
  • if you have a clustered index on indexedcol1, then you will pay little or no perf penalty for most queries, since every query to either view will only hit the appropriate half of the records and never touch the other half. Your non-clustered indexes will get a little bigger and hence a little slower, but even that can be ameliorated with filtered indexes.
  • if you can't use a clustered index, make sure that indexcol1 is part of (or INCLUDE-d into) every non-clustered index. This prevents having to go back to the clustered index to look up indexcol1 for queries which are only pulling data from non-clustered indexes.

Here are a few cases where the above solution won't work:

  • if the number of coupled rows is relatively small and you either have a lot of non-nullable columns in bigtable or you have very small rows. Then the space overhead of all those non-coupled rows may hurt. (nulls don't take up space, but non-nullable columns do.)
  • if you are using a non-clustered indexcol1 and can't change your indexes to ensure indexcol1 is present in your non-clustered indexes
  • if the shenanigans above cause SQL Server to pick the wrong indexes to use in query plans due to increased complexity of queries (although you can fix this with index hints)

Caveat: you'll definitely want to test performance of any view-based soluton to make sure it doesn't make things worse-- SQL is usually good at picking good query plans, but not always. Test, Test, Test!

Justin Grant
I'm interested to try your solution, however I have little experience with clustered indexes and how to manipulate them. Another way I thought of, is to create a trigger on one table that redirects the row to another table when it is invalid. Do you have any idea how that would perform?
littlegreen
Personally, I'm not a big fan of triggers because they cause things to happen "invisibly" when inserting/updating/deleting from a table. it's easy for future developers to forget that the trigger is there, leading to unintended consequences. For the clustered index question, why don't you post the DDL of your tables and indexes in an edit to your question? If we can see where your indexes are now, it will be easy to tell you whether a view solution would be easy or hard. Also, please post more info about the ratio of coupled to uncoulped rows-- this may be important to figure out a solution.
Justin Grant
A: 

I like the partionedview choice but in case that isn't an option (not supported in your SQL version), it would be perfectly reasonable to use a temporary table to store the results of the query Martin Smith suggested, then do the inserts.

SELECT 
    CASE WHEN s.col1 IS NULL THEN 1 ELSE 0 END AS IsCoupled
    , b.col1
    , b.col2
    ... 
    , s.col1
    , s.col2
INTO
    #tempResults
FROM 
    bigtable AS b
LEFT OUTER JOIN 
    smallertable AS s
ON  criterium

INSERT INTO 
    coupledrecords
SELECT 
    , b.col1
    , b.col2
    ... 
    , s.col1
    , s.col2
FROM
    #tempResults
WHERE 
    IsCoupled = 1

INSERT INTO 
    notcoupledrecords
SELECT 
    , b.col1
    , b.col2
    ... 
    , s.col1
    , s.col2
FROM
    #tempResults
WHERE 
    IsCoupled = 0
Mark Storey-Smith