views:

133

answers:

3

I have two tables with the following columns (identical columns in both tables):

  1. FunctionName,
  2. FrequencyCount

I want to merge these two tables into a final table with columns:

  1. Function Name
  2. Base Frequency count
  3. Compared Frequency count
  4. Delta of Frequency count

Merge operation should happen as follows:

  • If [FunctionName] is in Table1 and NOT in Table2,

    [Base Frequency Count] = Table1.[FrequencyCount]
    [Compared Frequency Count] = 0
    [Delta of Frequency Count] = Table1.[FrequencyCount]
    
  • If [FunctionName] is in Table2 and NOT in Table1,

    [Base Frequency Count] = 0         
    [Compared Frequency Count] = Table2.[FrequencyCount]
    [Delta of Frequency Count] = Table2.[FrequencyCount]
    
  • If [FunctionName] is both in Table1 and Table2,

    [Base Frequency Count] = Table1.[FrequencyCount]         
    [Compared Frequency Count] = Table2.[FrequencyCount]
    [Delta of Frequency Count] = Table1.[FrequencyCount]-Table2.[FrequencyCount]
    

It is desirable that the query has good performance with minimum no. of joins. It would be great if someone can give good pointers.

+1  A: 
SELECT *
  INTO new_table_name 
  FROM (SELECT t.frequencycount 'Base Frequency Count',
               0 'Compared Frequency Count',
               t.frequencycount 'Delta of Frequency Count'
          FROM TABLE1 t
          JOIN TABLE2 t2 ON t2.functionname != t.functionname
        UNION
        SELECT 0 'Base Frequency Count',
               t2.frequencycount 'Compared Frequency Count',
               t2.frequencycount 'Delta of Frequency Count'
          FROM TABLE2 t2
          JOIN TABLE1 t ON t.functionname != t2.functionname
        UNION
        SELECT t.frequencycount 'Base Frequency Count',
               t2.frequencycount 'Compared Frequency Count',
               t.frequencycount - t2.frequencycount 'Delta of Frequency Count'
          FROM TABLE1 t
          JOIN TABLE2 t2 ON t.functionname = t2.functionname)
OMG Ponies
That was perfect ! Thanks for your response.But there is one problem that I noticed while executing this query.When I include 'Function Name' in the resulting table (new_table_name), the query takes approximately 1.49min to execute in comparison to 0.3 sec when 'Function Name' is not selected.Any comments on that ?
Do you have an index on FunctionName in the two tables you are copying from?
OMG Ponies
No. 'Function Name' are not unique in the table.
You're mixing up index with a unique constraint. If you specify an index on the functionName column, you should see the time improve.
OMG Ponies
Thanks for your prompt responses. Yes you were right. The column [Function Name] has a datatype nvarchar(max) which restricts specification of index on [Function Name]. Any alternatives ?
You're welcome. But I confirmed that you can apply an index to an nvarchar(max) column. See the Performance Considerations section here: http://msdn.microsoft.com/en-us/library/ms190806(SQL.90).aspx
OMG Ponies
Is nvarchar(max) really necessary? That's a 4,000 character limit.
Lloyd McFarlin
A: 

Not sure if this would be slower than rexem's solution (I didn't benchmark it). But this might be easier for you to read. Performance could be significantly improved by Selecting based on Primary Keys instead of FunctionName as well.

--TABLE 1
SELECT FunctionName, FrequencyCount AS Base, 0 AS Compared, FrequencyCount AS Delta
FROM TableOne
WHERE FunctionName NOT IN (SELECT FunctionName FROM TableTwo)
UNION

--TABLE 2
SELECT FunctionName, 0 AS Base, FrequencyCount AS Compared, FrequencyCount AS Delta
FROM TableTwo
WHERE FunctionName NOT IN (SELECT FunctionName FROM TableOne)
UNION

--DELTA
SELECT FunctionName, 
(SELECT FrequencyCount FROM TableOne WHERE FunctionName = DeltaTable.FunctionName) AS Base,
(SELECT FrequencyCount FROM TableTwo WHERE FunctionName = DeltaTable.FunctionName) AS Compared,
(SELECT FrequencyCount FROM TableOne WHERE FunctionName = DeltaTable.FunctionName) - (SELECT FrequencyCount FROM TableTwo WHERE FunctionName = DeltaTable.FunctionName) AS Delta
FROM TableOne DeltaTable
WHERE FunctionName IN (SELECT FunctionName FROM TableOne)
AND FunctionName IN (SELECT FunctionName FROM TableTwo)

Modified Delta

--DELTA
SELECT One.FunctionName, 
One.FrequencyCount AS Base,
Two.FrequencyCount AS Compared,
One.FrequencyCount - Two.FrequencyCount AS Delta
FROM TableOne One
INNER JOIN TableTwo Two
    ON One.FunctionName = Two.FunctionName
Ikarii Warrior
Subqueries in the SELECT clause are going to be a performance nightmare - they get executed for every record in the resultset. The IN clause supposedly runs slower than using EXISTS (which arguably runs slower than using JOINs).
OMG Ponies
Delta could be changed to such:--DELTASELECT One.FunctionName, One.FrequencyCount AS Base,Two.FrequencyCount AS Compared,One.FrequencyCount - Two.FrequencyCount AS DeltaFROM TableOne OneINNER JOIN TableTwo Two ON One.FunctionName = Two.FunctionNameWhich would eliminate the Sub Selects... Closer to your solution with the Joins. Which Atul shouldn't be afraid of necessarily if the proper indexes are on the tables.
Ikarii Warrior
+1  A: 
SELECT ISNULL(fn.FunctionName, fc.FunctionName) AS FunctionName, 
       ISNULL(fn.FrequencyCount, 0) AS BaseFrequency,
       ISNULL(fc.FrequencyCount, 0) AS ComparedFrequencyCount,
       COALESCE((fn.FrequencyCount - fc.FrequencyCount), fn.FrequencyCount, fc.FrequencyCount) AS DeltaOfFrequencyCount
INTO FinalTable
FROM FunctionName fn FULL OUTER JOIN FrequencyCount fc ON fn.FunctionName = fc.FunctionName

Note that the COALESCE will result in a null in the first expression (moving on to the next in the chain) if either fn.FrequencyCount or fc.FrequencyCount is null (in SQL, value - null = null).

Lloyd McFarlin
Now that I think about, the CASE statements can be done away with too and replaced by ISNULL's.
Lloyd McFarlin