views:

54

answers:

3

Hello,

I am trying to merge 2 databases with the same schema together, and this is one part of it.

I have changed the subject to keep it more understandable - I cannot change this schema, it's just what I'm working with.

I have a table in both my source and target databases with the following columns:

Car
CarType1
CarType2
CarType3
CarType4

I am trying to write a query that will tell me in the target database, which rows have the same Cars between the 2 databases, but different CarTypes. All I need is a count on the rows that are different.

My query written in english would be: Bring me back a count of rows where the Car is the same and the CarTypes between the two systems do not match. It doesn't matter if the CarType is in a different CarType field between the two, just whether all of the values are contained in one of the 4 fields or not.

So if in my source database this row:

Car: Mustang
CarType1: Fast
CarType2: Convertible
CarType3: null
CarType4: null

And in my target database I have this row:

Car: Mustang
CarType1: Fast
CarType2: Convertible
CarType3: Sports
CarType4: null

This would count as a non-match, since it's a Mustang and because the aggregate of the CarType fields is different. What order the values are in does not matter for this.

How would I write this query? I cannot get a grasp on it.

+1  A: 

Try this and let me know if it works:

SELECT * FROM db1.dbo.Cars
EXCEPT
SELECT c1.* FROM db1.dbo.Cars as c1
INNER JOIN db2.dbo.Cars as c2
ON    c1.Car = c2.Car
      AND
      --Check carType1
      (c1.CarType1 = c2.CarType1 OR 
      c1.CarType1 = c2.CarType2 OR 
      c1.CarType1 = c2.CarType3 OR 
      c1.CarType1 = c2.CarType4) 
      AND
      --Check carType2
      (c1.CarType2 = c2.CarType1 OR 
      c1.CarType2 = c2.CarType2 OR 
      c1.CarType2 = c2.CarType3 OR 
      c1.CarType2 = c2.CarType4)
      AND
      --Check carType3
      (c1.CarType3 = c2.CarType1 OR 
      c1.CarType3 = c2.CarType2 OR 
      c1.CarType3 = c2.CarType3 OR 
      c1.CarType3 = c2.CarType4)
      AND
      --Check carType4
      (c1.CarType4 = c2.CarType1 OR 
      c1.CarType4 = c2.CarType2 OR 
      c1.CarType4 = c2.CarType3 OR 
      c1.CarType4 = c2.CarType4)

NOTE:

You will have to add ISNULLs to this to either include or exclude if a column is null.

To exclude:

ISNULL(c1.CarType4, -1) = ISNULL(c2.CarType4, -2)

To include:

ISNULL(c1.CarType4, -1) = ISNULL(c2.CarType4, -1)
Abe Miessler
+1  A: 
;WITH SourceT AS (
SELECT 'Toyota' AS Car, 'A' AS CarType1, 'B' AS CarType2, 'C' CarType3, 'D' CarType4 UNION ALL
SELECT 'BMW' AS Car, 'A' AS CarType1, 'B' AS CarType2, 'C' CarType3, 'D' CarType4 UNION ALL
SELECT 'Mustang' AS Car, 'Fast' AS CarType1, 'Convertible' AS CarType2, 'Sports' CarType3, NULL CarType4 
),
TargetT AS (
SELECT 'Toyota' AS Car, 'D' AS CarType1, 'C' AS CarType2, 'B' CarType3, 'A' CarType4 UNION ALL
SELECT 'BMW' AS Car, 'D' AS CarType1, 'C' AS CarType2, 'B' CarType3, 'A' CarType4  UNION ALL
SELECT 'Mustang' AS Car, 'Fast' AS CarType1, 'Convertible' AS CarType2, NULL CarType3, NULL CarType4 )

SELECT *
FROM SourceT s
WHERE NOT EXISTS
(
SELECT * 
FROM TargetT t 
WHERE s.Car = t.Car AND 0 =
(SELECT COUNT(*) FROM 
    ( (
       (SELECT s.CarType1 AS t UNION ALL 
        SELECT s.CarType2 AS t UNION ALL 
        SELECT s.CarType3 AS t UNION ALL 
        SELECT s.CarType4 AS t )
    EXCEPT                
       (SELECT t.CarType1 AS t UNION ALL 
        SELECT t.CarType2 AS t UNION ALL 
        SELECT t.CarType3 AS t UNION ALL 
        SELECT t.CarType4 AS t )
        ) 
    UNION ALL
    (
       (SELECT t.CarType1 AS t UNION ALL 
        SELECT t.CarType2 AS t UNION ALL 
        SELECT t.CarType3 AS t UNION ALL 
        SELECT t.CarType4 AS t )
    EXCEPT                
       (SELECT s.CarType1 AS t UNION ALL 
        SELECT s.CarType2 AS t UNION ALL 
        SELECT s.CarType3 AS t UNION ALL 
        SELECT s.CarType4 AS t )
        )     
        ) T
    )
)

Or a slightly shorter version

SELECT *
FROM SourceT s
WHERE NOT EXISTS
(
SELECT * 
FROM TargetT t 
WHERE s.Car = t.Car AND 
(SELECT t FROM (SELECT s.CarType1 AS t UNION ALL 
                SELECT s.CarType2 AS t UNION ALL 
                SELECT s.CarType3 AS t UNION ALL 
                SELECT s.CarType4 AS t ) D ORDER BY t FOR XML PATH(''))=
(SELECT t FROM (SELECT t.CarType1 AS t UNION ALL 
                SELECT t.CarType2 AS t UNION ALL 
                SELECT t.CarType3 AS t UNION ALL 
                SELECT t.CarType4 AS t ) D ORDER BY t FOR XML PATH(''))
    )
Martin Smith
This returns the types as XML. What are your thoughts on my solution? I tried it with your CTEs and they return the same.
Abe Miessler
@Abe - Got rid of the XML concatenation step now.
Martin Smith
Seems a little cleaner that way. Still trying to figure out how the `SELECT COUNT` section works...
Abe Miessler
@Abe - It uses the trick from here http://sqlblogcasts.com/blogs/simons/archive/2006/05/08/Neat-trick-to-find-max-value-of-multiple-columns.aspx if that's what you mean? Or you might have been wondering how it was meant to work at all ? There was an error in it that I just fixed!
Martin Smith
Yeah I was just trying to wrap my head around how all the UNIONs played into your query. I played around with it a bit this morning and I think I get it. Very interesting approach.
Abe Miessler
+1  A: 
SELECT c1.car
FROM target.cars c1
INNER JOIN source.cars c2
ON c2.car = c1.car
WHERE
COALESCE( c1.cartype1, '') NOT IN 
( '', c2.cartype1, c2.cartype2, c2.cartype3, c2.cartype4 )
OR
COALESCE( c1.cartype2, '') NOT IN 
( '', c2.cartype1, c2.cartype2, c2.cartype3, c2.cartype4 )
OR
COALESCE( c1.cartype3, '') NOT IN 
( '', c2.cartype1, c2.cartype2, c2.cartype3, c2.cartype4 )
OR
COALESCE( c1.cartype4, '') NOT IN 
( '', c2.cartype1, c2.cartype2, c2.cartype3, c2.cartype4 )
OR
LEN( COALESCE( c1.cartype1, '') + COALESCE( c1.cartype2, '') +
 COALESCE( c1.cartype3, '') + COALESCE( c1.cartype4, '') ) 
<>
LEN( COALESCE( c2.cartype1, '') + COALESCE( c2.cartype2, '') + COALESCE( c2.cartype3, '') + 
COALESCE( c2.cartype4, '') )
;
Fred Sobotka
Notes: Using IN() is an easy way to match across an unpredictably ordered set of items. COALESCE() will prevent any failures or inconsistencies encountered when processing columns containing NULL. The comparison of LEN() totals should match if the rows are identical.
Fred Sobotka
Should coalesce(c3. be coalesece(c1. ? If not what is c3.? Thank you
You're right. I have corrected the query.
Fred Sobotka