tags:

views:

296

answers:

8

Hi All,

I've inherited a less-than-ideal table structure, and I'm trying to improve it as much as I can without tearing down and rebuilding. There's currently at least two levels of data for everything, the legacy data and the marketing override data. I'm trying to find all the records within the legacy data that don't yet have a marketing override.

So far, this is what I have:

SELECT DISTINCT 
  old.STYLE_NBR, old.COLOR_NBR 
FROM 
  LEGACY_PRODUCT_TABLE old
INNER JOIN 
  MARKETING_PRODUCT_TABLE new
ON old.STYLE_NBR <> new.style_number AND old.COLOR_NBR <> new.colour_number

This seems to work, but it takes a few minutes to run. If at all possible, I'd like a more efficient way of doing this.

Other info:

  • There are about 60,000 records in the legacy table
  • There are about 7,000 in the marketing table
  • Both STYLE_NBR and COLOR_NBR are char(5) and, when combined, make a unique ID.
+1  A: 

Are the join fields indexed? That should speed things up considerably. Make sure old.STYLE_NBR, old.COLOR_NBR, new.style_number, and new.color_number are indexed.

GoatRider
+1  A: 

I don't know if this will be faster, but it may be worth a try.

SELECT DISTINCT 
  old.STYLE_NBR, old.COLOR_NBR 
FROM 
  LEGACY_PRODUCT_TABLE old
WHERE old.STYLE_NBR, old.COLOR_NBR 
NOT IN 
(
    SELECT old.STYLE_NBR, old.COLOR_NBR 
    FROM LEGACY_PRODUCT_TABLE old
    INNER JOIN 
        MARKETING_PRODUCT_TABLE new
        ON 
            old.STYLE_NBR == new.style_number AND old.COLOR_NBR == new.colour_number
)
Elie
+2  A: 

You should be using a LEFT OUTER JOIN and change your lookup

SELECT DISTINCT 
  old.STYLE_NBR, old.COLOR_NBR 
FROM 
  LEGACY_PRODUCT_TABLE old
  LEFT OUTER JOIN MARKETING_PRODUCT_TABLE new
    ON (old.STYLE_NBR + old.COLOR_NBR) = (new.style_number + new.Colour_number)
WHERE (new.style_number + new.Colour_number) IS NULL
Lieven
+1  A: 
SELECT 
    old.* 
FROM 
    LEGACY_PRODUCT_TABLE old 
LEFT JOIN
    MARKETING_PRODUCT_TABLE new 
ON 
    new.style_number=old.STYLE_NBR AND 
    new.colour_number=old.COLOR_NBR 
WHERE 
    new.style_number IS NULL;

Stands a better chance of using the indexes which you presumably have on the four columns in question.

Jon Bright
A: 

What about NOT EXISTS?

SELECT DISTINCT old.STYLE_NBR, old.COLOR_NBR 
FROM LEGACY_PRODUCT_TABLE old
WHERE NOT EXISTS
    (SELECT 1 FROM MARKETING_PRODUCT_TABLE new 
    WHERE old.STYLE_NBR = new.style_number 
      AND old.COLOR_NBR = new.colour_number)
Max Gontar
+1  A: 

Some options to try are:

SELECT
    old.STYLE_NBR,
    old.COLOR_NBR
FROM  
    LEGACY_PRODUCT_TABLE old
LEFT OUTER JOIN
    MARKETING_PRODUCT_TABLE new
ON
    old.STYLE_NBR = new.style_number
AND
    old.COLOR_NBR = new.colour_number
WHERE
    new.style_number IS NULL



SELECT
    old.STYLE_NBR,
    old.COLOR_NBR
FROM  
    LEGACY_PRODUCT_TABLE old
WHERE
    NOT EXISTS
(
    SELECT
     *
    FROM
     MARKETING_PRODUCT_TABLE new
    WHERE
     old.STYLE_NBR = new.style_number
    AND
     old.COLOR_NBR = new.colour_number
)

EDIT: The key thing with both of these is that you are joining using = rather than <>.

Robin Day
+2  A: 

What you currently have is incorrect because it will return a row for every row that doesn't match, so potentially 6999 rows in the result per row in the legacy table if there is marketing override, or 7000 if there isn't. the distinct will then discard the duplicates, but the result will be wrong because even if there is a marketing matching row, the non-matching ones will make sure the result set will include the ones where there is no row.

Try this instead:

select distinct style_nbr, color_nbr
 from legacy_product_table L
where not exists
(
   select * from marketing_product_table m
   where m.style_nbr = L.style_nbr and m.color_nbr = L.color_nbr
)

Make sure the product table has an index on (style_nbr,color_nbr).

Carlos A. Ibarra
I prefer the left join method, but I like the way you explained why the orginal was incorrect.
HLGEM
A: 

-- What about EXCEPT? (if this is SQL Server 2005 or 2008) select old.Style_NBR, Old.Color_NBR except select new.Style_NBR, new.Color_NBR

-- try the code below in mssql 2008

declare @Old table( Color_Nbr tinyint, Style_Nbr tinyint )

declare @New table ( Color_Nbr tinyint, Style_Nbr tinyint )

insert into @Old values (1,1), (2,2), (3,3), (4,4)

insert into @New values (1,1), (2,2), (3,3), (5,5)

select o.Color_Nbr, o.Style_Nbr from @Old o

except

select n.Color_Nbr, n.Style_Nbr from @New n