views:

118

answers:

5

I have a query where I select a few columns from each of 5 left outer joined tables.

I did an execution plan in SQL Server 2008, and there are basically table scans on all of the joined tables, but the cost is all 0% for them - I'm assuming because there aren't many records in these tables.

Then at the last 2 steps of the execution plan (the final merge join of all the tables and the actual SELECT statement), it says 55% of the cost is the merge join, and 45% of the cost is the select.

This seems strange to me...why is the cost for those final 2 "bring it all together" steps so high? I thought all these table scans or sort steps would have more substantial cost.

I'm trying to get one "summarized" record out of all these tables...maybe I'm taking the wrong approach in left joining everything?

Updated with SQL

SELECT
/* Names */
NM.EMPLID, NM.NAME_PREFIX, NM.LAST_NAME, NM.FIRST_NAME, NM.MIDDLE_NAME, NM.NAME_SUFFIX,
/* Directory Info */
DIR_PERSON.BIRTH_DT,
/* PERSDATA */
PERS.SEX, PERS.HIGHEST_EDUC_LVL,
/* DIVERS.ETHNIC */
ETHNIC.ETHNIC_GRP_CD,
/* TENURE */
TENURE.EMPLID, TENURE.TENURE_STATUS, TENURE.EG_GRANTED_DT, TENURE.EG_TENURE_HOME, 
TENURE.EG_TRACK_HIRE_DT, TENURE.EG_MAND_REVW_DT, TENURE.CODE,
/* VISA */
VISA.VISA_PERMIT_TYPE

FROM NAMES NM

/* ----- Table Joins ----- */
/* Directory Join */
LEFT OUTER JOIN DIR_PERSON ON DIR_PERSON.ID = NM.EMPLID

/* PERS_DATA Join */
LEFT OUTER JOIN PERS ON PERS.EMPLID = NM.EMPLID
AND PERS.EFFDT =( SELECT MAX(PERS_CURRENT.EFFDT) FROM PERS_CURRENT 
     WHERE PERS.EMPLID = PERS_CURRENT.EMPLID 
     AND PERS_CURRENT.EFFDT <= GETDATE())
/* ETHNIC Join */        
LEFT OUTER JOIN  ETHNIC ON ETHNIC.EMPLID = NM.EMPLID
AND ETHNIC.PRIMARY_INDICATOR = 'Y'

/* TENURE Join */
LEFT OUTER JOIN TENURE ON TENURE.EMPLID = NM.EMPLID

/* VISA Join */
LEFT OUTER JOIN VISA ON VISA.EMPLID = NM.EMPLID
AND VISA.EFFDT = ( SELECT MAX(VISA_CURRENT.EFFDT) FROM VISA_CURRENT
     WHERE VISA.EMPLID = VISA_CURRENT.EMPLID 
     AND VISA_CURRENT.EFFDT <= GETDATE())

/* ----- End Table Joins ----- */    

WHERE   NM.NAME_TYPE = 'PRI' 
 AND NM.EFFDT = (SELECT MAX(NM_CURRENT.EFFDT) FROM NM_CURRENT 
     WHERE NM.EMPLID = NM_CURRENT.EMPLID 
     AND NM.NAME_TYPE = NM_CURRENT.NAME_TYPE 
     AND NM_CURRENT.EFFDT <= GETDATE());
A: 

45% of something small is still 45%. It's hard to say without seeing more detail, but I've found the final stages to be very expensive when inserting into a clustered (on non-IDENTITY columns) index table or a table with a lot of indexes.

With all these table scans - are there no indexes?

Cade Roux
This is just a straight select, combining data from these tables into one summarized row for a key (in this case an employee). I don't have admin rights to the DB, nor was I around when it was created, but when I look at all these tables through SQL Server, I see no keys or indexes on any of these tables...but I'm not sure if SQL Server is showing me the truth.
chucknelson
I've totally been there. You might not have good statistics, so the execution plan might not be accurate - get all your info together and get with the DBA.
Cade Roux
A: 

It would be helpful if you included the code, but if you have a GROUP BY or ORDER BY, for example, then that will add a great deal to the query.

If the final select is a large table, and the others are not only tiny but not really used much in the main table, then you need to get to 100% in some part of the query, even if it is a simple part.

James Black
A: 
WHERE   NM.NAME_TYPE = 'PRI' 
        AND NM.EFFDT = (SELECT MAX(NM_CURRENT.EFFDT) FROM NM_CURRENT 
                                        WHERE NM.EMPLID = NM_CURRENT.EMPLID 
                                        AND NM.NAME_TYPE = NM_CURRENT.NAME_TYPE 
                                        AND NM_CURRENT.EFFDT <= GETDATE());

Your 45% is here. If you create an index on MN.NAME_TYPE and on NM.EFFDT you will see that 45% drop.

It may or may not be including this sub-query in the cost of the main select -- if it is then that is really where your problem is. REMEMBER it has to re-run that query for every row.

See other comment for a refactor to a join.

Hogan
The more I look at this, the more I wonder -- what are the sub-queries doing... you have 3 of them -- they run on ever row... this HAS to be what is slowing down the query.
Hogan
Any subquery on a join or the WHERE is because I need the latest record out of those tables for a specific person.
chucknelson
+1  A: 

As Cade said, first check on your indexes.

If the indices are in place, verify your statistics are up to date.

If both of those issues check out, Consider refactoring your subqueries into one or more CTEs and then join them on the relevant criteria. This is not a silver bullet, but in my experience CTE's often perform better than subqueries.

Brook
Exactly! See my comment Brook -- O(N) instead of O(N^2) or worse
Hogan
+1 for not being as lazy as me and actually refactoring the query :)
Brook
lol, thanks Brook
Hogan
+2  A: 

SPEED UP IDEA

I have refactored your query (I did not test so there might be typos) to get rid of the sub-queries. Here you first get all the max items (1 x number of empl) and then run the main select (1 x num of empl). This changes your query from a O(N^3) to O(N), so it should be faster.

I only did two of them, the third should be clear from this example:

WITH mVisa AS
(
 SELECT MAX(VISA_CURRENT.EFFDT) as max, VISA_CURRENT.EMPID as EMPLID
 FROM VISA_CURRENT
 WHERE VISA_CURRENT.EFFDT <= GETDATE()
 GROUP BY VISA_CURRENT.EMPLID
), mPers AS
(
 SELECT MAX(PERS_CURRENT.EFFDT) as max, PERS_CURRENT.EMPLID
 FROM PERS_CURRENT
 AND PERS_CURRENT.EFFDT <= GETDATE())
 GROUP BY PERS_CURRENT.EMPLID
)
SELECT
/* Names */
NM.EMPLID, NM.NAME_PREFIX, NM.LAST_NAME, NM.FIRST_NAME, NM.MIDDLE_NAME, NM.NAME_SUFFIX,
/* Directory Info */
DIR_PERSON.BIRTH_DT,
/* PERSDATA */
PERS.SEX, PERS.HIGHEST_EDUC_LVL,
/* DIVERS.ETHNIC */
ETHNIC.ETHNIC_GRP_CD,
/* TENURE */
TENURE.EMPLID, TENURE.TENURE_STATUS, TENURE.EG_GRANTED_DT, TENURE.EG_TENURE_HOME, 
TENURE.EG_TRACK_HIRE_DT, TENURE.EG_MAND_REVW_DT, TENURE.CODE,
/* VISA */
VISA.VISA_PERMIT_TYPE

FROM NAMES NM

/* ----- Table Joins ----- */
/* Directory Join */
LEFT OUTER JOIN DIR_PERSON ON DIR_PERSON.ID = NM.EMPLID

/* PERS_DATA Join */
LEFT JOIN mPers ON NM.EMPLID = mPers.EMPLID 
LEFT OUTER JOIN PERS ON PERS.EMPLID = NM.EMPLID
AND PERS.EFFDT = mPers.max
/* ETHNIC Join */                                       
LEFT OUTER JOIN  ETHNIC ON ETHNIC.EMPLID = NM.EMPLID
AND ETHNIC.PRIMARY_INDICATOR = 'Y'

/* TENURE Join */
LEFT OUTER JOIN TENURE ON TENURE.EMPLID = NM.EMPLID

/* VISA Join */
LEFT JOIN mVisa ON NM.EMPLID = mVisa.EMPLID
LEFT OUTER JOIN VISA ON VISA.EMPLID = NM.EMPLID
AND VISA.EFFDT = mVisa.max

/* ----- End Table Joins ----- */       

WHERE   NM.NAME_TYPE = 'PRI' 
        AND NM.EFFDT = (SELECT MAX(NM_CURRENT.EFFDT) FROM NM_CURRENT 
                                        WHERE NM.EMPLID = NM_CURRENT.EMPLID 
                                        AND NM.NAME_TYPE = NM_CURRENT.NAME_TYPE 
                                        AND NM_CURRENT.EFFDT <= GETDATE());
Hogan
Thanks Hogan - the server I'm pointing to seems to not support SHOWPLAN_XML, so now I'm worried it doesn't support CTE's either. I'll try ASAP on Monday...otherwise I'm going to try and talk to the people who maintain this DB and hopefully optimize some things...
chucknelson
SQL 2008? Then it'll work -- showplan_xml requires dbo_owner or some other right. CTEs are part of the language now, should be fine.
Hogan
If you are pre 2005 then you can do an insert into #tempname instead and link to those tables.
Hogan
Ah cool, I'll try that out then...looks just like CTEs! Can you tell I still have a ton to learn with this? ;) Thanks Hogan!
chucknelson
The server is SQL Server 2000, but the temp tables seem to be working great. The SELECT is still showing as 49% of the cost, but the query is WAY faster with this "get the max values in a table first" approach. Thanks for the help!
chucknelson
This example is using CTEs.The 45% must be time spent making a dynamic index. If you create the index before hand it will be faster and you should see the percentages you expect.
Hogan