views:

149

answers:

4

Well, I have a series of stored procedures that run a data warehousing solution we developed in house. For the most part it runs pretty well, but there is one stored procedure that runs really slowly: it takes about 30 minutes on average to execute. I know exactly where the bottleneck is; I just don't know how to fix it.

Basically, the stored procedure creates a series of table variables and inserts into them; no problem there. It then joins the table variables and inserts the result into another temp table, twice (the second insert is a little different).

This is the statement that joins the tables and inserts into the temp table, and it is the part that takes forever. Any help would be greatly appreciated.

; with Requested as
(
    select
     PONUMBER as PONumber, 
     min(REQSTDBY) as RequestedBy
    from dw.POP10110
    where REQSTDBY <>''
    group by PONUMBER
)
insert into @tblTableA
(
    PONumber, 
    ReceiptNumber,
    ReceiptLineNumber, 
    VendorID, 
    POType, 
    QuantityShipped, 
    QuantityInvoiced,
    ItemNumber, 
    ItemDescription, 
    UofM, 
    UnitCost, 
    ExtendedCost, 
    SiteID,
    ProjectNumber, 
    AccountID, 
    RequestedBy, 
    GLPostDate, 
    VendorName, 
    CostCategoryID
) 

select 
    a.PONUMBER, 
    a.POPRCTNM, 
    a.RCPTLNNM, 
    a.VENDORID, 
    a.POPTYPE, 
    a.QTYSHPPD, 
    a.QTYINVCD,
    b.ITEMNMBR,
    b.ITEMDESC,
    b.UOFM,
    b.UNITCOST,
    b.EXTDCOST,
    b.LOCNCODE,
    b.ProjNum,
    case i.CostCategoryID 
     when 'MISC' then isnull(i.AccountID,'N/A')

     else 
      case j.CostCategoryID 
       when 'MISC' then 
        isnull(j.AccountID,'N/A')

       else 
        isnull(c.PurchaseAccount,'N/A') 
      end
    end as AccountID,
    d.RequestedBy,
    coalesce(e.GLPOSTDT, f.GLPOSTDT, '') as GLPostDate,
    coalesce(e.VENDNAME, f.VENDNAME, '') as VENDNAME,
    case i.CostCategoryID when 'MISC' then i.CostCategoryID else 
     case j.CostCategoryID when 'MISC' then j.CostCategoryID else coalesce(g.CostCategoryID, h.CostCategoryID, '') end
    end as CostCategoryID

from dw.POP10500 a
    inner join dw.POP30310 b
     on a.PONUMBER=b.PONUMBER
      and a.POPRCTNM=b.POPRCTNM
      and a.RCPTLNNM=b.RCPTLNNM
    left outer join @gl00100 c
     on b.INVINDX=c.ActID
    left outer join Requested d
     on b.PONUMBER = d.PONumber
    left outer join dw.POP30300 e
     on b.POPRCTNM=e.POPRCTNM
    left outer join dw.POP10300 f
     on b.POPRCTNM=f.POPRCTNM
    left outer join @pop31310 g
     on b.POPRCTNM=g.ReceiptNumber
    left outer join @pop11310 h
     on b.POPRCTNM=h.ReceiptNumber
    left outer join @pop30390 i
     on a.POPRCTNM=i.ReceiptNumber
    left outer join @pop10390 j
     on a.POPRCTNM=j.ReceiptNumber

As far as I am aware, the appropriate indexes are in place.

+2  A: 

How many rows are in your temp tables/table variables?

It looks like you are using table variables; if they have a lot of rows, consider using temp tables instead (tables whose names start with #). You can add extra indexes to temp tables, whereas table variables can only have a single clustered index.

i.e. convert your table variables
@pop31310, @pop11310, @pop30390, @pop10390
into temp tables, and give each one an index on the ReceiptNumber column.
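
A minimal sketch of that conversion (the column names and types here are illustrative guesses, not taken from your actual schema):

```sql
-- Temp table replacing the @pop31310 table variable;
-- column list/types are placeholders for the real definition
CREATE TABLE #pop31310
(
    ReceiptNumber  varchar(20),
    CostCategoryID varchar(15),
    AccountID      varchar(15)
);

-- Unlike a table variable, a temp table can take extra indexes
CREATE NONCLUSTERED INDEX IX_pop31310_ReceiptNumber
    ON #pop31310 (ReceiptNumber);
```

The joins in the big INSERT would then reference #pop31310 instead of @pop31310; repeat for the other three table variables.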

codeulike
The max is 137k, the lowest is 57, and the mean is around 8k.
DForck42
Yeah, I reckon indexes on the temp tables would help then, either via my suggestion or by adding clustered indexes to the table variables as per David B's suggestion.
codeulike
Well, a friend of mine is helping me. I'm thinking that I don't really need the temp tables anymore and might just join off the actual tables. The reason they were initially temp tables was that they were on a different server, but I am now replicating the tables to the same database.
DForck42
Yes that would probably be better.
codeulike
Actually, I was looking, and the ones that can have a primary key on ReceiptNumber now do (some of them have duplicate receipt numbers... nothing I can do about that).
DForck42
+1  A: 

Caveat: most of my experience is with Oracle, not SQL Server, but I believe the answer holds.

Generally, you want to minimize the number of tables in a join, because the optimizer will have to work through the permutations, and tends to get confused. With data warehouse-style queries, I'd look for no more than three tables in a single join, and build the queries piecewise.

The other obvious question: are these temporary tables indexed? If not, you'll have to do a relatively inefficient all-rows join on them (either a table scan or a hash join, if supported).

kdgregory
Possibly, but I've written dozens of MS-SQL queries with 20 or 30 joins and the optimizer has hardly ever let me down
codeulike
+1  A: 

Your table variables (the ones starting with @) probably have no indexing.

The only index that can be added to a table variable is a clustered primary key.

This declaration physically orders the table variable by OrderID:

DECLARE @Orders TABLE
(
  OrderID int PRIMARY KEY,
  CustomerID int,
  OrderDate datetime
)

This one physically orders the @Orders table by CustomerID, breaking ties with OrderID. It is now suitable for joining on CustomerID:

DECLARE @Orders TABLE
(
  OrderID int,
  CustomerID int,
  OrderDate datetime,
  PRIMARY KEY(CustomerID, OrderID)
)
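
The same composite-key trick can help when the join column contains duplicates: add an IDENTITY column purely as a tiebreaker, so the primary key stays unique while the table remains physically ordered by the join column (a sketch; the other columns and types are assumptions):

```sql
-- Table variable ordered by a non-unique join column;
-- RowID exists only to make the primary key unique
DECLARE @pop31310 TABLE
(
  RowID          int IDENTITY(1,1),
  ReceiptNumber  varchar(20),
  CostCategoryID varchar(15),
  AccountID      varchar(15),
  PRIMARY KEY (ReceiptNumber, RowID)
)
```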

Also, be aware that table variables lack statistics at optimization time. The optimizer assumes they contain just one row, and will choose nested loop joins when other join types may be more appropriate.
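
If converting everything to temp tables isn't practical, one possible mitigation (my suggestion, not part of the answer above, so verify it against your workload) is OPTION (RECOMPILE): the statement is then recompiled at execution time, after the table variables are populated, so the optimizer at least sees their actual row counts:

```sql
-- Recompile the slow statement once the table variables are full,
-- so the optimizer sees real row counts instead of its fixed guess
insert into @tblTableA (PONumber, ReceiptNumber /* ... remaining columns ... */)
select a.PONUMBER, a.POPRCTNM /* ... remaining columns ... */
from dw.POP10500 a
    inner join dw.POP30310 b
     on a.PONUMBER = b.PONUMBER
    /* ... remaining joins as in the question ... */
option (recompile);
```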

David B
A: 

I added statistics to some of the tables, plus an index, and it cut the time down from 25 minutes to 9 minutes.

DForck42