views:

69

answers:

5

Ok... I don't understand why this query is taking so long (MSSQL Server 2005):
[Typical output 3K rows, 5.5 minute execution time]

SELECT dbo.Point.PointDriverID, dbo.Point.AssetID, dbo.Point.PointID, dbo.Point.PointTypeID, dbo.Point.PointName, dbo.Point.ForeignID, dbo.Pointtype.TrendInterval, coalesce(dbo.Point.trendpts,5) AS TrendPts, LastTimeStamp = PointDTTM, LastValue=PointValue, Timezone 
FROM dbo.Point 
    LEFT JOIN dbo.PointType ON dbo.PointType.PointTypeID = dbo.Point.PointTypeID
    LEFT JOIN dbo.PointData ON dbo.Point.PointID = dbo.PointData.PointID 
        AND PointDTTM = (SELECT Max(PointDTTM) FROM dbo.PointData WHERE PointData.PointID = Point.PointID)
    LEFT JOIN dbo.SiteAsset ON dbo.SiteAsset.AssetID = dbo.Point.AssetID
    LEFT JOIN dbo.Site ON dbo.Site.SiteID = dbo.SiteAsset.SiteID
WHERE onlinetrended =1 and WantTrend=1

PointData is the biggun, but I thought its definition should allow me to pick up what I want easily enough:

CREATE TABLE [dbo].[PointData](
    [PointID] [int] NOT NULL,
    [PointDTTM] [datetime] NOT NULL,
    [PointValue] [real] NULL,
    [DataQuality] [tinyint] NULL,
CONSTRAINT [PK_PointData_1] PRIMARY KEY CLUSTERED 
(
    [PointID] ASC,
    [PointDTTM] ASC
) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [IX_PointDataDesc] ON [dbo].[PointData] 
(
    [PointID] ASC,
    [PointDTTM] DESC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

PointData is 550M rows, and Point (source of PointID) is only 28K rows. I tried making an Indexed View, but I can't figure out how to get the Last Timestamp/Value out of it in a compatible way (no Max, no subquery, no CTE).

This runs twice an hour, and after it runs I put more data into those 3K PointID's that I selected. I thought about creating LastTime/LastValue tables directly into Point, but that seems like the wrong approach.

Am I missing something, or should I rebuild something? (I'm also the DBA, but I know very little about A'ing a DB!)

+1  A: 

For starters, try getting rid of the correlated subquery. I also rewrote it with table aliases, making it much easier to read (and less typing!).

Try something like this:

SELECT p.PointDriverID, p.AssetID, p.PointID, 
    p.PointTypeID, p.PointName, p.ForeignID, 
    pt.TrendInterval, coalesce(p.trendpts,5) AS TrendPts, 
    LastTimeStamp = PointDTTM, LastValue=PointValue, Timezone 
FROM dbo.Point p
    LEFT JOIN dbo.PointType pt ON pt.PointTypeID = p.PointTypeID
    LEFT JOIN dbo.PointData pd ON p.PointID = pd.PointID 
    INNER JOIN (
        SELECT PointID, Max(PointDTTM) as MaxPointDTTM
        FROM dbo.PointData
        group by PointID
    ) pdm on pd.PointID = pdm.PointID and pd.PointDTTM = pdm.MaxPointDTTM
    LEFT JOIN dbo.SiteAsset sa ON sa.AssetID = p.AssetID
    LEFT JOIN dbo.Site ON s.SiteID = sa.SiteID
WHERE onlinetrended =1 and WantTrend=1
RedFilter
+1 but I ran out of votes, sorry.
Mark Byers
That takes slightly longer (6 minutes). <br>The real culprit is: SELECT PointID, Max(PointDTTM) as MaxPointDTTM FROM dbo.PointData group by PointID Which takes 5 minutes to run. <br>I figured that if I correllated them it would only run on the "interested 3K" rows and be much faster. If I could save the "Max Timestamp" for each PointID, I'd be in a great position, but Index Views don't like the Max() aggregation :(
@thormj: try updating your statistics
RedFilter
A: 

I'm not a SQLServer person, but I know that querying a table with a where clause that subqueries the same table is bad news, especially with such large recordsets. Conceptually you are looking through that sub-selected table again FOR EVERY ROW of data. If I remember right SQLServer lets you store variables in memory, if not, that's fine, you can do this with a table.

Create a server variable (or a table, it only needs one column and will only have one row). Now create a trigger so that whenever a record is inserted or updated in PointData that checks the variable (or that record). If the datetime of the inserted or updated record is Greater than the variable, update the variable. Now you can use that variable in your query or join to that table. Should cut a lot of time out of your query.

invertedSpear
A: 

An non-clustered index on PointData.PointDTTM may make a difference - you're asking SQL to find the MAX value from this field for each PointID, and SQL has only the clustered index to do it with. Significantly better than a table scan, but still not optimal.

Also, the sub-query you're joining to is running once for each row - you can eliminate it using the following modification:

 ;WITH PointDataDTTMMax (PointID, PointDTTM)
    AS (SELECT PointID, MAX(PointDTTM)
          FROM PointData
      GROUP BY PointID)
SELECT ...

This will use a CTE (common table expression) and only execute that aggregate query one time.

rwmnau
A: 

Either include the PointValue in the nonclustered index so it becomes covering (is it even used in your execution plan?) or change the clustered index to make PointDTTM DESC.

Also get rid of the correlated subquery as mentioned in other answers (depending on whether the optimizer is already handling that well)

Cade Roux
A: 

I'd begin by replacing the subquery - I did not try this, hopefully there's no typo:

SELECT dbo.Point.PointDriverID, dbo.Point.AssetID, dbo.Point.PointID, dbo.Point.PointTypeID, dbo.Point.PointName, dbo.Point.ForeignID, dbo.Pointtype.TrendInterval, coalesce(dbo.Point.trendpts,5) AS TrendPts, LastTimeStamp = PointDTTM, LastValue=PointValue, Timezone 
FROM dbo.Point 
    LEFT JOIN dbo.PointType ON dbo.PointType.PointTypeID = dbo.Point.PointTypeID

    INNER JOIN (SELECT dbo.PointData.PointID, Max(dbo.PointData.PointDTTM) AS MaxDT
               FROM dbo.PointData
                   INNER JOIN dbo.Point ON dbo.PointData.PointID = dbo.Point.PointID
               WHERE onlinetrended =1 and WantTrend=1
               GROUP BY dbo.PointData.PointID) f
         ON dbo.Point.PointID = f.PointID
    INNER JOIN dbo.PointData
          ON f.PointID = dbo.PointData.PointID AND f.MaxDT = dbo.PointData.PointDTTM

    LEFT JOIN dbo.SiteAsset ON dbo.SiteAsset.AssetID = dbo.Point.AssetID
    LEFT JOIN dbo.Site ON dbo.Site.SiteID = dbo.SiteAsset.SiteID

Then I'd check whether you can replace some or all of the left joins with inner joins. Does every Point have a PointType? If yes, use inner join. Does every point have at least one PointData? Then use an inner join. Do the same for SiteAsset and Site.

If this isn't sufficient, check the execution plan of the query: Which steps take up most of the execution time? Find the big ones and try to optimize them.

marapet