views:

538

answers:

2

UPDATE: Thanks to Sifu Bill's advice I have amended the SQL query. Now it returns the correct number of distinct assets (five).

Is it possible to rewrite the following correlated subquery as a JOIN?

SELECT TOP 100 PERCENT Asset_ID, work_order_id, status_id,
downtime_hours, date_completed
FROM dbo.mtvw_wo_reason1
WHERE (Asset_ID IN
(SELECT TOP 5 Asset_ID from mtvw_wo_reason1
WHERE (Date_Completed BETWEEN '1-Oct-2009' AND '31-Oct-2009')
GROUP BY Asset_ID
ORDER BY SUM(Downtime_hours) DESC))

The output follows:

Asset_ID work_order_id status_id downtime_hours Date_Completed

SFM2019 2009-724 Completed .5000 2009-09-01 17:55:09.000
SFM2019 2009-761 Completed .5000 2009-09-15 09:03:39.000
SFM2019 2009-828 Completed 8.0000 2009-10-02 00:00:00.000
SFM2043 2009-683 Completed .5000 2009-08-14 00:00:00.000
SFM2043 2009-734 Completed 1.0000 2009-09-05 00:54:33.000
SFM2043 2009-741 Completed 1.0000 2009-09-08 17:05:09.000
SFM2043 2009-756 Completed .5000 2009-09-14 07:56:41.000
SFM2043 2009-792 Completed 1.0000 2009-09-22 00:00:00.000
SFM2043 2009-826 Completed 8.0000 2009-10-03 02:00:19.130
SFM2043 2009-983 Completed 1.0000 2009-10-30 00:00:00.000
SFM2045 2009-794 Completed 17.0000 2009-09-24 00:00:00.000
SFM2045 2009-808 Completed 1.0000 2009-09-26 16:01:25.850
SFM2045 2009-811 Completed 3.0000 2009-09-27 09:47:45.543
SFM2045 2009-816 Completed 24.0000 2009-09-30 15:14:35.000
SFM2045 2009-858 Completed 3.0000 2009-10-12 00:00:00.000
SFM2045 2009-861 Completed .5000 2009-10-13 01:11:50.900
SFM2045 2009-975 Completed 1.0000 2009-10-28 00:00:00.000
SFM2045 2009-984 Completed 3.0000 2009-10-30 00:00:00.000
SFM2088 2009-777 Completed .5000 2009-10-01 12:49:57.000
SFM2088 2009-853 Completed .5000 2009-10-09 00:00:00.000
SFM2088 2009-855 Completed 10.0000 2009-10-09 00:00:00.000
SFM2200 2009-753 Completed 8.0000 2009-09-11 00:00:00.000
SFM2200 2009-830 Completed 8.0000 2009-10-09 00:00:00.000

In essence the query needs to:

Return the top 5 assets with the highest cumulative downtime for the month of October, and a breakdown of the associated workorders for these assets

The query has to be rewritten in such a way that does not require the the existence of Date_Completed criteria in the inner query (is this even possible?). Using Sifu Bill's answer yielded only 11 rows (there should be 23):

Output from Sifu Bill's query:

Asset_ID work_order_id status_id downtime_hours Date_Completed

SFM2045 2009-858 Completed 3.0000 2009-10-12 00:00:00.000
SFM2045 2009-861 Completed .5000 2009-10-13 01:11:50.900
SFM2045 2009-975 Completed 1.0000 2009-10-28 00:00:00.000
SFM2045 2009-984 Completed 3.0000 2009-10-30 00:00:00.000
SFM2200 2009-830 Completed 8.0000 2009-10-09 00:00:00.000
SFM2043 2009-826 Completed 8.0000 2009-10-03 02:00:19.130
SFM2043 2009-983 Completed 1.0000 2009-10-30 00:00:00.000
SFM2088 2009-777 Completed .5000 2009-10-01 12:49:57.000
SFM2088 2009-853 Completed .5000 2009-10-09 00:00:00.000
SFM2088 2009-855 Completed 10.0000 2009-10-09 00:00:00.000
SFM2019 2009-828 Completed 8.0000 2009-10-02 00:00:00.000

Thank you for any insight.

+2  A: 

You can try

SELECT  TOP 100 PERCENT 
     Asset_ID, 
     work_order_id, 
     status_id, 
     downtime_hours 
FROM    dbo.mtvw_wo_reason1 INNER JOIN
     (
      SELECT TOP 5 
        Asset_ID 
      from mtvw_wo_reason1 
      GROUP BY Asset_ID, Date_Completed 
      HAVING Date_Completed BETWEEN '1-Oct-2009' and '31-Oct-2009' 
      ORDER BY SUM(Downtime_hours) DESC
     ) sub ON dbo.mtvw_wo_reason1.Asset_ID = sub.Asset_ID

EDIT for comment:

SELECT  TOP 100 PERCENT         
     Asset_ID,         
     work_order_id,         
     status_id,         
     downtime_hours ,
     sub.Date_Completed --added
FROM    dbo.mtvw_wo_reason1 INNER JOIN        
     (                
      SELECT  TOP 5                                 
        Asset_ID,
        Date_Completed --added
      from    mtvw_wo_reason1                 
      GROUP   BY Asset_ID, Date_Completed                 
      HAVING  Date_Completed BETWEEN '1-Oct-2009' and '31-Oct-2009'                 
      ORDER BY SUM(Downtime_hours) DESC        
     ) sub ON dbo.mtvw_wo_reason1.Asset_ID = sub.Asset_ID
astander
That worked a treat! It needed a slight modification to specify the table name though. Thanks!
Greenhorn
Sifu astander, is it possible for the Date_Completed column from the subquery to be selected on the main (outside) query? Thank you for the enlightenment.
Greenhorn
Yes you can, see edit answer
astander
Sifu astander, sorry for my incompetence as the problem was not explained clearly.I wish to remove the "HAVING" clause within the subquery. The reason for it is because I'm stuck using a crippled reporting tool that requires column names to be exposed in the main SELECT command.Thank you for your patience.
Greenhorn
A: 

It's hard to guess what you intend this query to do. I suggest that when you ask how to fix a query that isn't doing what you want, you include in your question a clear statement about what the purpose of the query is. Otherwise people who answer can't infer from a wrong query what would be the right one.

Also it's helpful if you include the definition of the table. The best way to show this is the Data Definition Language (i.e. CREATE TABLE) that you used to define the table.

I'm assuming this statement describes your desired query result:

"All rows during October for each Asset_ID's that have the most cumulative downtime during that month."

You can get the assets that have the most downtime during October like you were doing in the subquery:

SELECT TOP 5 Asset_ID 
FROM mtvw_wo_reason1
WHERE Date_Completed BETWEEN '2009-10-1' and '2009-10-31'
GROUP  BY Asset_ID
ORDER BY SUM(Downtime_hours) DESC

Notice that this query could be run standalone. This is a clue that if you use it as a subquery, it's non-correlated. That is, it contains no reference to values in an outer query.

You don't want to add Date_Completed to the GROUP BY clause as @astander showed. You want groups to be per asset alone, so you can get accurate SUM() per asset.

However, the select-list of this table doesn't include the other columns from the table, and the rows are reduced to a single row per Asset_ID by the GROUP BY. So it doesn't give all the information you need. Therefore you need to run it as a subquery.

SELECT m.Asset_ID, m.work_order_id, m.status_id, m.downtime_hours, m.Date_Completed
FROM dbo.mtvw_wo_reason1 m
JOIN (SELECT TOP 5 Asset_ID 
    FROM mtvw_wo_reason1
    WHERE Date_Completed BETWEEN '2009-10-1' and '2009-10-31'
    GROUP  BY Asset_ID
    ORDER BY SUM(Downtime_hours) DESC) sub
  ON (m.Asset_ID = sub.Asset_ID)
WHERE m.Date_Completed BETWEEN '2009-10-1' and '2009-10-31';

You do need to restrict by Date_Completed in both the subquery and the outer query.

PS: From what I've read, the only valid use of TOP 100 PERCENT is when you're defining a VIEW with an ORDER BY clause at the end.

Bill Karwin
Sifu Bill, thank you for taking time to reply. I learned a lot from it.The SQL query needs to return the top 5 assets with the highest cumulative downtime for the month of October, and a breakdown of the associated workorders for these assets. I will edit the original question to provide more detail. Thank you for your patience.
Greenhorn
Thanks for the extra detail in your question. I can't think of a straightforward way to solve this query without the condition in the subquery for Date_Completed.
Bill Karwin
The output you showed includes rows outside of October for the assets. I had assumed this would not be desired -- do you want all rows for these assets to the beginning of time? Anyway, if you remove the `WHERE` clause on the *outer* query, you should get all the rows for these assets.
Bill Karwin
Sifu Bill, thank you for pointing out the mistake. The SQL query itself is correct; the view was selecting the column from the wrong table. There are too many columns with the same name in different tables. Sadly I do not have the authority to modify the database schema. :(
Greenhorn