views:

363

answers:

5

Hi guys

This is a sample of the data that I have.

-ID-     -Rank-  -Type-  -Status- -Amount-
1142474  2  Under Offer Approved  23
1148492  1  Present  Current   56
1148492  2  Under Offer Approved  3
2273605  1  Present  Current  24

Where the ID is the same I only want the record with the highest rank. So the end result of the query.

-ID-     -Rank-  -Type-  -Status- -Amount-
1142474  2  Under Offer Approved  23
1148492  1  Present  Current   56
2273605  1  Present  Current  24

Now to get the original data set is an expensive operation, so I don't want to do a group by the ID and then mins the rank and then joins back onto the dataset again. Hence the query needs to do its work another way.

Cheers Anthony

A: 

Options available in general include:

  • Store the illustrated data in a temp table, then query the temp table.
  • Use a WITH clause to define the complex query, then have the DBMS sort out the query.

The WITH clause effectively allows you to give a name to a sub-query; the optimizer will avoid re-evaluating it if at all possible. The TEMP table solution is likely to be the simplest. And that will do GROUP BY of ID and MIN(rank) and join back.

Jonathan Leffler
+1  A: 
select t1.id
       , t1.rank
       , t1.type
       , t1.status
       , t1.amount

from   my_table t1 

       left outer join my_table as t2 
       on t1.id = t2.id 
    and 
       t2.rank < t1.rank 

where  t2.id is null
Adam Bernier
In this case what is in t2?
vdh_ant
@Anthony: the join to t2 is also known as a self-join. It is another copy of the same table. The reason it works is that we specify in the predicate (the join conditions and the WHERE clause) that we want to exclude everything but the top-ranked item for each id.
Adam Bernier
@adam: The problem is that to get the data out of my_table is very expensive (i.e. between 2 to 6 seconds) hence I would like to avoid joining onto the table again...
vdh_ant
What I have done (even though I really don't want to), I have put the results into a temp table and then joining the temp table onto itself...
vdh_ant
Well, that may be the way to go. Have a look at what another SO user has to say about decomposing your complicated queries into steps: http://stackoverflow.com/questions/754527/best-way-to-test-sql-queries/754570#754570
Adam Bernier
Have you tried Alex's solution yet? http://stackoverflow.com/questions/968305/advanced-grouping-without-using-a-sub-query/968399#968399
Adam Bernier
Yes but the performance difference was very little. But since it simplifies the query I might use it.
vdh_ant
+1  A: 
SELECT * FROM TheTable
WHERE 1 = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Rank DESC)
Alex Martelli
This was my first instinct. Maybe the OP will post some timing results.
Adam Bernier
Just for the record, in my situation this cases an error - "Windowed functions can only appear in the SELECT or ORDER BY clauses." So I have had to put the over part in a sub query and the where part in the outer query.
vdh_ant
+1  A: 

This will work:

with temp as (
select *, row_number() over (partition by id order by rank) as rownum
from table_name
)
select * from temp where rownum = 1

Will give one record per id where rank represents the least number

Rashmi Pandit
A: 

Why is getting the data set so expensive, I see nothing terribly complex here. Do you have the indexes you need, is the query using them? Are the statistics out of date?

HLGEM
For the purposes of the question I have simplified the scenario. Basically the table is a Table_valued Function which unions results from 2 other Table_valued Functions each which use about 6 temp tables to build up the results. This is due to the level of normalization that is present in the database and how much data need to be derived to build up a picture of the data. Really this data should be captured in a Materialize View or something similar. But I can't make any change like this in this release cycle. cheers
vdh_ant