views:

115

answers:

3

I have a query along the lines of

select b.* from
(select key, max(val) as val from (somequery) group by key) as a
inner join
(somequery) as b
on a.key = b.key and a.val = b.val
order by key

And I was wondering if there is an obvious way (that I am missing) to simplify it (given that somequery might be rather long).

Any thoughts would be appreciated.

+2  A: 

There is, but it's certainly not obvious:

select
    *
from
    (
    select
        key,
        val,
        col,
        max(val) over (partition by key) as MaxVal
    from
        tableA
    )
where
    val = MaxVal

Using the over clause is a great way to do this, and doesn't require any extraneous subqueries. All it does is take the max val per each key, and then wraps that result set in a subquery, where we can check the val against MaxVal to make sure we're pulling the correct row.

Much cleaner and faster than doing up to three subqueries!

Eric
I've not seen MAX..OVER before. How does this work for composite keys?I'm trying on my own table and I can use the agg/join or ROWNUMBER = 1, but not this technique.
gbn
`max(val) over (partition by key1, key2, key3)` works like a charm.
Eric
Thanks for that, Eric.
A: 

I would select your subqueries into a temporary tables before selecting from them. I think you'd see a significant performance boost from that.

Jon
Not my downvote, but...if you select into a temporary table, the DBMS has to store that data, and its metadata, and then use it once. It would be more effecient and effective to give the optimizer the whole query - so it can avoid materializing the sub-query if at all possible. Now, if there are several queries that will use the same sub-query, then you might see a benefit from an explicit temp table - though it would be a good idea to measure and make sure.
Jonathan Leffler
@Jonathan: I agree that in principle what you're saying is correct but my experience with SQL server suggests that for larger tables (say anything larger than 2000 rows) this route is far more efficient.
Jon
A: 

You want to use ROW_NUMBER() or RANK() for this.

(and please make sure that a previous query ends with a semi-colon)

with ranked as
(
select *, row_number() over (partition by key order by val desc) as bestrow
from sometableorquery 
)
select *
from ranked
where bestrow = 1
order by key;

If you want ties (so that a key that has two best values returns both), then use rank() instead of row_number().

Rob

Rob Farley
Also - if you want to get the top 3 of each key, then try "WHERE bestrow <= 3"
Rob Farley
ROW_NUMBER will only return one row per key. Where as the OP's code would ties in the case of there being more than one row for a key with val = max(val). Replacing ROW_NUMBER() with RANK() would preserve the original intention.
Shannon Severance
Thanks Shannon. I mentioned the 'if you want ties' thing in there.
Rob Farley