OK I have a table like this:

ID     Signal    Station    OwnerID
111     -120      Home       1
111     -130      Car        1
111     -135      Work       2
222     -98       Home       2
222     -95       Work       1
222     -103      Work       2

This is all for the same day. I just need the Query to return the max signal for each ID:

ID    Signal    Station    OwnerID
111   -120      Home        1
222   -95       Work        1

I tried using MAX() and the aggregation messes up with the Station and OwnerID being different for each record. Do I need to do a JOIN?

+1  A: 

Something like this? Join your table with itself, and exclude the rows for which a higher signal was found.

select cur.id, cur.signal, cur.station, cur.ownerid
from yourtable cur
where not exists (
    select * 
    from yourtable high 
    where high.id = cur.id 
    and high.signal > cur.signal
)

This would list one row for each highest signal, so there might be multiple rows per id.
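
If exactly one row per id is needed, one option is to extend the same NOT EXISTS test with a tie-breaker. A minimal sketch, assuming the combination (id, signal, station) is unique in yourtable; preferring the alphabetically first station is an arbitrary choice made for illustration, not something from the question:

```sql
-- Keep a row only if no other row for the same id beats it.
-- "Beats" means a higher signal, or an equal signal with an
-- alphabetically earlier station (assumed tie-breaker).
select cur.id, cur.signal, cur.station, cur.ownerid
from yourtable cur
where not exists (
    select *
    from yourtable high
    where high.id = cur.id
    and (high.signal > cur.signal
         or (high.signal = cur.signal and high.station < cur.station))
)
```

With the sample data there are no ties, so this returns the same two rows as the query above.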

Andomar
Yeah this does return duplicates if the Signal is the same for multiple Stations.
Nick S.
Edited so you get multiple rows per signal, but no duplicates. Use Quassnoi's answer if you only want a random row from among those with the highest signal.
Andomar
Yes, I think this is working. I need to check the data, but thanks a lot.
Nick S.
A: 
WITH q AS
         (
         SELECT  c.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY signal DESC) rn
         FROM    mytable c
         )
SELECT   *
FROM     q
WHERE    rn = 1

This will return one row even if there are duplicates of MAX(signal) for a given ID.

Having an index on (id, signal) will greatly improve this query.
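
A sketch of such an index, assuming the table is called mytable as in the query above; the index name is made up for illustration:

```sql
-- Hypothetical index name. The column order follows
-- PARTITION BY id ORDER BY signal in the query above.
CREATE INDEX ix_mytable_id_signal ON mytable (id, signal);
```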

Quassnoi
Better to use the aggregate and join method than creating a column. The optimiser can evaluate it as a whole: the computed column here needs to be calculated first, so this more than likely needs a spool somewhere.
gbn
If you have an index on this column (which you should), the join will be less efficient.
Quassnoi
+ not for SQL Server 2000, just in case
gbn
I know, but with an index it's more efficient for SQL Server 2005.
Quassnoi
Good to know. I just tested with 13k-row and 300k-row tables. IO is less; better for the smaller table.
gbn
IO is less on what solution?
Quassnoi
Yours. Less logical reads, less scans. However, the % of batch shows a lot worse for larger table... but I tend to prefer IO stats. I did not check CPU though
gbn
A: 

If you just want the max signal per ID, then why do you need station and owner?

SELECT ID, MAX(Signal)
FROM t
GROUP BY ID

If you need the Station and OwnerID then you could do

SELECT ID, MAX(Signal), FIRST(Station), FIRST(OwnerID)
FROM t
GROUP BY ID
Winston Smith
Sorry, I do need the Station and OwnerID; I just need them to be based on the Signal.
Nick S.
Joe, your second one won't guarantee that Station and OwnerID are the ones that belong with the MAX(Signal), so it gives the wrong answer.
HLGEM
this is true. they don't match up. thanks though.
Nick S.
FIRST is not a SQL Server aggregate function
gbn
A: 

Here is my suggestion...

SELECT Id, MAX(Signal), Station, OwnerId FROM yourtable GROUP BY Id, Station, OwnerID
Sohnee
This returns multiple IDs
Nick S.
This is not a valid solution; it will return all 6 example rows.
Jonathan Leffler
I thought the requirement was for a single row per id, with just the max value. Maybe I misunderstood.
Sohnee
A: 
select a.id, b.signal, a.station, a.ownerid from 
mytable a
join 
(SELECT ID, MAX(Signal) as Signal FROM mytable GROUP BY ID) b
on a.id = b.id AND a.Signal = b.Signal
HLGEM
I get a "syntax error in from clause" error
Nick S.
@thegreekness: do you need to include an explicit AS between the table aliases? mytable AS a JOIN (SELECT ...) AS b? You shouldn't, but...
Jonathan Leffler
I've just realized - the ON condition must specify a join on signal too.
Jonathan Leffler
+1  A: 

In classic SQL-92 (not using the OLAP operations used by Quassnoi), you can use:

SELECT g.ID, g.MaxSignal, t.Station, t.OwnerID
  FROM (SELECT id, MAX(Signal) AS MaxSignal
          FROM t
          GROUP BY id) AS g
       JOIN t ON g.id = t.id AND g.MaxSignal = t.Signal;

(Unchecked syntax; assumes your table is 't'.)

The sub-query in the FROM clause identifies the maximum signal value for each id; the join combines that with the corresponding data row from the main table.

NB: if there are several entries for a specific ID that all have the same signal strength and that strength is the MAX(), then you will get several output rows for that ID.
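
To see whether such ties actually occur in your data, the same sub-query can be reused to count the rows that share the maximum for each id. A sketch only, assuming the table is 't' as above, and just as unchecked as the main query:

```sql
-- List each id whose maximum signal is shared by more than one row.
SELECT t.id, g.MaxSignal, COUNT(*) AS TiedRows
  FROM (SELECT id, MAX(Signal) AS MaxSignal
          FROM t
          GROUP BY id) AS g
       JOIN t ON g.id = t.id AND g.MaxSignal = t.Signal
 GROUP BY t.id, g.MaxSignal
HAVING COUNT(*) > 1;
```

With the six sample rows this returns nothing, because each id has a single row at its maximum signal.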


Tested against IBM Informix Dynamic Server 11.50.FC3 running on Solaris 10:

+ CREATE TEMP TABLE signal_info
(
    id      INTEGER NOT NULL,
    signal  INTEGER NOT NULL,
    station CHAR(5) NOT NULL,
    ownerid INTEGER NOT NULL
);
+ INSERT INTO signal_info VALUES(111, -120, 'Home', 1);
+ INSERT INTO signal_info VALUES(111, -130, 'Car' , 1);
+ INSERT INTO signal_info VALUES(111, -135, 'Work', 2);
+ INSERT INTO signal_info VALUES(222, -98 , 'Home', 2);
+ INSERT INTO signal_info VALUES(222, -95 , 'Work', 1);
+ INSERT INTO signal_info VALUES(222, -103, 'Work', 2);
+ SELECT g.ID, g.MaxSignal, t.Station, t.OwnerID
  FROM (SELECT id, MAX(Signal) AS MaxSignal
            FROM signal_info
            GROUP BY id) AS g
      JOIN signal_info AS t  ON g.id = t.id AND g.MaxSignal = t.Signal;

111     -120    Home    1
222     -95     Work    1

I named the table Signal_Info for this test - but it seems to produce the right answer. This only shows that there is at least one DBMS that supports the notation. However, I am a little surprised that MS SQL Server does not - which version are you using?


It never ceases to surprise me how often SQL questions are submitted without table names.

Jonathan Leffler
I get a "Syntax Error in FROM clause" error and it's pointing to the JOIN
Nick S.
+1  A: 

You are doing a group-wise maximum/minimum operation. This is a common trap: it feels like something that should be easy to do, but in SQL it aggravatingly isn't.

There are a number of approaches (both standard ANSI and vendor-specific) to this problem, most of which are sub-optimal in many situations. Some will give you multiple rows when more than one row shares the same maximum/minimum value; some won't. Some work well on tables with a small number of groups; others are more efficient for a larger number of groups with fewer rows per group.

Here's a discussion of some of the common ones (MySQL-biased but generally applicable). Personally, if I know there are no multiple maxima (or don't care about getting them) I often tend towards the null-left-self-join method, which I'll post as no-one else has yet:

SELECT reading.ID, reading.Signal, reading.Station, reading.OwnerID
FROM readings AS reading
LEFT JOIN readings AS highersignal
    ON highersignal.ID=reading.ID AND highersignal.Signal>reading.Signal
WHERE highersignal.ID IS NULL;
bobince