Hello,

I have a performance issue when selecting data in my project.

There is a table with 3 columns: "id", "time" and "group".

  • The ids are just unique ids as usual.
  • The time is the creation date of the entry.
  • The group is there to collect related entries together.

So the table data may look like this:

ID | TIME      | GROUP
------------------------
1  | 20090805  | A
2  | 20090804  | A
3  | 20090804  | B
4  | 20090805  | B
5  | 20090803  | A
6  | 20090802  | B

...and so on.

The task is now to select the "current" entries (their ids) in each group for a given date. That is, for each group find the most recent entry for a given date.

The following preconditions apply:

  • I do not know the groups in advance; there may be many of them, and they change over time.
  • The selection date may fall between the dates of the entries in the table. In that case I need the closest earlier entry in each group, that is, the row with the largest TIME that is still less than the selection date.

What I currently do is a multi-step process, which I would like to turn into a single SELECT statement:

  1. SELECT DISTINCT group FROM table to find the available groups
  2. For each group found in 1), SELECT * FROM table WHERE time<selectionDate AND group=<current group> ORDER BY time DESC
  3. Take the first row of each result found in 2)
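For reference, the three steps above can be sketched roughly as follows. This is only an illustration under assumptions: it uses SQLite through Python's sqlite3 module as a stand-in for the actual database, and the names `entries`, `grp` and `current_ids_multistep` are hypothetical (TABLE and GROUP are reserved words, so they are renamed here).

```python
import sqlite3

# Sample rows from the question. "entries" and "grp" are placeholder names
# (assumptions), chosen because TABLE and GROUP are reserved words in SQL.
ROWS = [(1, 20090805, "A"), (2, 20090804, "A"), (3, 20090804, "B"),
        (4, 20090805, "B"), (5, 20090803, "A"), (6, 20090802, "B")]


def current_ids_multistep(conn, selection_date):
    """Steps 1-3: one query to list the groups, then one query per group."""
    result = {}
    groups = conn.execute("SELECT DISTINCT grp FROM entries").fetchall()
    for (grp,) in groups:
        # LIMIT 1 stands in for "take the first row" of the ordered result.
        row = conn.execute(
            "SELECT id FROM entries WHERE time < ? AND grp = ? "
            "ORDER BY time DESC LIMIT 1",
            (selection_date, grp),
        ).fetchone()
        if row is not None:  # a group may have no entry before the date
            result[grp] = row[0]
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, time INTEGER, grp TEXT)")
conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", ROWS)
baseline = current_ids_multistep(conn, 20090805)
print(baseline)
```

For the sample data and a selection date of 20090805, this picks id 2 for group A and id 3 for group B. It issues one query per group, which is the cost a single statement would avoid.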

Obviously this is not optimal.

So I would be very happy if a more experienced SQL expert could help me combine these steps into a single statement.

Thank you!

+4  A: 

Here's how I would do it in SQL Server:

SELECT * FROM table t WHERE id =
(SELECT TOP 1 id FROM table WHERE [group] = t.[group] AND [time] < selectionDate
 ORDER BY [time] DESC) -- latest entry before selectionDate within the same group
Mark Ransom
+1  A: 

The solution will vary by database server, since the syntax for TOP queries varies. Basically you are looking for a "top n per group" query, so you can Google that if you want.

Here is a solution in SQL Server. The following will return the top 10 players who hit the most home runs per year since 1990. The key is to calculate the "Home Run Rank" of each player for each year.

select 
  HRRanks.*
from
(
    Select 
      b.yearID, b.PlayerID, sum(b.Hr) as TotalHR,
      rank() over (partition by b.yearID order by sum(b.hr) desc) as HR_Rank
    from 
      Batting b
    where 
      b.yearID > 1990
    group by 
      b.yearID, b.playerID
) 
  HRRanks
where
  HRRanks.HR_Rank <= 10

Here is a solution in Oracle (top 10 departments by average salary):

SELECT deptno, avg_sal
FROM( 
      SELECT deptno, AVG(sal) avg_sal
      FROM emp
      GROUP BY deptno
      ORDER BY AVG(sal) DESC
    )
WHERE ROWNUM <= 10;

Or using analytic functions:

SELECT deptno, avg_sal
FROM (
       SELECT deptno, avg_sal, RANK() OVER (ORDER BY avg_sal DESC) rank
       FROM
       (
         SELECT deptno, AVG(sal) avg_sal
         FROM emp
         GROUP BY deptno
       )
     )
WHERE rank <= 10;

Or the same again, but using DENSE_RANK() instead of RANK(), which leaves no gaps in the numbering after ties.
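Applied to the table from the question, the same ranking idea might look like the sketch below. Several things here are assumptions: it runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 module rather than on SQL Server or Oracle, the names `entries`, `grp` and `current_ids_ranked` are hypothetical, and it uses ROW_NUMBER() rather than RANK() so that a tie on time still yields one row per group.

```python
import sqlite3

# Sample rows from the question; "entries" and "grp" are placeholder names.
ROWS = [(1, 20090805, "A"), (2, 20090804, "A"), (3, 20090804, "B"),
        (4, 20090805, "B"), (5, 20090803, "A"), (6, 20090802, "B")]

# Number the rows of each group from newest to oldest, looking only at rows
# before the selection date, then keep the newest row of each group.
QUERY = """
SELECT id, grp
  FROM (SELECT id, grp,
               ROW_NUMBER() OVER (PARTITION BY grp ORDER BY time DESC) AS rn
          FROM entries
         WHERE time < ?)
 WHERE rn = 1
 ORDER BY grp
"""


def current_ids_ranked(conn, selection_date):
    """Return (id, grp) for the latest entry of each group before the date."""
    return conn.execute(QUERY, (selection_date,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, time INTEGER, grp TEXT)")
conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", ROWS)
ranked = current_ids_ranked(conn, 20090805)
print(ranked)
```

Changing `rn = 1` to `rn <= n` gives the general "top n per group" form.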

Robert Harvey
+9  A: 

The following will work on SQL Server 2005+ and Oracle 9i+:

WITH groups AS (
       SELECT t.group,
              MAX(t.time) AS maxtime
         FROM TABLE t
        WHERE t.time < selectionDate  -- only entries before the selection date
     GROUP BY t.group)
SELECT t.id,
       t.time,
       t.group
  FROM TABLE t
  JOIN groups g ON g.group = t.group AND g.maxtime = t.time

Any database should support:

SELECT t.id,
       t.time,
       t.group
  FROM TABLE t
  JOIN (SELECT t.group,
               MAX(t.time) AS maxtime
          FROM TABLE t
         WHERE t.time < selectionDate  -- only entries before the selection date
      GROUP BY t.group) g ON g.group = t.group AND g.maxtime = t.time
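
As a quick check of the join version against the sample data from the question, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module as a stand-in database, and the names `entries`, `grp` and `current_rows_joined` are hypothetical replacements for the reserved words TABLE and GROUP.

```python
import sqlite3

# Sample rows from the question; "entries" and "grp" are placeholder names.
ROWS = [(1, 20090805, "A"), (2, 20090804, "A"), (3, 20090804, "B"),
        (4, 20090805, "B"), (5, 20090803, "A"), (6, 20090802, "B")]

# The derived table finds the latest time per group before the selection
# date; the join then fetches the full rows that carry that time.
QUERY = """
SELECT t.id, t.time, t.grp
  FROM entries t
  JOIN (SELECT grp, MAX(time) AS maxtime
          FROM entries
         WHERE time < ?
      GROUP BY grp) g
    ON g.grp = t.grp AND g.maxtime = t.time
 ORDER BY t.grp, t.id
"""


def current_rows_joined(conn, selection_date):
    """Return the rows holding the latest time per group before the date."""
    return conn.execute(QUERY, (selection_date,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, time INTEGER, grp TEXT)")
conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", ROWS)
joined = current_rows_joined(conn, 20090805)
print(joined)
```

Note that if two rows in the same group share the latest time, this join returns both of them.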
OMG Ponies
+1. I quite like the second version, though it assumes that a group appears only once per "time". Thilo had an equivalent solution to your second query, using `WHERE ... IN *subquery*`, but that seems to have been deleted.
pilcrow
+1: I think I will go for your second solution. First tests seem promising. THANKS AGAIN to you and all the others helping me so quickly and professionally. THANK YOU!
Thorsten
A: 
select * from TABLE where (GROUP, TIME) in (
    select GROUP, max(TIME) from TABLE
        where TIME < 20090804
        group by GROUP
    )

Tested with MySQL (but I had to change the table and column names because they are keywords).
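
The same row-value IN form also appears to work outside MySQL. The sketch below assumes SQLite 3.15 or later (for row values) through Python's sqlite3 module; the names `entries`, `grp` and `current_rows_in` are hypothetical, standing in for the renamed table and columns.

```python
import sqlite3

# Sample rows from the question; "entries" and "grp" are placeholder names.
ROWS = [(1, 20090805, "A"), (2, 20090804, "A"), (3, 20090804, "B"),
        (4, 20090805, "B"), (5, 20090803, "A"), (6, 20090802, "B")]

# Keep every row whose (group, time) pair matches the latest time of its
# group among the rows before the selection date.
QUERY = """
SELECT id, time, grp
  FROM entries
 WHERE (grp, time) IN (SELECT grp, MAX(time)
                         FROM entries
                        WHERE time < ?
                     GROUP BY grp)
 ORDER BY grp, id
"""


def current_rows_in(conn, selection_date):
    """Return the latest row of each group before the selection date."""
    return conn.execute(QUERY, (selection_date,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, time INTEGER, grp TEXT)")
conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", ROWS)
matched = current_rows_in(conn, 20090805)
print(matched)
```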

Lucky