views:

120

answers:

5

I have this releases table in a SQLite3 database, listing each released version of an application:

|release_id|release_date|app_id|
|==========|============|======|
|      1001| 2009-01-01 |     1|
|      1003| 2009-01-01 |     1|
|      1004| 2009-02-02 |     2|
|      1005| 2009-01-15 |     1|

So for each app_id, there will be multiple rows. I have another table, apps:

|app_id|name    |
|======|========|
|     1|Everest |
|     2|Fuji    |

I want to display the name of the application and the newest release, where "newest" means (a) newest release_date, and if there are duplicates, (b) highest release_id.

I can do this for an individual application:

SELECT apps.name,releases.release_id,releases.release_date 
  FROM apps 
  INNER JOIN releases 
    ON apps.app_id = releases.app_id
  WHERE releases.release_id = 1003
  ORDER BY releases.release_date,releases.release_id
  LIMIT 1

but of course that ORDER BY applies to the whole SELECT query, and if I leave out the WHERE clause, it still returns only one row.

It's a one-shot query on a small database, so slow queries, temp tables, etc. are fine - I just can't get my brain around the SQL way to do this.

A: 

It's ugly, but I think it'll work

select apps.name, (select releases.release_id from releases where releases.app_id=apps.app_id order by releases.release_date, releases.release_id), (select releases.release_date from releases where releases.app_id=apps.app_id order by releases.release_date, releases.release_id) from apps order by apps.app_id

I hope there's some way to get both of those columns in one embedded select, but I don't know it.

JasonWoof
A: 

Try:

SELECT a.name,
       t.max_release_id,
       t.max_date
  FROM APPS a
  JOIN (SELECT t.app_id,
               MAX(t.release_id) 'max_release_id',
               t.max_date
          FROM (SELECT r.app_id,
                       r.release_id,
                       MAX(r.release_date) 'max_date'
                  FROM RELEASES r
              GROUP BY r.app_id, r.release_id)
      GROUP BY t.app_id, t.max_date) t
OMG Ponies
+1  A: 

This is easy to do with the analytic function ROW_NUMBER(), which I guess sqlite3 doesn't support. But you can do it in a way that's a bit more flexible than what's given in the previous answers:

SELECT
  apps.name,
  releases.release_id,
  releases.release_date 
FROM apps INNER JOIN releases 
ON apps.app_id = releases.app_id
WHERE NOT EXISTS (
-- // where there doesn't exist a more recent release for the same app
  SELECT * FROM releases AS R
  WHERE R.app_id = apps.app_id
  AND R.release_data > releases.release_data
)

For example, if you had multiple ordering columns that define "latest," MAX wouldn't work for you, but you could modify the EXISTS subquery to capture the more complicated meaning of "latest."

Steve Kass
I like it, and I even understand it - thanks!
Jay Levitt
+1  A: 

This is the "greatest N per group" problem. It comes up several times per week on StackOverflow.

I usually use a solution like the one in @Steve Kass' answer, but I do it without subqueries (I got into the habit years ago with MySQL 4.0, which didn't support subqueries):

SELECT a.name, r1.release_id, r1.release_date
FROM apps a
INNER JOIN releases r1
LEFT OUTER JOIN releases r2 ON (r1.app_id = r2.app_id 
  AND (r1.release_date < r2.release_date
    OR r1.release_date = r2.release_date AND r1.release_id < r2.release_id))
WHERE r2.release_id IS NULL;

Internally, this probably optimizes identically to the NOT EXISTS syntax. You can analyze the query with EXPLAIN to make sure.


Re your comment, you could just skip the test for release_date because release_id is just as useful for establishing the chronological order of releases, and I assume it's guaranteed to be unique, so this simplifies the query:

SELECT a.name, r1.release_id, r1.release_date
FROM apps a
INNER JOIN releases r1
LEFT OUTER JOIN releases r2 ON (r1.app_id = r2.app_id 
  AND r1.release_id < r2.release_id)
WHERE r2.release_id IS NULL;
Bill Karwin
Turns out that some of the "release dates" are NULL, and this doesn't like those... but this is the closest to my mental representation, so it'll help me understand how these various queries map onto each other. Thanks!
Jay Levitt
A: 

Err second attempt. Assuming that IDs are monotonically increasing and overflow is not a likely occurance, you can ignore the date and just do:

SELECT apps.name, releases.release_id, releases.release_date 
FROM apps INNER JOIN releases on apps.app_id = releases.app_id
WHERE releases.release_id IN 
(SELECT Max(release_id) FROM releases
GROUP BY app_id);
Dean Povey