views:

60

answers:

6

EDIT: my apologies, this was a MSSQL2008 issue.

I have a SQL problem that I've come up against routinely, and normally just solved w/ a nested query. I'm hoping someone can suggest a more elegant solution.

It often happens that I need to select a result set for a user, conditioned upon it being the most recent, or the most sizeable or whatever.

For example: Their complete list of pages created, but I only want the most recent name they applied to a page. It so happens that the database contains many entries for each page, and only the most recent one is desired.

I've been using a nested select like:

SELECT pg.customName, pg.id
FROM (
     select id, max(createdAt) as mostRecent
     from pages
     where userId = @UserId
     GROUP BY id
) as MostRecentPages
JOIN pages pg
ON pg.id = MostRecentPages.id
AND pg.createdAt = MostRecentPages.mostRecent

Is there a better syntax to perform this selection?

A: 

I don't know what your table looks like

Select top 1  pg.createdAt
             ,pg.customName
             ,pg.id
from          table pg
where         pg.UserId = @UserId
order by      pg.createdAt Desc

I need a bit more info on your table(s)

Darknight
A: 

Are you using Oracle? Try to see if this query that uses analytic function would work for you. (Don't have access to db right now, so can't test myself.)

SELECT DISTINCT pg.id, 
FIRST_VALUE(pg.customName) OVER (PARTITION BY pg.id ORDER BY pg.createdAt DESC) AS customName
FROM pages pg
Tommi
assuming pg.id is an identity column is the 'select distinct' needed? or is it because of the grouping?
Darknight
Looking at the original author's query, pg.id doesn't seem to be an identity columnn.
Tommi
A: 

Assuming SQL Server and your Pages table like so:

CREATE TABLE Pages (
    Id int IDENTITY(1, 1) PRIMARY KEY
    , CustomName nvarchar(20) NOT NULL
    , CreatedAt datetime NOT NULL DEFAULT GETDATE()
    , UserId int references Users(Id)
)

I would do the following:

select TOP 1 p.Id as PageId
        , p.CustomName
    from Pages p
    where p.UserId = @UserId
    order by p.Created desc

Or even:

select TOP 1 p.Id as PageId
        , p.CustomName
        , MAX(p.CreatedAt) DateTimeCreated
    from Pages p
    where p.UserId = @UserId
    group by p.Id
        , p.CustomName

I hope this helps! (If not, please provide further details so that we may be of better helping hand)

Will Marcouiller
These won't work. The first one will just get the top id for the user - not per page per user. The second one will complain because you have mixed an aggregate with column references and have no GROUP BY.
Martin Smith
Thanks! I had forgotten to include the *group by* to my query. I edited.
Will Marcouiller
+1  A: 

I'm not sure about better but a different syntax you could try is

SELECT pg.customName, pg.id
FROM pages pg
WHERE userId = @UserId  
AND NOT EXISTS 
        (
         SELECT * FROM pages pg2
         WHERE pg2.UserId = pg.UserId
         AND pg2.id = pg.id
         AND pg2.createdAt > pg.createdAt
         )

Another alternative would be an OUTER JOIN as in Bill Karwin's answer here http://stackoverflow.com/questions/2893903/how-to-get-all-the-fields-of-a-row-using-the-sql-max-function/2893948#2893948

Martin Smith
+1  A: 

For what database (including version)? What you posted could be MySQL, SQL Server, or Sybase...

Using:

SELECT pg.customName, 
       pg.id
  FROM PAGES pg
  JOIN (SELECT t.userid, 
               MAX(t.createdAt) as mostRecent
          FROM PAGES t
      GROUP BY t.userid) x ON x.id = pg.id
                          AND x.mostRecent = pg.createdAt
                          AND x.userid = @UserId 

This is the best approach for a portable query, assuming column references are correct. But there are alternatives for limiting the data set - SQL Server uses TOP, MySQL/Postgre/SQLite use LIMIT, Oracle uses ROWNUM.

What's best depends on your data & how the respective optimizer sees it, and your needs (portable vs not). Check the explain plan for the respective database to see how efficient the query is.

OMG Ponies
It is MSSQL 2008 - so how does this nested query differ from the one I posted, other than the parameterized predicate beign outside the nested query? Does this improve performance? I suppose it depends on the indexing of the table.
jkelley
+2  A: 

Looks like you want

SELECT id, customname
FROM (SELECT id, customname,
             row_number() OVER(PARTITION BY id ORDER BY createdat DESC) as pos
      FROM pages
      WHERE pages.userid = @UserId
     ) x
WHERE x.row_number = 1

(I'm assuming you're using SQL Server from the @UserId parameter. row_number() will work for SQL Server 2005, and tbh the above should also work for Oracle, Postgresql 8.4...)

This will select all the pages by userid and work out which is the most recent using a sort. An alternative would be sth like:

SELECT id, (SELECT TOP 1 customname
            FROM pages pages_inner
            WHERE pages_inner.id = pages_outer.id
            ORDER BY pages_inner.createdat DESC) as customname
FROM (SELECT DISTINCT id FROM pages WHERE pages.userid = @UserId) pages_inner

Which approach is better depends on how many pages rows per id you have compared to pages per userid, I guess.

araqnid