tags:

views:

516

answers:

6

In SQL Server, how can I achieve selecting many fields (without an agregation function) and apply the DISTINCT statement to only one particular field?

For instance: if I have a table where I store user actions, the pseudo-schema would be like this:

UserActions
------------
id,
User,
Action
insertDate

I want to get the latest actions for a given user without repeating the field 'Action'?

For instance, if the table contents are:

1, john, update, 01/01/09
2, john, update, 01/02/09
3, john, update, 01/03/09
4, john, delete, 01/04/09
5, john, insert, 01/05/09
6, john, delete, 01/06/09

I would like to get:

6, john, delete, 01/06/09
5, john, insert, 01/05/09
3, john, update, 01/03/09

Many thanks in advance.

A: 

Not sure how to do it using just SQL. You could do the full query (it looks like you want to sort by InsertDate DESC) and then pull out only the ones you want manually.

set s = new set()
while (has more results) {
   var r = next result
   if (!s.contains(r)) {
      process result
      s.add(r)
   }
}
AgileJon
this will run slower than a pure sql answer, and could return many extra, unnecessary rows
KM
A: 

If you have a fixed set of Actions, you could write a query for the TOP 1 of each, and union the results together:

SELECT TOP 1 [id], [User], [InsertDate] 
FROM [UserActions] 
WHERE [Action] = 'insert' 
ORDER BY [InsertDate] DESC

UNION

SELECT TOP 1 [id], [User], [InsertDate] 
FROM [UserActions] 
WHERE [Action] = 'update' 
ORDER BY [InsertDate] DESC

UNION

SELECT TOP 1 [id], [User], [InsertDate] 
FROM [UserActions] 
WHERE [Action] = 'delete' 
ORDER BY [InsertDate] DESC
Dave Bauman
TOP 1 with ORDER BY DESC is effectively an aggregate function.
le dorfier
+4  A: 

The inner query should select the max id for each action for the user 'john', the outer query will select those records that match the collection of ids in the inner query so you should only get the last of each action for the specified user.

select id, user, action, insertDate
from userActions
where id in (select max(id)
                 from userActions
                 where user ='john'
                 group by action)
tvanfosson
? MAX is an aggregate function.
le dorfier
Excellent! Thanks!
Dawkins
the result set meets the spec. the OP question was how to get the result set using DISTINCT. the short answer is not to use DISTINCT.
spencer7593
A: 

What's wrong with aggregates? That's what they're there for.

Using aggregates, and assuming that id increases monotonically with insertions (without which, this won't work):

SELECT MAX(id), MAX(User), Action, MAX(InsertDate)
FROM Table
WHERE User = 'john'
GROUP BY Action
Michael Petrotta
Hi Michael, with that query you don't get the correct results for the fields with Max()
Dawkins
Well, of course they are correct, but not what I needed :)
Dawkins
+1  A: 

One alternative worth considering (in SQL Server 2008; not sure about SS 2005):

SELECT id, User, Action, InsertDate
FROM Table
WHERE User = 'john'
AND ROW_NUMBER() 
    OVER(PARTITION BY Action ORDER BY InsertDate DESC) 
    = 1

(look, ma, no aggregate functions!-)

Alex Martelli
If I try this way I get: "Windowed functions can only appear in the SELECT or ORDER BY clauses."
Dawkins
I thought that was one of the issues back in SS 2005 (whence my "not sure") -- the easy workaround (just a nested SELECT) for SS'05 is at http://weblogs.sqlteam.com/jeffs/archive/2007/03/28/60146.aspx (but I didn't think it was still needed in SS 2008).
Alex Martelli
+1  A: 

Ignoring the OPs need for no aggregate functions (still not sure why...)

The issue I have with the given answer is :

  1. It's not dynamic to allow for any other user - say 'mark'
  2. it assumes that the max(id) for an action will match the latest action - the test data suggests that, but I wouldn't assume that as a rule.

so with those in mind a more dynamic query needs to be built

with 2 more rows added to the test data

 7, john, update, 04/01/09
 8, mark, insert, 01/02/09

the answer doesn't give what the OP wanted

Here's my first draft quickly - will tidy later

select
    userActions.id,
    userActions.[user],
    userActions.Action,
    userActions.insertDate

from
userActions
join
    (
    select
     [user], action, max(insertdate) as maxinsertdate
    from userActions
    group by
     [user], action
    ) aggsubquery
    on userActions.[user] = aggsubquery.[user] 
         and userActions.action = aggsubquery.action 
         and userActions.insertdate = aggsubquery.maxinsertdate

Update....

2nd version uses the ID to get a distinct row where there may be more than one occurance of an action by a particular user, i.e. if the test data also had the following row

 9, john, delete, 06/01/09

then you would need to decide between row id 6 and row id 9 as to which one to return. I arbitrarily chose to use max(id), as I guess the data is important and not the row id

select
    max(userActions.id) as id,
    userActions.[user],
    userActions.Action,
    userActions.insertDate 
from
userActions
join
    (
    select
     [user], action, max(insertdate) as maxinsertdate
    from userActions
    group by
     [user], action
    ) aggsubquery
    on userActions.[user] = aggsubquery.[user] 
     and userActions.action = aggsubquery.action 
     and userActions.insertdate = aggsubquery.maxinsertdate 
group by
    userActions.[user],
    userActions.Action,
    userActions.insertDate
Kev Riley