ansaurus

Question

Select many fields applying DISTINCT to only one particular field

Answer 1

A:

Not sure how to do it using just SQL. You could do the full query (it looks like you want to sort by InsertDate DESC) and then pull out only the ones you want manually.

set s = new set()
while (has more results) {
   var r = next result
   if (!s.contains(r)) {
      process result
      s.add(r)
   }
}

AgileJon 2009-05-31 22:15:17

this will run slower than a pure sql answer, and could return many extra, unnecessary rows

KM 2009-06-01 17:49:01

Answer 2

A:

If you have a fixed set of Actions, you could write a query for the TOP 1 of each, and union the results together:

SELECT TOP 1 [id], [User], [InsertDate] 
FROM [UserActions] 
WHERE [Action] = 'insert' 
ORDER BY [InsertDate] DESC

UNION

SELECT TOP 1 [id], [User], [InsertDate] 
FROM [UserActions] 
WHERE [Action] = 'update' 
ORDER BY [InsertDate] DESC

UNION

SELECT TOP 1 [id], [User], [InsertDate] 
FROM [UserActions] 
WHERE [Action] = 'delete' 
ORDER BY [InsertDate] DESC

Dave Bauman 2009-05-31 22:17:03

TOP 1 with ORDER BY DESC is effectively an aggregate function.

le dorfier 2009-05-31 22:23:50

Answer 3

+4 A:

The inner query should select the max id for each action for the user 'john', the outer query will select those records that match the collection of ids in the inner query so you should only get the last of each action for the specified user.

select id, user, action, insertDate
from userActions
where id in (select max(id)
                 from userActions
                 where user ='john'
                 group by action)

tvanfosson 2009-05-31 22:20:24

? MAX is an aggregate function.

le dorfier 2009-05-31 22:24:28

Excellent! Thanks!

Dawkins 2009-05-31 22:28:44

the result set meets the spec. the OP question was how to get the result set using DISTINCT. the short answer is not to use DISTINCT.

spencer7593 2009-05-31 22:34:09

Answer 4

A:

What's wrong with aggregates? That's what they're there for.

Using aggregates, and assuming that id increases monotonically with insertions (without which, this won't work):

SELECT MAX(id), MAX(User), Action, MAX(InsertDate)
FROM Table
WHERE User = 'john'
GROUP BY Action

Michael Petrotta 2009-05-31 22:27:34

Hi Michael, with that query you don't get the correct results for the fields with Max()

Dawkins 2009-06-01 07:25:00

Well, of course they are correct, but not what I needed :)

Dawkins 2009-06-01 07:30:49

Answer 5

+1 A:

One alternative worth considering (in SQL Server 2008; not sure about SS 2005):

SELECT id, User, Action, InsertDate
FROM Table
WHERE User = 'john'
AND ROW_NUMBER() 
    OVER(PARTITION BY Action ORDER BY InsertDate DESC) 
    = 1

(look, ma, no aggregate functions!-)

Alex Martelli 2009-05-31 22:32:14

If I try this way I get: "Windowed functions can only appear in the SELECT or ORDER BY clauses."

Dawkins 2009-06-01 09:18:57

I thought that was one of the issues back in SS 2005 (whence my "not sure") -- the easy workaround (just a nested SELECT) for SS'05 is at http://weblogs.sqlteam.com/jeffs/archive/2007/03/28/60146.aspx (but I didn't think it was still needed in SS 2008).

Alex Martelli 2009-06-01 19:11:36

Answer 6

+1 A:

Ignoring the OPs need for no aggregate functions (still not sure why...)

The issue I have with the given answer is :

It's not dynamic to allow for any other user - say 'mark'
it assumes that the max(id) for an action will match the latest action - the test data suggests that, but I wouldn't assume that as a rule.

so with those in mind a more dynamic query needs to be built

with 2 more rows added to the test data

 7, john, update, 04/01/09
 8, mark, insert, 01/02/09

the answer doesn't give what the OP wanted

Here's my first draft quickly - will tidy later

select
    userActions.id,
    userActions.[user],
    userActions.Action,
    userActions.insertDate

from
userActions
join
    (
    select
     [user], action, max(insertdate) as maxinsertdate
    from userActions
    group by
     [user], action
    ) aggsubquery
    on userActions.[user] = aggsubquery.[user] 
         and userActions.action = aggsubquery.action 
         and userActions.insertdate = aggsubquery.maxinsertdate

Update....

2nd version uses the ID to get a distinct row where there may be more than one occurance of an action by a particular user, i.e. if the test data also had the following row

 9, john, delete, 06/01/09

then you would need to decide between row id 6 and row id 9 as to which one to return. I arbitrarily chose to use max(id), as I guess the data is important and not the row id

select
    max(userActions.id) as id,
    userActions.[user],
    userActions.Action,
    userActions.insertDate 
from
userActions
join
    (
    select
     [user], action, max(insertdate) as maxinsertdate
    from userActions
    group by
     [user], action
    ) aggsubquery
    on userActions.[user] = aggsubquery.[user] 
     and userActions.action = aggsubquery.action 
     and userActions.insertdate = aggsubquery.maxinsertdate 
group by
    userActions.[user],
    userActions.Action,
    userActions.insertDate

Kev Riley 2009-06-01 16:30:44

ansaurus

tags:

views:

answers:

Select many fields applying DISTINCT to only one particular field

related questions