I have provided three different approaches for solving this problem:
- Using Pivots
- Using Case Statements
- Using inline queries in the where clause
All of the solutions assume we are determining the "most recent" order based on the orderId
column. Using the createDate
column would add complexity due to timestamp collisions and seriously hinder performance since createDate
is probably not part of the indexed key. I have only tested these queries using MS SQL Server 2005, so I have no idea if they will work on your server.
Solutions (1) and (2) perform almost identically. In fact, they both result in the same number of reads from the database.
Solution (3) is not the preferred approach when working with large data sets. It consistently makes hundreds of logical reads more than (1) and (2). When filtering for one specific user, approach (3) is comparable to the other methods. In the single user case, a drop in the cpu time helps to counter the significantly higher number of reads; however, as the disk drive becomes busier and cache misses occur, this slight advantage will disappear.
Conclusion
For the presented scenario, use the pivot approach if it is supported by your DBMS. It requires less code than the case statement and simplifies adding order types in the future.
Please note, in some cases, PIVOT is not flexible enough and characteristic value functions using case statements are the way to go.
Code
Approach (1) using PIVOT:
select
ud.userId, ud.fullname,
od1.orderId as orderId1, od1.createDate as createDate1, od1.orderType as orderType1,
od2.orderId as orderId2, od2.createDate as createDate2, od2.orderType as orderType2
from userData ud
inner join (
select userId, [1] as typeOne, [2] as typeTwo
from (select
userId, orderType, orderId
from orderData) as orders
PIVOT
(
max(orderId)
FOR orderType in ([1], [2])
) as LatestOrders) as LatestOrders on
LatestOrders.userId = ud.userId
inner join orderData od1 on
od1.orderId = LatestOrders.typeOne
inner join orderData od2 on
od2.orderId = LatestOrders.typeTwo
Approach (2) using Case Statements:
select
ud.userId, ud.fullname,
od1.orderId as orderId1, od1.createDate as createDate1, od1.orderType as orderType1,
od2.orderId as orderId2, od2.createDate as createDate2, od2.orderType as orderType2
from userData ud
-- assuming not all users will have orders use outer join
inner join (
select
od.userId,
-- can be null if no orders for type
max (case when orderType = 1
then ORDERID
else null
end) as maxTypeOneOrderId,
-- can be null if no orders for type
max (case when orderType = 2
then ORDERID
else null
end) as maxTypeTwoOrderId
from orderData od
group by userId) as maxOrderKeys on
maxOrderKeys.userId = ud.userId
inner join orderData od1 on
od1.ORDERID = maxTypeTwoOrderId
inner join orderData od2 on
OD2.ORDERID = maxTypeTwoOrderId
Approach (3) using inline queries in the where clause (based on Steve K.'s response):
select ud.userId,ud.fullname,
order1.orderId, order1.orderType, order1.createDate,
order2.orderId, order2.orderType, order2.createDate
from userData ud,
orderData order1,
orderData order2
where ud.userId = order1.userId
and ud.userId = order2.userId
and order1.orderId = (select max(orderId)
from orderData od1
where od1.userId = ud.userId
and od1.orderType = 1)
and order2.orderId = (select max(orderId)
from orderData od2
where od2.userId = ud.userId
and od2.orderType = 2)
Script to generate tables and 1000 users with 100 orders each:
CREATE TABLE [dbo].[orderData](
[orderId] [int] IDENTITY(1,1) NOT NULL,
[createDate] [datetime] NOT NULL,
[orderType] [tinyint] NOT NULL,
[userId] [int] NOT NULL
)
CREATE TABLE [dbo].[userData](
[userId] [int] IDENTITY(1,1) NOT NULL,
[fullname] [nvarchar](50) NOT NULL
)
-- Create 1000 users with 100 order each
declare @userId int
declare @usersAdded int
set @usersAdded = 0
while @usersAdded < 1000
begin
insert into userData (fullname) values ('Mario' + ltrim(str(@usersAdded)))
set @userId = @@identity
declare @orderSetsAdded int
set @orderSetsAdded = 0
while @orderSetsAdded < 10
begin
insert into orderData (userId, createDate, orderType)
values ( @userId, '01-06-08', 1)
insert into orderData (userId, createDate, orderType)
values ( @userId, '01-02-08', 1)
insert into orderData (userId, createDate, orderType)
values ( @userId, '01-08-08', 1)
insert into orderData (userId, createDate, orderType)
values ( @userId, '01-09-08', 1)
insert into orderData (userId, createDate, orderType)
values ( @userId, '01-01-08', 1)
insert into orderData (userId, createDate, orderType)
values ( @userId, '01-06-06', 2)
insert into orderData (userId, createDate, orderType)
values ( @userId, '01-02-02', 2)
insert into orderData (userId, createDate, orderType)
values ( @userId, '01-08-09', 2)
insert into orderData (userId, createDate, orderType)
values ( @userId, '01-09-01', 2)
insert into orderData (userId, createDate, orderType)
values ( @userId, '01-01-04', 2)
set @orderSetsAdded = @orderSetsAdded + 1
end
set @usersAdded = @usersAdded + 1
end
Small snippet for testing query performance on MS SQL Server in addition to SQL Profiler:
-- Uncomment these to clear some caches
--DBCC DROPCLEANBUFFERS
--DBCC FREEPROCCACHE
set statistics io on
set statistics time on
-- INSERT TEST QUERY HERE
set statistics time off
set statistics io off