views:

295

answers:

11

I frequently run into problems of this form and haven't found a good solution yet:

Assume we have two database tables representing an e-commerce system.

userData (userId, name, ...)
orderData (orderId, userId, orderType, createDate, ...)

For all users in the system, select their user information, their most recent order information with type = '1', and their most recent order information with type = '2'. I want to do this in one query. Here is an example result:

(userId, name, ..., orderId1, orderType1, createDate1, ..., orderId2, orderType2, createDate2, ...)
(101, 'Bob', ..., 472, '1', '4/25/2008', ..., 382, '2', '3/2/2008', ...)
A: 

Their newest you mean all new in the current day? You can always check with your createDate and get all user and order data if the createDate >= current day.

SELECT * FROM
"orderData", "userData"
WHERE
"userData"."userId"  ="orderData"."userId"
AND "orderData".createDate >= current_date;

UPDATED

Here is what you want after your comment here:

SELECT * FROM
"orderData", "userData"
WHERE
"userData"."userId"  ="orderData"."userId"
AND "orderData".type = '1'
AND "orderData"."orderId" = (
SELECT "orderId" FROM "orderData"
WHERE 
"orderType" = '1'
ORDER "orderId" DESC
LIMIT 1

)

Daok
No, I mean each user might have 0...n orders. Find the most recent one in their collection of orders, specifically where you can limit the type.
JavadocMD
A: 

You might be able to do a union query for this. The exact syntax needs some work, especially the group by section, but the union should be able to do it.

For example:

SELECT orderId, orderType, createDate
FROM orderData
WHERE type=1 AND MAX(createDate)
GROUP BY orderId, orderType, createDate

UNION

SELECT orderId, orderType, createDate
FROM orderData
WHERE type=2 AND MAX(createDate)
GROUP BY orderId, orderType, createDate
Kevin Lamb
No, this gets me two rows which are simply the most recent orders in the two categories (regardless of which user ordered them). I want one row for each user.
JavadocMD
Man you look picky, I think with what Kevin and I have gave you, you should be able to tweak our queries to be able to have exact what you want.
Daok
Not picky, just trying to solve a particular problem. :)
JavadocMD
+1  A: 

Sorry I don't have oracle in front of me, but this is the basic structure of what I would do in oracle:

SELECT b.user_id, b.orderid, b.orderType, b.createDate, <etc>,
       a.name
FROM orderData b, userData a
WHERE a.userid = b.userid
AND (b.userid, b.orderType, b.createDate) IN (
  SELECT userid, orderType, max(createDate) 
  FROM orderData 
  WHERE orderType IN (1,2)
  GROUP BY userid, orderType)
hamishmcn
sorry, I see from your edit that you want to have the type 1 and type 2 data on the same row
hamishmcn
+1  A: 

T-SQL sample solution (MS SQL):

SELECT
    u.*
    , o1.*
    , o2.* 
FROM
(
    SELECT
     , userData.*
     , (SELECT TOP 1 orderId.url FROM orderData WHERE orderData.userId=userData.userId AND orderType=1 ORDER BY createDate DESC)
      AS order1Id
     , (SELECT TOP 1 orderId.url FROM orderData WHERE orderData.userId=userData.userId AND orderType=2 ORDER BY createDate DESC)
      AS order2Id
    FROM userData
) AS u
LEFT JOIN orderData o1 ON (u.order1Id=o1.orderId)
LEFT JOIN orderData o2 ON (u.order2Id=o2.orderId)

In SQL 2005 you could also use RANK ( ) OVER function. (But AFAIK its completely MSSQL-specific feature)

Bartek Szabat
A: 

i use things like this in MySQL:

SELECT
   u.*,
   SUBSTRING_INDEX( MAX( CONCAT( o1.createDate, '##', o1.otherfield)), '##', -1) as o2_orderfield,
   SUBSTRING_INDEX( MAX( CONCAT( o2.createDate, '##', o2.otherfield)), '##', -1) as o2_orderfield
FROM
   userData as u
   LEFT JOIN orderData AS o1 ON (o1.userId=u.userId AND o1.orderType=1)
   LEFT JOIN orderData AS o2 ON (o1.userId=u.userId AND o2.orderType=2)
GROUP BY u.userId

In short, use MAX() to get the newest, by prepending the criteria field (createDate) to the interesting field(s) (otherfield). SUBSTRING_INDEX() then strips off the date.

OTOH, if you need an arbitrary number of orders (if userType can be any number, and not a limited ENUM); it's better to handle with a separate query, something like this:

select * from orderData where userId=XXX order by orderType, date desc group by orderType

for each user.

Javier
A: 

Assuming orderId is monotonic increasing with time:

SELECT *
FROM userData u
INNER JOIN orderData o
  ON o.userId = u.userId
INNER JOIN ( -- This subquery gives the last order of each type for each customer
  SELECT MAX(o2.orderId)
    --, o2.userId -- optional - include if joining for a particular customer
    --, o2.orderType -- optional - include if joining for a particular type
  FROM orderData o2
  GROUP BY o2.userId
    ,o2.orderType
) AS LastOrders
  ON LastOrders.orderId = o.orderId -- expand join to include customer or type if desired

Then pivot at the client or if using SQL Server, there is a PIVOT functionality

Cade Roux
Bonus points for use of "monotonic increasing". :)
JavadocMD
+3  A: 

This should work, you'll have to adjust the table / column names:

select ud.name,
       order1.order_id,
       order1.order_type,
       order1.create_date,
       order2.order_id,
       order2.order_type,
       order2.create_date
  from user_data ud,
       order_data order1,
       order_data order2
 where ud.user_id = order1.user_id
   and ud.user_id = order2.user_id
   and order1.order_id = (select max(order_id)
                            from order_data od1
                           where od1.user_id = ud.user_id
                             and od1.order_type = 'Type1')
   and order2.order_id = (select max(order_id)
                             from order_data od2
                            where od2.user_id = ud.user_id
                              and od2.order_type = 'Type2')

Denormalizing your data might also be a good idea. This type of thing will be fairly expensive to do. So you might add a last_order_date to your userData.

Steve K
A: 

Here is one way to move the type 1 and 2 data on to the same row:
(by placing the type 1 and type 2 information into their own selects that then get used in the from clause.)

SELECT
  a.name, ud1.*, ud2.*
FROM
    userData a,
    (SELECT user_id, orderid, orderType, reateDate, <etc>,
    FROM orderData b
    WHERE (userid, orderType, createDate) IN (
      SELECT userid, orderType, max(createDate) 
      FROM orderData 
      WHERE orderType = 1
      GROUP BY userid, orderType) ud1,
    (SELECT user_id, orderid, orderType, createDate, <etc>,
    FROM orderData 
    WHERE (userid, orderType, createDate) IN (
      SELECT userid, orderType, max(createDate) 
      FROM orderData 
      WHERE orderType = 2
      GROUP BY userid, orderType) ud2
hamishmcn
Started writing the above before I saw Steve K's solution, which is much better
hamishmcn
A: 

Here's how I do it. This is standard SQL and works in any brand of database.

SELECT u.userId, u.name, o1.orderId, o1.orderType, o1.createDate,
  o2.orderId, o2.orderType, o2.createDate
FROM userData AS u
  LEFT OUTER JOIN (
    SELECT o1a.orderId, o1a.userId, o1a.orderType, o1a.createDate
    FROM orderData AS o1a 
      LEFT OUTER JOIN orderData AS o1b ON (o1a.userId = o1b.userId 
        AND o1a.orderType = o1b.orderType AND o1a.createDate < o1b.createDate)
    WHERE o1a.orderType = 1 AND o1b.orderId IS NULL) AS o1 ON (u.userId = o1.userId)
  LEFT OUTER JOIN (
    SELECT o2a.orderId, o2a.userId, o2a.orderType, o2a.createDate
    FROM orderData AS o2a 
      LEFT OUTER JOIN orderData AS o2b ON (o2a.userId = o2b.userId 
        AND o2a.orderType = o2b.orderType AND o2a.createDate < o2b.createDate)
    WHERE o2a.orderType = 2 AND o2b.orderId IS NULL) o2 ON (u.userId = o2.userId);

Note that if you have multiple orders of either type whose dates are equal to the latest date, you'll get multiple rows in the result set. If you have multiple orders of both types, you'll get N x M rows in the result set. So I would recommend that you fetch the rows of each type in separate queries.

Bill Karwin
A: 

Steve K is absolutely right, thanks! I did rewrite his answer a little to account for the fact that there might be no order for a particular type (which I failed to mention, so I can't fault Steve K.)

Here's what I wound up using:

select ud.name,
       order1.orderId,
       order1.orderType,
       order1.createDate,
       order2.orderId,
       order2.orderType,
       order2.createDate
  from userData ud
  left join orderData order1
   on order1.orderId = (select max(orderId)
                            from orderData od1
                           where od1.userId = ud.userId
                             and od1.orderType = '1')
  left join orderData order2
   on order2.orderId = (select max(orderId)
                            from orderData od2
                           where od2.userId = ud.userId
                             and od2.orderType = '2')
 where ...[some limiting factors on the selection of users]...;
JavadocMD
+3  A: 

I have provided three different approaches for solving this problem:

  1. Using Pivots
  2. Using Case Statements
  3. Using inline queries in the where clause

All of the solutions assume we are determining the "most recent" order based on the orderId column. Using the createDate column would add complexity due to timestamp collisions and seriously hinder performance since createDate is probably not part of the indexed key. I have only tested these queries using MS SQL Server 2005, so I have no idea if they will work on your server.

Solutions (1) and (2) perform almost identically. In fact, they both result in the same number of reads from the database.

Solution (3) is not the preferred approach when working with large data sets. It consistently makes hundreds of logical reads more than (1) and (2). When filtering for one specific user, approach (3) is comparable to the other methods. In the single user case, a drop in the cpu time helps to counter the significantly higher number of reads; however, as the disk drive becomes busier and cache misses occur, this slight advantage will disappear.

Conclusion

For the presented scenario, use the pivot approach if it is supported by your DBMS. It requires less code than the case statement and simplifies adding order types in the future.

Please note, in some cases, PIVOT is not flexible enough and characteristic value functions using case statements are the way to go.

Code

Approach (1) using PIVOT:

select 
    ud.userId, ud.fullname, 
    od1.orderId as orderId1, od1.createDate as createDate1, od1.orderType as orderType1,
    od2.orderId as orderId2, od2.createDate as createDate2, od2.orderType as orderType2

from userData ud
    inner join (
      select userId, [1] as typeOne, [2] as typeTwo
      from (select
       userId, orderType, orderId
      from orderData) as orders
      PIVOT
      (
       max(orderId)
       FOR orderType in ([1], [2])
      ) as LatestOrders) as LatestOrders on
     LatestOrders.userId = ud.userId 
    inner join orderData od1 on
     od1.orderId = LatestOrders.typeOne
    inner join orderData od2 on
     od2.orderId = LatestOrders.typeTwo

Approach (2) using Case Statements:

select 
    ud.userId, ud.fullname, 
    od1.orderId as orderId1, od1.createDate as createDate1, od1.orderType as orderType1,
    od2.orderId as orderId2, od2.createDate as createDate2, od2.orderType as orderType2

from userData ud 
    -- assuming not all users will have orders use outer join
    inner join (
      select 
       od.userId,
       -- can be null if no orders for type
       max (case when orderType = 1 
         then ORDERID
         else null
         end) as maxTypeOneOrderId,

       -- can be null if no orders for type
       max (case when orderType = 2
         then ORDERID 
         else null
         end) as maxTypeTwoOrderId
      from orderData od
      group by userId) as maxOrderKeys on
     maxOrderKeys.userId = ud.userId
    inner join orderData od1 on
     od1.ORDERID = maxTypeTwoOrderId
    inner join orderData od2 on
     OD2.ORDERID = maxTypeTwoOrderId

Approach (3) using inline queries in the where clause (based on Steve K.'s response):

select  ud.userId,ud.fullname, 
     order1.orderId, order1.orderType, order1.createDate, 
     order2.orderId, order2.orderType, order2.createDate
  from userData ud,
       orderData order1,
       orderData order2
 where ud.userId = order1.userId
   and ud.userId = order2.userId
   and order1.orderId = (select max(orderId)
                            from orderData od1
                           where od1.userId = ud.userId
                             and od1.orderType = 1)
   and order2.orderId = (select max(orderId)
                             from orderData od2
                            where od2.userId = ud.userId
                              and od2.orderType = 2)

Script to generate tables and 1000 users with 100 orders each:

CREATE TABLE [dbo].[orderData](
    [orderId] [int] IDENTITY(1,1) NOT NULL,
    [createDate] [datetime] NOT NULL,
    [orderType] [tinyint] NOT NULL, 
    [userId] [int] NOT NULL
) 

CREATE TABLE [dbo].[userData](
    [userId] [int] IDENTITY(1,1) NOT NULL,
    [fullname] [nvarchar](50) NOT NULL
) 

-- Create 1000 users with 100 order each
declare @userId int
declare @usersAdded int
set @usersAdded = 0

while @usersAdded < 1000
begin
    insert into userData (fullname) values ('Mario' + ltrim(str(@usersAdded)))
    set @userId = @@identity

    declare @orderSetsAdded int
    set @orderSetsAdded = 0
    while @orderSetsAdded < 10
    begin
     insert into orderData (userId, createDate, orderType) 
      values ( @userId, '01-06-08', 1)
     insert into orderData (userId, createDate, orderType) 
      values ( @userId, '01-02-08', 1)
     insert into orderData (userId, createDate, orderType) 
      values ( @userId, '01-08-08', 1)
     insert into orderData (userId, createDate, orderType) 
      values ( @userId, '01-09-08', 1)
     insert into orderData (userId, createDate, orderType) 
      values ( @userId, '01-01-08', 1)
     insert into orderData (userId, createDate, orderType) 
      values ( @userId, '01-06-06', 2)
     insert into orderData (userId, createDate, orderType) 
      values ( @userId, '01-02-02', 2)
     insert into orderData (userId, createDate, orderType) 
      values ( @userId, '01-08-09', 2)
     insert into orderData (userId, createDate, orderType) 
      values ( @userId, '01-09-01', 2)
     insert into orderData (userId, createDate, orderType) 
      values ( @userId, '01-01-04', 2)

     set @orderSetsAdded = @orderSetsAdded + 1
    end
    set @usersAdded = @usersAdded + 1
end

Small snippet for testing query performance on MS SQL Server in addition to SQL Profiler:

-- Uncomment these to clear some caches
--DBCC DROPCLEANBUFFERS
--DBCC FREEPROCCACHE

set statistics io on
set statistics time on

-- INSERT TEST QUERY HERE

set statistics time off
set statistics io off
Mario
Wow, I definitely appreciate the length and detail of your answer.
JavadocMD