I'm performing an aggregate function on multiple records, which are grouped by a common ID. The problem is that I also want to export some other fields, which might differ within the grouped records; I want to take those fields from one of the records (the first one, according to the query's ORDER BY).

Starting point example:

SELECT
  customer_id,
  sum(order_total),
  referral_code
FROM order
GROUP BY customer_id
ORDER BY date_created

I need to query the referral code, but doing it outside of an aggregate function means I have to group by that field as well, and that's not what I want - I need exactly one row per customer in this example. I really only care about the referral code from the first order, and I'm happy to throw out any later referral codes.

This is in PostgreSQL, but maybe syntax from other DBs could be similar enough to work.

Rejected solutions:

  • Can't use max() or min() because order is significant.
  • A subquery might work at first, but does not scale; this is an extremely reduced example. My actual query has dozens of fields like referral_code which I only want the first instance of, and dozens of WHERE clauses which, if duplicated in a subquery, would make for a maintenance nightmare.
A: 

You will need window functions. They work somewhat like GROUP BY, but you can still access the individual rows. I have only used the Oracle equivalent, though.
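
As a rough sketch of what that could look like in PostgreSQL 8.4 or later (untested here; the table name orders is an assumption, since the question's order is a reserved word and would need quoting):

```sql
-- Sketch only: assumes PostgreSQL 8.4+ and a table named orders
-- with the columns from the question.
SELECT DISTINCT
  customer_id,
  -- total over all of the customer's rows
  sum(order_total) OVER (PARTITION BY customer_id) AS order_sum,
  -- referral_code of the customer's earliest order
  first_value(referral_code) OVER (
    PARTITION BY customer_id
    ORDER BY date_created
  ) AS first_referral_code
FROM orders;
```

Every row of a customer carries the same two computed values, so DISTINCT collapses them to one row per customer. Any WHERE conditions would be written only once, and further "first" fields only need another first_value() over the same window. If two orders of a customer share the same date_created, which one counts as first is not defined.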

gabor
Interesting... looks like a new feature for 8.4? Unfortunately it takes us a while to move to new versions once they're released, right now we're still stuck on 8.2 (though hopefully not for much longer...) :\
David
A: 

Perhaps something like:

SELECT
     O1.customer_id,
     O1.referral_code,
     SQ.total
FROM
     Orders O1
LEFT OUTER JOIN Orders O2 ON
     O2.customer_id = O1.customer_id AND
     O2.date_created < O1.date_created
INNER JOIN (
     SELECT
          customer_id,
          SUM(order_total) AS total
     FROM
          Orders
     GROUP BY
          customer_id
     ) SQ ON SQ.customer_id = O1.customer_id
WHERE
     O2.customer_id IS NULL
Tom H.
You need to add a "GROUP BY customer_id" to the end of your subquery. Then your query gives the last referral_code. Change the greater-than to a less-than for the join criteria and it will get the first referral_code.
Thanks, looks like I left the GROUP BY out in my cut-n-paste.
Tom H.
A: 

If the date_created is guaranteed to be unique per customer_id, then you can do this:

[simple table]

create table ordertable (customer_id int, order_total int, referral_code char, date_created datetime);
insert into ordertable values (1,10, 'a', '2009-01-01');
insert into ordertable values (2,15, 'b', '2009-01-02');
insert into ordertable values (1,35, 'c', '2009-01-03');

[replace my lame table names with something better :)]

SELECT
  orderAgg.customer_id,
  orderAgg.order_sum,
  referral.referral_code as first_referral_code
FROM (
        SELECT
          customer_id,
          sum(order_total) as order_sum
        FROM ordertable
        GROUP BY customer_id
    ) as orderAgg join (
        SELECT
          customer_id,
          min(date_created) as first_date
        FROM ordertable
        GROUP BY customer_id
    ) as dateAgg on orderAgg.customer_id = dateAgg.customer_id
    join ordertable as referral 
        on dateAgg.customer_id = referral.customer_id
            and dateAgg.first_date = referral.date_created
+1  A: 

Well, it's actually pretty simple.

First, let's write a query that will do the aggregation:

select customer_id, sum(order_total)
from order
group by customer_id

Now, let's write a query that returns the first referral_code and date_created for a given customer_id:

select distinct on (customer_id) customer_id, date_created, referral_code
from order
order by customer_id, date_created

Now, you can simply join the 2 selects:

select
    x1.customer_id,
    x1.sum,
    x2.date_created,
    x2.referral_code
from
    (
        select customer_id, sum(order_total)
        from order
        group by customer_id
    ) as x1
    join
    (
        select distinct on (customer_id) customer_id, date_created, referral_code
        from order
        order by customer_id, date_created
    ) as x2 using ( customer_id )
order by x2.date_created

I didn't test it, so there could be typos in it, but generally it should work.

depesz
+1, but this still suffers from requiring any additional WHERE clauses to be updated in 2 places.
j_random_hacker
Well, it can be done without this requirement, but it would require a custom aggregate (first). Not that it's difficult.
depesz
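
For illustration, the custom aggregate mentioned in the comment above might be sketched roughly like this. It is untested here, the names first_agg, first, sorted and orders are all assumptions rather than anything from the thread, and it relies on the aggregate seeing rows in the order the subquery produces them, which usually works in PostgreSQL but is not formally guaranteed:

```sql
-- Sketch only. State function: keep the value already held, ignore the new one.
-- Because it is STRICT and the aggregate has no initial value, the first
-- non-NULL input becomes the state, so NULLs are skipped.
CREATE FUNCTION first_agg(anyelement, anyelement)
RETURNS anyelement
LANGUAGE sql IMMUTABLE STRICT
AS 'SELECT $1';

CREATE AGGREGATE first (
  sfunc    = first_agg,
  basetype = anyelement,
  stype    = anyelement
);

SELECT
  customer_id,
  sum(order_total)     AS order_sum,
  first(referral_code) AS first_referral_code
FROM (
  SELECT customer_id, order_total, referral_code
  FROM orders
  -- all WHERE conditions would go here, in one place
  ORDER BY customer_id, date_created
) AS sorted
GROUP BY customer_id;
```

With this shape, each additional "first instance" field is just one more first(...) call in the outer select list.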
A: 

Would something like this do the trick?

SELECT
  customer_id,
  sum(order_total),
  (SELECT referral_code 
   FROM order o 
   WHERE o.customer_id = order.customer_id 
   ORDER BY date_created 
   LIMIT 1) AS customers_referral_code
FROM order
GROUP BY customer_id, customers_referral_code
ORDER BY date_created

This doesn't require you to maintain the WHERE clause in two places and maintains the order significance, but would get pretty hairy if you needed "dozens of fields" like referral_code. It's also fairly slow (at least on MySQL).

It sounds to me like referral_code and the dozens of fields like it should be in the customer table, not the order table, since they're logically associated 1:1 with the customer, not the order. Moving them there would make the query MUCH simpler.

This might also do the trick:

SELECT
  o.customer_id,
  sum(o.order_total),
  c.referral_code, c.x, c.y, c.z
FROM order o LEFT JOIN (
    SELECT referral_code, x, y, z
    FROM orders c 
    WHERE c.customer_id = o.customer_id 
    ORDER BY c.date_created
    LIMIT 1
) AS c
GROUP BY o.customer_id, c.referral_code
ORDER BY o.date_created
John Douthat
Currently your query contains two fields called referral_code (one being the subquery), neither of which are listed in the GROUP BY.
j_random_hacker
The first referral_code was indeed an error. The lack of it in the GROUP BY was simply because some dialects of SQL don't require it. Thanks for pointing that out, fixed.
John Douthat
A: 
SELECT  customer_id, order_sum,
        (first_record).referral_code, (first_record).other_column
FROM    (
        SELECT  customer_id,
                SUM(order_total) AS order_sum,
                (
                SELECT  oi
                FROM    order oi
                WHERE   oi.customer_id = o.customer_id
                ORDER BY
                        oi.date_created
                LIMIT 1
                ) AS first_record
        FROM    order o
        GROUP BY
                customer_id
        ) q
Quassnoi