tags:

views:

97

answers:

1

Hi,

I am getting and calculating some basic order information in my SQL query. I have it working as it should but have been reading about the GROUP BY SQL Clause. I am wondering if the following SQL statement would benefit from GROUP BY and if it would be more efficient to use it? Thanks!

SELECT orders.billerID, 
orders.invoiceDate, 
orders.txnID, 
orders.bName, 
orders.bStreet1, 
orders.bStreet2, 
orders.bCity, 
orders.bState, 
orders.bZip, 
orders.bCountry, 
orders.sName, 
orders.sStreet1, 
orders.sStreet2, 
orders.sCity, 
orders.sState, 
orders.sZip, 
orders.sCountry, 
orders.paymentType, 
orders.invoiceNotes, 
orders.pFee, 
orders.shipping, 
orders.tax, 
orders.reasonCode, 
orders.txnType, 
orders.customerID, 
customers.firstName AS firstName, 
customers.lastName AS lastName, 
customers.businessName AS businessName, 
orderStatus.statusName AS orderStatus, 
(IFNULL(SUM((orderItems.itemPrice * orderItems.itemQuantity)), 0.00) + orders.shipping + orders.tax) AS orderTotal, 
((IFNULL(SUM((orderItems.itemPrice * orderItems.itemQuantity)), 0.00) + orders.shipping + orders.tax) - (SELECT IFNULL(SUM(payments.amount), 0.00) FROM payments WHERE payments.orderID = orders.id)) AS orderBalance 
FROM orders 
LEFT JOIN customers ON orders.customerID = customers.id 
LEFT JOIN orderStatus ON orders.orderStatus = orderStatus.id
LEFT JOIN orderItems ON orderItems.orderID = orders.id 
LEFT JOIN payments ON payments.orderID = orders.id
+1  A: 

GROUP BY would probably allow the SQL engine to better optimize your query but would make it harder to read due to the large number of grouping parameters.

Another option as recommended by the SQL Team is to consider using Sub queries. This can often make the GROUP BY statements much simpler and makes the overall query much easier to read.

Using a Sub query:

SELECT orders.billerID, 
    orders.invoiceDate, 
    orders.txnID, 
    orders.bName, 
    orders.bStreet1, 
    orders.bStreet2, 
    orders.bCity, 
    orders.bState, 
    orders.bZip, 
    orders.bCountry, 
    orders.sName, 
    orders.sStreet1, 
    orders.sStreet2, 
    orders.sCity, 
    orders.sState, 
    orders.sZip, 
    orders.sCountry, 
    orders.paymentType, 
    orders.invoiceNotes, 
    orders.pFee, 
    orders.shipping, 
    orders.tax, 
    orders.reasonCode, 
    orders.txnType, 
    orders.customerID, 
    customers.firstName AS firstName, 
    customers.lastName AS lastName, 
    customers.businessName AS businessName, 
    orderStatus.statusName AS orderStatus, 
    orderItem.fees + orders.shipping + orders.tax AS orderTotal, 
    orderItem.fees + orders.shipping + orders.tax - payments.amount AS orderBalance 
FROM orders 
LEFT JOIN customers ON orders.customerID = customers.id 
LEFT JOIN orderStatus ON orders.orderStatus = orderStatus.id
LEFT JOIN 
    ( 
      SELECT orderID, SUM(itemPrice * itemQuantity) as fees
      FROM orderItems
      GROUP BY orderID
    ) orderItems ON orderItems.orderID = orders.id 
LEFT JOIN 
    ( 
      SELECT orderID, SUM(amount) as amount
      FROM payments
      GROUP BY orderID
    ) payments ON payments.orderID = orders.id

Using a GROUP BY:

SELECT orders.billerID, 
    orders.invoiceDate, 
    orders.txnID, 
    orders.bName, 
    orders.bStreet1, 
    orders.bStreet2, 
    orders.bCity, 
    orders.bState, 
    orders.bZip, 
    orders.bCountry, 
    orders.sName, 
    orders.sStreet1, 
    orders.sStreet2, 
    orders.sCity, 
    orders.sState, 
    orders.sZip, 
    orders.sCountry, 
    orders.paymentType, 
    orders.invoiceNotes, 
    orders.pFee, 
    orders.shipping, 
    orders.tax, 
    orders.reasonCode, 
    orders.txnType, 
    orders.customerID, 
    customers.firstName AS firstName, 
    customers.lastName AS lastName, 
    customers.businessName AS businessName, 
    orderStatus.statusName AS orderStatus, 
    SUM(orderItems.itemPrice * orderItems.itemQuantity) + orders.shipping + orders.tax AS orderTotal, 
    SUM(orderItems.itemPrice * orderItems.itemQuantity) + orders.shipping + orders.tax - SUM(payments.amount) AS orderBalance 
FROM orders 
LEFT JOIN customers ON orders.customerID = customers.id 
LEFT JOIN orderStatus ON orders.orderStatus = orderStatus.id
LEFT JOIN orderItems ON orderItems.orderID = orders.id 
LEFT JOIN payments ON payments.orderID = orders.id
GROUP BY 
    orders.billerID, 
    orders.invoiceDate, 
    orders.txnID, 
    orders.bName, 
    orders.bStreet1, 
    orders.bStreet2, 
    orders.bCity, 
    orders.bState, 
    orders.bZip, 
    orders.bCountry, 
    orders.sName, 
    orders.sStreet1, 
    orders.sStreet2, 
    orders.sCity, 
    orders.sState, 
    orders.sZip, 
    orders.sCountry, 
    orders.paymentType, 
    orders.invoiceNotes, 
    orders.pFee, 
    orders.shipping, 
    orders.tax, 
    orders.reasonCode, 
    orders.txnType, 
    orders.customerID, 
    customers.firstName, 
    customers.lastName, 
    customers.businessName, 
    orderStatus.statusName  

GROUP BY Explained:

You can thing of GROUP BY as collecting records together that have similar data. For my example I am going to use a simple produce table with Category, Name and Price columns. If I group the data by Category I can aggregate ( i.e. SUM, COUNT, MIN, MAX, etc.) based on any of the other columns. Since I am grouping by the Category column the resulting records will have a unique value for Category. Any of the other columns might be return different value and therefore cannot be included in the select statement.

Name, Category, Price
Green Peppers, Peppers, 1.50
Orange Peppers, Peppers, 2.50
Yellow Peppers, Peppers, 2.50
Lemons, Citrus, 1.00
Oranges, Citrus, 1.00
Limes, Citrus, 1.00

SELECT 
    Category, /* This is unique because it is in the GROUP BY clause */
    AVG(Price) AS AveragePrice,
    MAX(Price) AS MaxPrice,
    MIN(Price) AS MinPrice
    /* , Name */  /* This is invalid because it is not in the GROUP BY clause */
                  /* The values are not unique so SQL does not know what to return */
FROM Produce
GROUP BY Category
smaclell
I have edited the query with your advice. Is this what you mean?
Jesse Bunch
No I meant something like this`SELECT * FROM ( SELECT * FROM orders) x`Where sub query x can almost any select statement. After thinking about it I decided to leave out the Sub Select example because it normally not the correct way to do things.
smaclell
Ok. So I don't understand why we need such a large list of Group By parameters? This group by thing has my head spinning...Thanks for any explanation.
Jesse Bunch
Okay! I am having trouble fitting in an example into a comment but I will give it a shot. When you do a `GROUP BY` if you want to return a single column you need to add it to the list of grouping parameters. Since you are grouping records together there might be multiple values for a single column.
smaclell
@smaciell - Would you disagree with this blog post? http://weblogs.sqlteam.com/jeffs/archive/2005/12/14/8546.aspx
Jesse Bunch
Actually, I very much agree with the post and was originally going to show something like that as the answer. I will add the sub select version as it is alot cleaner. During my last project some of the team members took this too far with queries having too many nested sub select statements which just made things harder to read.
smaclell
I have added the sub query version and a simple example of how the group by works. Thanks for the question, I had fun.
smaclell
Thanks for taking the time to explain and provide examples.
Jesse Bunch
OMG Ponies