views:

60

answers:

3

I have this query below. There are 4 main tables involved: tblOrder, tblItems, tblOrder_archive, tblItem_archive. Orders and Items get moved over to the archived versions of the tables after a few months as not to slow down the main table queries. (sales and traffic is REALLY HIGH). So to get sales figures, i select what i need from each set of tables (archive and non archive).. union them.. do a group by on the union.. then do some math on the result.

Problem is that with any significant amount of rows (the order time span).. it will take so long for the query to run that it times out. I have added all the keys I can think of and still running super slow.

Is there more I can do to make this run faster? Can i write it differently? Can i use different indexes?

or should i write a script that gets the data from each table set first then does the math in the php script to combine them?

Thanks for the help.

SELECT
  description_invoice
, supplier
, type
, sum(quantity) AS num_sold
, sum(quantity*wholesale) AS wholesale_price
, sum(quantity*price) AS retail_price
, sum(quantity*price) - sum(quantity*wholesale) AS profit
FROM (
    SELECT
      tblOrder.*
    , tblItem.description_invoice
    , tblItem.type
    , tblItem.product_number
    , tblItem.quantity
    , tblItem.wholesale
    , tblItem.price
    , tblItem.supplier 
    FROM tblOrder USE KEY (finalized), tblItem
    WHERE
      tblItem.order_id = tblOrder.order_id 
     AND
      finalized=1
     AND
      wholesale <> 0
     AND (order_time >= 1251788400 AND order_time <= 1283669999) 

  UNION 

    SELECT
      tblOrder_archive.*
    , tblItem_archive.description_invoice
    , tblItem_archive.type
    , tblItem_archive.product_number
    , tblItem_archive.quantity
    , tblItem_archive.wholesale
    , tblItem_archive.price
    , tblItem_archive.supplier
    FROM tblOrder_archive USE KEY (finalized), tblItem_archive
    WHERE
      tblItem_archive.order_id=tblOrder_archive.order_id
     AND
      finalized=1
     AND
      wholesale <> 0
     AND (order_time >= 1251788400 AND order_time <= 1283669999)
) AS main_table
GROUP BY
  description_invoice
, supplier,type 
ORDER BY profit DESC;
+2  A: 
  • Create indexes on the columns you are using in the WHERE clauses.
  • Remove the index hint: USE KEY (finalized). If it does anything at all it will probably just make it slower by causing MySQL to choose this key instead of a potentially better key.
  • Add a LIMIT to avoid fetching too many rows. Use paging if you want to see more rows.
  • Use UNION ALL instead of UNION. This will be faster because it doesn't check for duplicates and also you probably don't want to remove duplicates here anyway since this will affect the total.

Orders and Items get moved over to the archived versions of the tables after a few months as not to slow down the main table queries.

This is probably a bad idea. Instead you should index your data correctly so that the queries don't become significantly slower when you add more data. Or alternatively you could look at partitioning the table.

Mark Byers
And check out EXPLAIN. It's a good way to see how the query is actually being processed behind the scenes. It can also give you an idea on where it can be improved.
CaseySoftware
+1  A: 

I re-wrote your query as:

   SELECT COALESCE(x.description_invoice, y.description_invoice) AS description_invoice,
          COALESCE(x.supplier, y.supplier) AS supplier,
          COALESCE(x.type, y.type) AS type,
          COALESCE(SUM(x.quantity), 0) + COALESCE(SUM(y.quantity), 0) as num_sold,
          COALESCE(SUM(x.quantity * x.wholesale), 0) + COALESCE(SUM(y.quantity * y.wholesale), 0) AS wholesale_price, 
          COALESCE(SUM(x.quantity * x.price), 0) + COALESCE(SUM(y.quantity * y.price), 0) AS retail_price,
          COALESCE(SUM(x.quantity * x.price), 0) - COALESCE(SUM(x.quantity * x.wholesale), 0) + COALESCE(SUM(y.quantity * y.price), 0) - COALESCE(SUM(y.quantity * y.wholesale), 0) as profit   
     FROM (SELECT o.order_id
             FROM TBLORDER o
            WHERE o.finalized = 1
              AND o.order_time BETWEEN 1251788400 
                                   AND 1283669999
           UNION ALL
           SELECT oa.order_id
            FROM TBLORDER_ARCHIVE oa
           WHERE oa.finalized = 1
             AND oa.order_time BETWEEN 1251788400 
                                   AND 1283669999) a
LEFT JOIN TBLITEM x ON x.order_id = a.order_id
                   AND x.wholesale != 0
LEFT JOIN TBLITEM_ARCHIVE y ON y.order_id = a.order_id
                           AND y.wholesale != 0
 GROUP BY description_invoice, supplier, type 
 ORDER BY profit DESC
  • Your query had UNION, but I'd expect not to need duplicate removal from an archive table so I changed it to UNION ALL - which is faster, because it doesn't remove duplicates
  • For what you provided, you had SELECT ORDERS.* and SELECT ORDER_ARCHIVE.* but never used any of the columns.
  • The aggregation functions (SUM) were all on the TBLITEM table, which was unnecessarily within the derived table/inline view.
  • I omitted the USE KEY(finalized); you can re-add it if you like but I'd compare with and with out it - I'd suggest running ANALYZE TABLE occaissionally on both tables prior to running the query so the optimizer has relatively fresh statistics.
  • I don't see much value in an index on the finalized column, but I don't know your data or use - just this query. But based on this query, I'd index:

    1. order_id
    2. order_time
    3. finalized

    ...as a covering index--a single index with three columns, in the order provided because order is important in a covering index.

OMG Ponies
That doesn't take into account the tblItem_archive table. There must be a join between it and the tblOrder_archive.. with the aggregration functions on both queries before the join.
@danjfoley: See update
OMG Ponies
A: 

I rewrote it as follows based on your help, and added the recommended covering index to both tblOrder and tblOrder archive and things seem to be much faster. But still i'm wondering if there something more to the way you wrote it.. but i would need to use tblItem_archive joined to tblOrder_archive as well.

SELECT
  description_invoice
, supplier
, type
, sum(quantity) AS num_sold
, sum(quantity*wholesale) AS wholesale_price
, sum(quantity*price) AS retail_price
, sum(quantity*price) - sum(quantity*wholesale) AS profit
FROM (
    SELECT
      tblOrder.order_id
    , tblItem.description_invoice
    , tblItem.type
    , tblItem.product_number
    , tblItem.quantity
    , tblItem.wholesale
    , tblItem.price
    , tblItem.supplier 
    FROM tblOrder, tblItem
    WHERE
      tblItem.order_id = tblOrder.order_id 
     AND
      finalized=1
     AND
      wholesale <> 0
     AND (order_time >= 1251788400 AND order_time <= 1283669999) 

   UNION ALL

    SELECT
      tblOrder_archive.order_id
    , tblItem_archive.description_invoice
    , tblItem_archive.type
    , tblItem_archive.product_number
    , tblItem_archive.quantity
    , tblItem_archive.wholesale
    , tblItem_archive.price
    , tblItem_archive.supplier
    FROM tblOrder_archive, tblItem_archive
    WHERE
      tblItem_archive.order_id=tblOrder_archive.order_id
     AND
      finalized=1
     AND
      wholesale <> 0
     AND (order_time >= 1251788400 AND order_time <= 1283669999)
) AS main_table
GROUP BY
  description_invoice
, supplier,type 
ORDER BY profit DESC;