I have large MySQL tables with hundreds of thousands of rows.

I need to write a query on a customers table that counts customers by when they will be available to contact again.

e.g.

SELECT 'This week', COUNT(*) FROM customers 
WHERE sales_person_id = 1 AND DATEDIFF(NOW(), available_date) < 7

UNION

SELECT 'Next week', COUNT(*) FROM customers 
WHERE sales_person_id = 1 AND DATEDIFF(NOW(), available_date) >= 7 
    AND DATEDIFF(NOW(), available_date) < 14

UNION

... (a few more like this)

Having written a similar query on a different large table, I did notice that changing the engine from InnoDB to MyISAM sped up the query considerably (InnoDB is not needed for these tables as they do not have foreign key checks). Is there anything else I can do to speed up counts like this (other than indexing the appropriate fields)?

+5  A: 

WHERE sales_person_id = 1 AND available_date BETWEEN CURDATE() - INTERVAL 1 WEEK AND CURDATE()

Doing it this way should let MySQL use a composite index created on the (sales_person_id, available_date) columns (use EXPLAIN to check).
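A minimal sketch of the index and the EXPLAIN check mentioned above. The index name idx_customers_sales_avail is hypothetical; the table and column names are taken from the question.

```sql
-- Hypothetical index name; columns come from the question's WHERE clause.
-- The equality column goes first, the range column second.
CREATE INDEX idx_customers_sales_avail
    ON customers (sales_person_id, available_date);

-- Ask MySQL how it plans to run the rewritten count.
EXPLAIN
SELECT COUNT(*)
  FROM customers
 WHERE sales_person_id = 1
   AND available_date BETWEEN CURDATE() - INTERVAL 1 WEEK AND CURDATE();
```

If the index is being used, the `key` column of the EXPLAIN output should name it, and the `rows` estimate should be far below the table's total row count.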

Mchl
+1. The more general point is to avoid using function calls in your WHERE clause (in this case DATEDIFF), as they force index scans rather than index seeks. This makes sense: the result of the function isn't indexed, so each row has to be checked.
womp
+1. Yes. The original query is not SARGable.
Joe Stefanelli
Thanks for the explanation womp. I see from your profile, you live in Victoria too!
Matt McCormick
@Matt McCormick, at the risk of stating the obvious: womp meant avoid using function calls on actual columns. Function calls on values that are constant from the query's point of view, such as CURDATE(), are evaluated only once per query and are fine.
Unreason
+1  A: 

Focus on the WHERE clause.

  • Is there an index on the fields in the WHERE clause?
  • Can you replace the DATEDIFF() call with a comparison against a constant? As written, it is evaluated for every row.
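As a rough sketch of that second point, the question's "Next week" bucket could be rewritten so the column is compared against bounds computed once per query. This assumes available_date is a DATE column; for a DATETIME column the boundaries would need adjusting.

```sql
-- Original form: DATEDIFF is applied to available_date on every row.
--   WHERE sales_person_id = 1
--     AND DATEDIFF(NOW(), available_date) >= 7
--     AND DATEDIFF(NOW(), available_date) <  14

-- Rewritten form: the bare column is compared with two constants.
-- Assumes available_date is a DATE (no time part).
SELECT 'Next week', COUNT(*)
  FROM customers
 WHERE sales_person_id = 1
   AND available_date >  CURDATE() - INTERVAL 14 DAY
   AND available_date <= CURDATE() - INTERVAL 7 DAY;
```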
Damir Sudarevic
+3  A: 
  1. Never do in multiple queries what you can do in one.

    If you create a derived table/inline view that holds the necessary start and end dates, you can produce the desired result in a single query with a GROUP BY. MySQL has no recursive queries before version 8.0, so on those versions you have to use the NUMBERS table trick to generate the dates:

    1. Create a table that only holds incrementing numbers - easy to do using an auto_increment:

      DROP TABLE IF EXISTS `example`.`numbers`;
      CREATE TABLE  `example`.`numbers` (
       `id` int(10) unsigned NOT NULL auto_increment,
        PRIMARY KEY  (`id`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
      
    2. Populate the table using:

      INSERT INTO `numbers` (id)
      VALUES (NULL)
      

      ...for as many values as you need.

    3. Use DATE_ADD to construct a list of week ranges, moving the start date forward by seven days for each NUMBERS.id value. CURDATE() is used rather than NOW() so that the boundaries fall on midnight:

      SELECT x.start_dt,
             x.end_dt
        FROM (SELECT DATE_ADD(CURDATE(), INTERVAL (n.id - 1) * 7 DAY) AS start_dt,
                     DATE_ADD(CURDATE(), INTERVAL n.id * 7 DAY) AS end_dt
                FROM `numbers` n
               WHERE DATE_ADD(CURDATE(), INTERVAL (n.id - 1) * 7 DAY) <= '2011-01-01') x

    4. JOIN onto your table of data using a half-open range (start inclusive, end exclusive), so that no date is counted in two weeks:

        SELECT x.start_dt,
               x.end_dt,
               COUNT(*) AS num
          FROM (SELECT DATE_ADD(CURDATE(), INTERVAL (n.id - 1) * 7 DAY) AS start_dt,
                       DATE_ADD(CURDATE(), INTERVAL n.id * 7 DAY) AS end_dt
                  FROM `numbers` n
                 WHERE DATE_ADD(CURDATE(), INTERVAL (n.id - 1) * 7 DAY) <= '2011-01-01') x
          JOIN customers c ON c.available_date >= x.start_dt
                          AND c.available_date <  x.end_dt
         WHERE c.sales_person_id = 1  -- same filter as in the question
      GROUP BY x.start_dt, x.end_dt
      
  2. Don't apply functions to actual column data, e.g. DATEDIFF(NOW(), available_date). The database can't use an index (if one exists) on the available_date column, because the value being compared is no longer the indexed value.
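For only a handful of fixed buckets, a lighter way to follow both points is conditional aggregation, which is a different technique from the NUMBERS table above. The sketch below assumes available_date is a DATE column and reproduces just the first two buckets from the question; the column aliases are illustrative.

```sql
-- One pass over the matching rows instead of one SELECT per bucket.
-- In MySQL a comparison evaluates to 1 or 0, so SUM() counts the matches.
SELECT SUM(available_date >  CURDATE() - INTERVAL 7 DAY) AS this_week,
       SUM(available_date <= CURDATE() - INTERVAL 7 DAY) AS next_week
  FROM customers
 WHERE sales_person_id = 1
   AND available_date > CURDATE() - INTERVAL 14 DAY;  -- bare column, index-friendly
```

Note that SUM() returns NULL rather than 0 when no rows match the WHERE clause.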

OMG Ponies
Thanks for the idea. I think this is overkill for this query at the moment but I will keep this technique in mind should it need to be optimized further in the future.
Matt McCormick