This query works, but it takes 25 seconds to run. That's too long! How can I optimize it?

SELECT COUNT(DISTINCT u1.User_ID ) AS total
FROM UserClicks u1
INNER JOIN (SELECT DISTINCT User_ID 
              FROM UserClicks 
             WHERE (Date BETWEEN DATE_SUB(:startDate, INTERVAL 1 MONTH) AND :startDate)) u2
            ON u1.User_ID = u2.User_ID
WHERE (u1.Date BETWEEN :startDate AND :endDate)

This is running on a MySQL database.

A: 

Have you tried moving the DATE_SUB(:startDate, INTERVAL 1 MONTH) out of the statement and into a variable? Do you have an index on UserClicks.Date?
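Here is a minimal sketch of the first suggestion. It assumes the query is run from Python with named parameters. The start of the one-month window is computed once on the client and bound as its own parameter, `:windowStart`, which is a hypothetical name. The helper mimics MySQL's documented behaviour of clamping the day to the last day of a shorter month. Inside MySQL the same idea would be `SET @windowStart = DATE_SUB(:startDate, INTERVAL 1 MONTH);` followed by referencing `@windowStart` in the query. Either way, this only moves the date arithmetic and is not guaranteed to make the query faster on its own.

```python
import calendar
from datetime import date


def one_month_before(d: date) -> date:
    """Mimic MySQL's DATE_SUB(d, INTERVAL 1 MONTH) for DATE values.

    MySQL clamps the day to the last day of the target month,
    e.g. 2009-03-31 minus one month gives 2009-02-28.
    """
    year, month = (d.year - 1, 12) if d.month == 1 else (d.year, d.month - 1)
    last_day = calendar.monthrange(year, month)[1]  # number of days in target month
    return date(year, month, min(d.day, last_day))


# Hypothetical rewrite of the asker's query: the window start is a bound
# parameter (:windowStart) instead of being computed inside the SQL.
QUERY = """
SELECT COUNT(DISTINCT u1.User_ID) AS total
FROM UserClicks u1
INNER JOIN (SELECT DISTINCT User_ID
              FROM UserClicks
             WHERE Date BETWEEN :windowStart AND :startDate) u2
        ON u1.User_ID = u2.User_ID
WHERE u1.Date BETWEEN :startDate AND :endDate
"""

start, end = date(2009, 3, 31), date(2009, 4, 30)
params = {"windowStart": one_month_before(start), "startDate": start, "endDate": end}
print(params["windowStart"])  # 2009-02-28
```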

Jose Chama
A: 

Why not use a single SELECT statement instead of a nested pair? Right now you're essentially running two queries. Try this:

SELECT COUNT(DISTINCT UserClicks.User_ID) AS total
FROM UserClicks
WHERE (UserClicks.Date BETWEEN :startDate AND :endDate)
AND (UserClicks.Date BETWEEN DATE_SUB(:startDate, INTERVAL 1 MONTH) AND :startDate)

It might also help to add an index on the date column:

ALTER TABLE `UserClicks` ADD INDEX (`Date`);
Parrots
This will not return what the original query returns.
Quassnoi
What do you mean by adding an index? Can you show that too?
Andrew
@Quassnoi How are the queries going to differ, result-wise? I'm having a hard time seeing the difference. The nested ones are basically saying "get all the people between start and end date", then "from those, get all the people between start date and +1 month". How is that different from just an AND operation?
Parrots
The asker's query returns the count of users who clicked *both* in the month before `start_date` and between `start_date` and `end_date`. Your query returns the number of users who clicked *exactly* on `start_date`.
Quassnoi
@Parrots In other words, the asker's query does a join, while your query intersects both date ranges on a single row. These are different operations.
Quassnoi
Good catch; +1 to your answer.
Parrots
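The difference Quassnoi describes can be checked on a small, hypothetical data set. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL. SQLite has no `DATE_SUB`, so the start of the one-month window is passed as a precomputed, hypothetical `:windowStart` parameter. Dates are stored as ISO `YYYY-MM-DD` strings so that `BETWEEN` compares them correctly.

```python
import sqlite3

# Hypothetical sample data: (User_ID, Date) as ISO date strings.
CLICKS = [
    (1, "2009-02-20"), (1, "2009-03-20"),  # active in both windows
    (2, "2009-03-20"),                     # only in the reporting window
    (3, "2009-02-20"),                     # only in the month before
    (4, "2009-03-15"),                     # exactly on startDate
]

# windowStart stands in for DATE_SUB(:startDate, INTERVAL 1 MONTH).
PARAMS = {"windowStart": "2009-02-15", "startDate": "2009-03-15", "endDate": "2009-03-31"}

# The asker's query: two different clicks may satisfy the two windows.
ORIGINAL = """
SELECT COUNT(DISTINCT u1.User_ID)
FROM UserClicks u1
INNER JOIN (SELECT DISTINCT User_ID FROM UserClicks
             WHERE Date BETWEEN :windowStart AND :startDate) u2
        ON u1.User_ID = u2.User_ID
WHERE u1.Date BETWEEN :startDate AND :endDate
"""

# The single-SELECT version: one click must satisfy both windows at once.
SINGLE_SELECT = """
SELECT COUNT(DISTINCT User_ID)
FROM UserClicks
WHERE Date BETWEEN :startDate AND :endDate
  AND Date BETWEEN :windowStart AND :startDate
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserClicks (User_ID INTEGER, Date TEXT)")
conn.executemany("INSERT INTO UserClicks VALUES (?, ?)", CLICKS)

original_total = conn.execute(ORIGINAL, PARAMS).fetchone()[0]
single_total = conn.execute(SINGLE_SELECT, PARAMS).fetchone()[0]
print(original_total, single_total)  # 2 1
```

Only user 4, whose single click falls exactly on `startDate`, satisfies both ranges in the single-SELECT version. User 1 also qualifies for the original query because two *different* clicks fall in the two windows.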
A: 

SELECT  COUNT(*) AS total
FROM    (
        SELECT  DISTINCT User_ID 
        FROM    UserClicks 
        WHERE   Date BETWEEN DATE_SUB(:startDate, INTERVAL 1 MONTH) AND :startDate
        ) u1
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    UserClicks u2
        WHERE   u2.User_ID = u1.User_ID
                AND u2.Date BETWEEN :startDate AND :endDate
        )

Create a composite index on (User_ID, Date):

CREATE INDEX ix_userclicks_user_date ON UserClicks (User_ID, Date)

If you have few users but many clicks, and you have a Users table, you can use that table instead of DISTINCT:

SELECT  COUNT(*)
FROM    Users u
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    UserClicks uc1
        WHERE   uc1.User_ID = u.Id
                AND uc1.Date BETWEEN DATE_SUB(:startDate, INTERVAL 1 MONTH) AND :startDate
        )
        AND EXISTS
        (
        SELECT  NULL
        FROM    UserClicks uc2
        WHERE   uc2.User_ID = u.Id
                AND uc2.Date BETWEEN :startDate AND :endDate
        )
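Here is a quick sanity check that both rewrites count the same users as the original join. It is a toy test built on several assumptions: SQLite instead of MySQL, a precomputed, hypothetical `:windowStart` instead of `DATE_SUB`, hypothetical rows, and a hypothetical `Users` table whose key column is `Id`. It shows that the results are equivalent; it says nothing about how fast each version runs on MySQL.

```python
import sqlite3

# Hypothetical parameters; windowStart replaces DATE_SUB(:startDate, INTERVAL 1 MONTH).
PARAMS = {"windowStart": "2009-02-15", "startDate": "2009-03-15", "endDate": "2009-03-31"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (Id INTEGER PRIMARY KEY);
CREATE TABLE UserClicks (User_ID INTEGER, Date TEXT);
CREATE INDEX ix_userclicks_user_date ON UserClicks (User_ID, Date);
INSERT INTO Users (Id) VALUES (1), (2), (3), (4), (5);
INSERT INTO UserClicks (User_ID, Date) VALUES
  (1, '2009-02-20'), (1, '2009-03-01'), (1, '2009-03-20'),
  (2, '2009-03-20'), (3, '2009-02-20'), (4, '2009-03-15');
""")

# The asker's original join.
ORIGINAL = """
SELECT COUNT(DISTINCT u1.User_ID)
FROM UserClicks u1
INNER JOIN (SELECT DISTINCT User_ID FROM UserClicks
             WHERE Date BETWEEN :windowStart AND :startDate) u2
        ON u1.User_ID = u2.User_ID
WHERE u1.Date BETWEEN :startDate AND :endDate
"""

# First rewrite: DISTINCT users from the earlier window, filtered with EXISTS.
EXISTS_DISTINCT = """
SELECT COUNT(*)
FROM (SELECT DISTINCT User_ID FROM UserClicks
       WHERE Date BETWEEN :windowStart AND :startDate) u1
WHERE EXISTS (SELECT NULL FROM UserClicks u2
               WHERE u2.User_ID = u1.User_ID
                 AND u2.Date BETWEEN :startDate AND :endDate)
"""

# Second rewrite: one row per user from Users, with two EXISTS checks.
EXISTS_USERS = """
SELECT COUNT(*)
FROM Users u
WHERE EXISTS (SELECT NULL FROM UserClicks uc1
               WHERE uc1.User_ID = u.Id
                 AND uc1.Date BETWEEN :windowStart AND :startDate)
  AND EXISTS (SELECT NULL FROM UserClicks uc2
               WHERE uc2.User_ID = u.Id
                 AND uc2.Date BETWEEN :startDate AND :endDate)
"""

results = [conn.execute(q, PARAMS).fetchone()[0] for q in (ORIGINAL, EXISTS_DISTINCT, EXISTS_USERS)]
print(results)  # [2, 2, 2]
```

User 1 has two clicks in the earlier window. This is why the first rewrite keeps the `DISTINCT` in its derived table: without it, that user would be counted twice.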
Quassnoi
What do I change after creating the composite index?
Andrew
Also, does the composite index need to be unique? (Sorry if it's a stupid question.)
Andrew
The composite index will help the query run faster (especially the second query).
Quassnoi
No, it does not have to be unique. However, if it is intrinsically `UNIQUE` (that is, you cannot have two clicks from one user at the same time), you can make it `UNIQUE`.
Quassnoi
So just by creating the index, the query runs faster?
Andrew
@Andrew: yes. The second query will be the fastest.
Quassnoi
A: 

MySQL tends to ignore indexes when processing subqueries, so it has to process every row. How about a self-join instead? This is just off the top of my head, so it may not be quite correct, but it should at least point you in the right direction.

SELECT COUNT(DISTINCT u1.User_ID) AS total
FROM   UserClicks AS u1
JOIN   UserClicks AS u2 USING (User_ID)
WHERE  u1.Date BETWEEN :startDate AND :endDate
AND    u2.Date BETWEEN DATE_SUB(:startDate, INTERVAL 1 MONTH) AND :startDate
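Here is a sketch of the self-join on toy data. It makes the same assumptions as before: SQLite in place of MySQL, plus a hypothetical `:windowStart` parameter and hypothetical rows. It shows why `COUNT(DISTINCT ...)` is required. The join produces one row per pair of qualifying clicks, so a user with several clicks in each window contributes several rows.

```python
import sqlite3

# windowStart stands in for DATE_SUB(:startDate, INTERVAL 1 MONTH).
PARAMS = {"windowStart": "2009-02-15", "startDate": "2009-03-15", "endDate": "2009-03-31"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserClicks (User_ID INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO UserClicks VALUES (?, ?)",
    [(1, "2009-02-20"), (1, "2009-03-01"),   # user 1: two clicks in the earlier window
     (1, "2009-03-20"), (1, "2009-03-25"),   # user 1: two clicks in the reporting window
     (2, "2009-03-20"),                      # user 2: reporting window only
     (4, "2009-03-15")],                     # user 4: on startDate, so in both windows
)

# Shared FROM/WHERE part of the self-join.
SELF_JOIN_FROM = """
FROM   UserClicks AS u1
JOIN   UserClicks AS u2 USING (User_ID)
WHERE  u1.Date BETWEEN :startDate AND :endDate
AND    u2.Date BETWEEN :windowStart AND :startDate
"""

distinct_users = conn.execute("SELECT COUNT(DISTINCT u1.User_ID)" + SELF_JOIN_FROM, PARAMS).fetchone()[0]
joined_rows = conn.execute("SELECT COUNT(*)" + SELF_JOIN_FROM, PARAMS).fetchone()[0]
print(distinct_users, joined_rows)  # 2 5
```

User 1 alone produces four joined rows (two clicks in each window). The intermediate result therefore grows with the product of each user's click counts in the two windows.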
Duncan