I'm trying to find some suggestions in optimizing a query that I'm using to fetch a large group of data.
The original code that I'm working on looped through a large set of users, and calculated a date range for each one of them. It would then take that date range and query how many questions they answered and how many they got correct within that date range. Those results were tallied up, and it is those end tallies that we need.
What I've already done to speed this up (because it was taking several minutes) is this: Rather than querying each user individually, the script now just loops through each user calculates the date range that would be applicable to them (all other aspects of the query are identical for each user). That data is collected in a 3d array [startDate][endDate][userid], and a single query built to do that operaion on all of the users. Here's an example of the query that gets output:
SELECT COUNT(uapl.id) AS numAnswered,
SUM(CASE WHEN (a.correct OR q.survey OR uapl.answersId IS NULL) THEN 1 ELSE 0 END) AS numCorrect
FROM usersAnswersProgramsLink uapl
JOIN questions q ON uapl.questionsId=q.id
LEFT JOIN answers a ON uapl.answersId=a.id
WHERE
programsId=123
AND
(
(
CAST(timestamp AS date) >= '2009-09-01'
AND CAST(timestamp AS date) <= '2009-09-21'
AND usercontextid in('123','234','345','465','567')
)
OR
(
CAST(timestamp AS date) >= '2009-09-10'
AND CAST(timestamp AS date) <= '2009-09-21'
AND usercontextid in('321','432','543')
)
OR
(
CAST(timestamp AS date) >= '2009-09-16'
AND CAST(timestamp AS date) <= '2009-09-21'
AND usercontextid in('987','876')
)
)
This works relatively well in speeding up the code. With most of the tests I run on this, it now takes between 20% and 10% as long. With my worst case though, it's only 50%, and I'd like to improve that.
That worst case happens when I have a huge number of user id's to compare (ten-thousandish). The problem now is that there is no more optimizing to be done on the algorithm that I pulled those queries out of. It now goes by in milliseconds. It's this query that takes a long time.
So that's my conundrum. I'd like to speed it up even more. Any suggestions would be welcome. A couple of pieces of info that are pertinent here:
1) There is a 1 to many relationship between the date ranges and the users. none of those user id's will show up in multiple date ranges. 2) The end result we're looking for is just those tallies, but the date ranges need to be calculated on a per-user basis, hence the array of id's per date range.
One thing that I thought ~might~ make it faster would be to crate a temporary table with a column for the date range and a column for the user id's. Then rewrite that query using a JOIN to that table rather than putting those numbers in the query itself. Does anyone know if that would work?
Thanks for any suggestions!