I've got a table of keywords that I regularly refresh against a remote search API, and another table that gets a row each time I refresh one of the keywords. I use this second table to keep multiple processes from stepping on each other and refreshing the same keyword, as well as for stat collection. When my program spins up, it queries for all the keywords that don't have a request currently in progress and don't have a successful one within the last 15 minutes (or whatever the interval is).

All was working fine for a while, but now the keywords_requests table has almost 2 million rows in it and things are bogging down badly. I've got indexes on almost every column in the keywords_requests table, but to no avail.
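For context, here's roughly what the two tables look like, trimmed down to just the columns the query below touches. The column types and index names here are approximate, from memory, so treat them as a sketch rather than the exact schema:

-- Simplified sketch of the two tables (types/index names are approximate).
CREATE TABLE keywords (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    keyword VARCHAR(255) NOT NULL
);

CREATE TABLE keywords_requests (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    keyword_id INT UNSIGNED NOT NULL,   -- points at keywords.id
    source_id  INT UNSIGNED NOT NULL,   -- which remote search API was hit
    status     VARCHAR(20)  NOT NULL,   -- e.g. 'active', 'success'
    created    DATETIME     NOT NULL,   -- when the refresh request was made
    KEY idx_keyword_id (keyword_id),
    KEY idx_source_id (source_id),
    KEY idx_status (status),
    KEY idx_created (created)
);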
I'm logging slow queries and this one is taking forever, as you can see. What can I do?
# Query_time: 20  Lock_time: 0  Rows_sent: 568  Rows_examined: 1826718
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT JOIN `keywords_requests` as KeywordsRequest
    ON (KeywordsRequest.keyword_id = Keyword.id
        AND (KeywordsRequest.status = 'success' OR KeywordsRequest.status = 'active')
        AND KeywordsRequest.source_id = '29'
        AND KeywordsRequest.created > FROM_UNIXTIME(1234551323))
WHERE KeywordsRequest.id IS NULL
GROUP BY Keyword.id
ORDER BY KeywordsRequest.created ASC;