views:

65

answers:

1

I need to use InnoDB storage engine on a table with about 1mil or so records in it at any given time. It has records being inserted to it at a very fast rate, which are then dropped within a few days, maybe a week. The ping table has about a million rows, whereas the website table only about 10,000.

My statement is this:

select url
from website ws, ping pi 
where ws.idproxy = pi.idproxy and pi.entrytime > curdate() - 3 and contentping+tcpping is not null 
group by url 
having sum(contentping+tcpping)/(count(*)-count(errortype)) < 500 and count(*) > 3 and 
count(errortype)/count(*) < .15
order by sum(contentping+tcpping)/(count(*)-count(errortype)) asc;

I added an index on entrytime, yet no dice. Can anyone throw me a bone as to what I should consider to look into for basic optimization of this query. The result set is only like 200 rows, so I'm not getting killed there.

+2  A: 

In the absence of the schemas of the relations, I'll have to make some guesses.

  • If you're making WHERE a.attrname = b.attrname clauses, that cries out for a JOIN instead.

  • Using COUNT(*) is both redundant and sometimes less efficient than COUNT(some_specific_attribute). The primary key is a good candidate.

  • Why would you test contentping+tcpping IS NOT NULL, asking for a calculation that appears unnecessary, instead of just testing whether the attributes individually are null?

Here's my attempt at an improvement:

SELECT url
FROM website AS ws
    JOIN ping AS pi
        ON ws.idproxy = pi.idproxy
WHERE
    pi.entrytime > CURDATE() - 3
    AND pi.contentping IS NOT NULL
    AND pi.tcpping IS NOT NULL
GROUP BY url
HAVING
    SUM(pi.contentping + pi.tcpping) / (COUNT(pi.idproxy) - COUNT(pi.errortype)) < 500
    AND COUNT(pi.idproxy) > 3
    AND COUNT(pi.errortype) / COUNT(pi.idproxy) < 0.15
ORDER BY
    SUM(pi.contentping + pi.tcpping) / (COUNT(pi.idproxy) - COUNT(pi.errortype)) ASC;

Performing lots of identical calculations in both the HAVING and ORDER BY clauses will likely be costing you performance. You could either put them in the SELECT clause, or create a view that has those calculations as attributes and use that view for accessing the values.

bignose
Thanks, I'll try these out individually when I get a chance and see how they work.
Zombies