views:

1649

answers:

2

I'm helping maintain a program that's essentially a friendly read-only front-end for a big and complicated MySQL database -- the program builds ad-hoc SELECT queries from users' input, sends the queries to the DB, gets the results, post-processes them, and displays them nicely back to the user.

I'd like to add some form of reasonable/heuristic prediction for the constructed query's expected performance -- sometimes users inadvertently make queries that are inevitably going to take a very long time (because they'll return huge result sets, or because they're "going against the grain" of the way the DB is indexed) and I'd like to be able to display to the user some "somewhat reliable" information/guess about how long the query is going to take. It doesn't have to be perfect, as long as it doesn't get so badly and frequently out of whack with reality as to cause a "cry wolf" effect where users learn to disregard it;-) Based on this info, a user might decide to go get a coffee (if the estimate is 5-10 minutes), go for lunch (if it's 30-60 minutes), kill the query and try something else instead (maybe tighter limits on the info they're requesting), etc, etc.

I'm not very familiar with MySQL's EXPLAIN statement -- I see a lot of information around on how to use it to optimize a query or a DB's schema, indexing, etc, but not much on how to use it for my more limited purpose -- simply make a prediction, taking the DB as a given (of course if the predictions are reliable enough I may eventually switch to using them also to choose between alternate forms a query could take, but, that's for the future: for now, I'd be plenty happy just to show the performance guesstimates to the users for the above-mentioned purposes).

Any pointers...?

+12  A: 

EXPLAIN won't give you any indication of how long a query will take. At best you could use it to guess which of two queries might be faster, but unless one of them is obviously badly written then even that is going to be very hard.

You should also be aware that if you're using sub-queries, even running EXPLAIN can be slow (almost as slow as the query itself in some cases).

As far as I'm aware, MySQL doesn't provide any way to estimate the time a query will take to run. Could you log the time each query takes to run, then build an estimate based on the history of past similar queries?

Greg
We don't generate sub-queries at this time so that bit shouldn't be a problem. But thanks anyway for the pointer -- and the news that there's no good way to estimate the cost of a query (bad news, but, better to learn before I spend unbounded time chasing a chimera!).
Alex Martelli
EXPLAIN is extremely helpful. I'm not sure why this is the 'answer'. Checkout out the cardinality - the higher the number of rows, the more searching has to be done. Also, it shows which, if any, indexes are being used. This is critical for SELECT performance.As for subqueries, it's very rare that they're actually needed - they should be refactored out when possible for clarity's sake.
Adam Nelson
+5  A: 

I think if you want to have a chance of building something reasonably reliable out of this, what you should do is build a statistical model out of tables size and broken-down EXPLAIN result components correlated with query processing times. Trying to build a query execution time predictor based on thinking about the contents of an EXPLAIN is just going to spend way too long giving embarrassingly poor results before it gets refined to vague usefulness.

chaos