I am trying to set up a script that would test the performance of queries on a development MySQL server. Here are more details:

  • I have root access
  • I am the only user accessing the server
  • Mostly interested in InnoDB performance
  • The queries I am optimizing are mostly search queries (SELECT ... LIKE '%xy%')

What I want to do is create a reliable testing environment for measuring the speed of a single query, free from dependencies on other variables.

Until now I have been using SQL_NO_CACHE, but sometimes the results of such tests still show caching behaviour: the query takes much longer to execute on the first run and less time on subsequent runs.

If someone can explain this behaviour in full detail I might stick with SQL_NO_CACHE; I believe it might be due to the file system cache and/or caching of the indexes used to execute the query, as this post explains. It is not clear to me when the buffer pool and key buffer get invalidated or how they might interfere with testing.

So, short of restarting the MySQL server, how would you recommend setting up an environment that would reliably determine whether one query performs better than the other?

A: 

You could try MySQL Workbench; I thought it had a SQL statement monitor, so you can see how fast a statement is and why it is fast.

Spidfire
A: 

As the linked article suggests, use FLUSH TABLES between test runs to reset as much as you can (notably the query cache).
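Between runs that might look like the following (plain SQL; RESET QUERY CACHE only exists on servers built with the query cache):

```sql
FLUSH TABLES;        -- closes open tables and invalidates their query cache entries
RESET QUERY CACHE;   -- optionally empty the query cache completely
```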

Shouldn't your testing take into account that InnoDB will itself have different states during actual performance, such that you become interested in aggregate performance over multiple trials? How "real" is your performance testing going to be if you want to reset InnoDB for every trial? The query you reject because it performs poorly immediately after restart might be far and away the best query after InnoDB has warmed up a little bit.
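If you do go for aggregate performance over multiple trials, the bookkeeping is simple enough to sketch in Python (a hypothetical helper; the raw per-run timings would come from however you execute the statement):

```python
import statistics

def summarize_timings(timings, warmup=1):
    """Split raw per-run timings (seconds) into a cold phase and a warm
    phase, and report the median of the warm runs, which is a more
    stable figure than the first (cache-priming) run."""
    cold, warm = timings[:warmup], timings[warmup:]
    return {
        "cold_first_run": cold[0],
        "warm_median": statistics.median(warm),
    }

# Example: one slow cold run, then steady warm runs.
stats = summarize_timings([1.80, 0.21, 0.19, 0.20, 0.22])
```

Comparing the warm medians of two candidate queries answers the "after warm-up" question; comparing the first cold runs answers the "immediately after restart" one.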

If I were you, I'd focus on what the query optimizer is doing separately from InnoDB's performance. There's much written about how to tune InnoDB, but it helps to have good queries to start.

You could also try measuring performance with equivalent MyISAM tables, where FLUSH TABLES really will reset you to a mostly-identical starting point.

Have you tried turning query caching off altogether? Even with SQL_NO_CACHE, there's about a 3% penalty just having the query cache on.
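For what it's worth, on MySQL versions that still ship the query cache (it was removed in 8.0), it can be switched off entirely with a my.cnf fragment like:

```ini
[mysqld]
query_cache_type = 0
query_cache_size = 0
```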

David M
A: 

Full text queries (LIKE "%query%" statements) on InnoDB are slow, and there is little you can do to optimize them directly. Solutions range from converting the particular table you are querying to MyISAM so you can create fulltext indexes (which InnoDB does not support), to denormalizing the rows into searchable indexes (not recommended). Doctrine ORM provides an easy example of how to achieve the latter: http://www.doctrine-project.org/documentation/manual/1_1/nl/behaviors:core-behaviors:searchable The "proper" solution to your problem would be to index the information you're running full text searches on with a dedicated engine such as Sphinx Search or Apache Solr.
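The idea behind such searchable indexes can be illustrated with a toy trigram inverted index in Python (a sketch of the general technique, not of Sphinx or Solr internals):

```python
from collections import defaultdict

def trigrams(text):
    """All 3-character substrings of the lowercased text."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_index(rows):
    """Map each trigram to the set of row ids whose text contains it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for gram in trigrams(text):
            index[gram].add(row_id)
    return index

def search(index, rows, needle):
    """LIKE '%needle%' via the index: intersect candidate row sets,
    then verify the match, instead of scanning every row."""
    grams = trigrams(needle)
    if not grams:  # needle shorter than 3 chars: fall back to a full scan
        return {i for i, t in rows.items() if needle.lower() in t.lower()}
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    return {i for i in candidates if needle.lower() in rows[i].lower()}

rows = {1: "quick brown fox", 2: "lazy dog", 3: "brown bread"}
index = build_index(rows)
```

With the index built, `search(index, rows, "brown")` only has to verify the rows whose trigram sets overlap the needle's, which is the reason fulltext engines beat a LIKE table scan.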

As said previously, you must consider the cache state when comparing results; a primed cache gives extremely fast queries. You should also consider the cache hit percentage of a particular query: even if it is an expensive query, if it has a 99% cache hit ratio, its average performance will be very high.
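The hit-ratio argument is just a weighted average, sketched here with hypothetical numbers:

```python
def effective_cost(uncached_ms, cached_ms, hit_ratio):
    """Average per-call cost of a query given its cache hit ratio."""
    return hit_ratio * cached_ms + (1 - hit_ratio) * uncached_ms

# An "expensive" 500 ms query with a 99% hit ratio and a 1 ms cached cost
avg = effective_cost(500.0, 1.0, 0.99)  # roughly 6 ms per call on average
```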

Fine-grained tuning of queries is not a silver bullet; you might be adding complexity to your application for the sake of optimizations that, overall, are negligible in a production environment.

Consider your workload and troubleshoot frequent, poorly performing queries (use the slow query log in MySQL; don't blindly start optimizing queries).

mhughes
All this advice agrees with my current approach: the slow query log is on, n-gram engines are considered, the workload is taken into account, and I am not disregarding the cache for production. Still, given two queries that return the same rows, they would perform the same once they are cached, right? So all that is left is to compare how they would perform when they are not cached. And I would love a reliable approach that would answer that.
Unreason
Test it. Of two consecutive queries that return the same dataset, it is most probable that the second will take less time, and I mean considerably less time. What I'm aiming at is that the performance of particular queries is not directly proportional to application performance. You should be concerned not only with the cost of queries, but also with their frequency. Frequent calls to an "expensive" query that is properly cached turn it, on average, into a cheap query. Likewise, an expensive query that is called on average once a day is a cheap query considering cost/frequency.
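The cost/frequency trade-off can be made concrete with hypothetical numbers:

```python
def daily_cost_ms(per_call_ms, calls_per_day):
    """Total time a query consumes per day: per-call cost times frequency."""
    return per_call_ms * calls_per_day

# A 2 s query run once a day vs. a 5 ms query run 100,000 times a day:
rare_expensive = daily_cost_ms(2000.0, 1)      # 2,000 ms of server time per day
frequent_cheap = daily_cost_ms(5.0, 100_000)   # 500,000 ms of server time per day
```

By this measure the "cheap" frequent query dominates the daily workload, which is why the slow query log alone does not tell the whole story.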
mhughes
A: 

Have you considered using Maatkit? One of its capabilities I'm slightly familiar with is capturing MySQL network data with tcpdump and processing the dump with mk-query-digest. This tool lets you see some fine-grained details about each query, but there is a whole bunch of other tools in the kit which should make query analysis easier.

Bram Schoenmakers
Maatkit is on my list of tools to test. What are the others?
Unreason
Well, I was probably too sleepy when I wrote that. I referred to other commands/tools inside Maatkit for query analysis.
Bram Schoenmakers
A: 

Assuming that you cannot optimize the LIKE operation itself, you should try to optimize the base query around it, minimizing the number of rows that have to be checked.

Some things that might be useful for that:

the rows column in the EXPLAIN SELECT ... output. Then, the query profiler:

mysql> set profiling=1;
mysql> select sql_no_cache * from mytable;
 ...
mysql> show profile;
+--------------------+----------+
| Status             | Duration |
+--------------------+----------+
| starting           | 0.000063 |
| Opening tables     | 0.000009 |
| System lock        | 0.000002 |
| Table lock         | 0.000005 |
| init               | 0.000012 |
| optimizing         | 0.000002 |
| statistics         | 0.000007 |
| preparing          | 0.000005 |
| executing          | 0.000001 |
| Sending data       | 0.001309 |
| end                | 0.000003 |
| query end          | 0.000001 |
| freeing items      | 0.000016 |
| logging slow query | 0.000001 |
| cleaning up        | 0.000001 |
+--------------------+----------+
15 rows in set (0.00 sec)

Then,

mysql> FLUSH STATUS;
mysql> select sql_no_cache * from mytable;
...
mysql> SHOW SESSION STATUS LIKE 'Select%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| Select_full_join       | 0     |
| Select_full_range_join | 0     |
| Select_range           | 0     |
| Select_range_check     | 0     |
| Select_scan            | 1     |
+------------------------+-------+
5 rows in set (0.00 sec)

And another interesting value is Last_query_cost, which shows how expensive the optimizer estimated the query to be (the value is measured in units of random page reads):

mysql> SHOW STATUS LIKE 'last_query_cost';
+-----------------+-------------+
| Variable_name   | Value       |
+-----------------+-------------+
| Last_query_cost | 2635.399000 |
+-----------------+-------------+
1 row in set (0.00 sec)

MySQL documentation is your friend.

newtover
A: 

Cited from this page: the SQL_NO_CACHE option affects caching of query results in the query cache. If your table is quite small, it is possible that the table itself is already cached. Since you only avoid caching of the results, not of the tables, you sometimes get the described behaviour. So, as said in the other posts, you should flush your tables between the queries.

ablaeul