Lately my queries (run before the results get cached into memcache) have been taking forever to process! In this example, it took 10 seconds. All I am trying to do in this case is get the 10 most recent hits.

I am getting the feeling that it loads all 125,592 rows then only returns 10, am I right?

# User@Host: root[root] @ localhost []
# Query_time: 10  Lock_time: 0  Rows_sent: 10  Rows_examined: 125592
SELECT * FROM hits WHERE campaign_id = 30 ORDER BY id DESC LIMIT 10;

Here is another slow query:

# Time: 090214  5:00:40
# User@Host: root[root] @ localhost []
# Query_time: 3  Lock_time: 0  Rows_sent: 1  Rows_examined: 128879
SELECT count(DISTINCT(ip_address)) AS count_distinct_ip_address FROM `hits` WHERE (campaign_id = 30);

When running the query in phpMyAdmin, it takes 1.3395 seconds, although just doing a SELECT * FROM hits only takes 0.0001 seconds. I find it very odd that returning all of the hits takes less time than sorting through them. Or is that exactly it: the slow part is the sorting?

For those who want to see my table:

CREATE TABLE `hits` (
  `id` int(11) unsigned NOT NULL auto_increment,
  `hostname` varchar(255) NOT NULL,
  `url` tinytext NOT NULL,
  `user_agent` tinytext NOT NULL,
  `created_at` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `ip_address` varchar(15) NOT NULL,
  `campaign_id` int(11) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `campaign_id` (`campaign_id`),
  KEY `ip_address` (`ip_address`)
);
+2  A: 

An index on (campaign_id, id) should take care of the first query fairly well. But the distinct one is a bit trickier...

Edit: MySQL doesn't use multiple indexes for one query; so yes, you need one index that covers all the fields involved in the query.
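
A minimal sketch of that composite index against the table from the question (the index name is my own):

CREATE INDEX idx_campaign_id_id ON hits (campaign_id, id);

-- MySQL can now read the newest 10 rows for the campaign straight
-- off the index instead of fetching every match and sorting it:
SELECT * FROM hits WHERE campaign_id = 30 ORDER BY id DESC LIMIT 10;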

womble
I already do though; see the primary key, and the key for campaign_id?
Garrett
It's the lack of an index also on id that is the issue. WHERE campaign_id = 30 is covered by the index, but ORDER BY id DESC isn't, so the server has to load all rows matching campaign_id = 30, sort them by id, then grab the first 10.
Kristen
MySQL does indeed do multiple index combination in one query. For example, a WHERE clause on two columns, where there are 2 separate queries - it can use both indexes.
Chris KL
@Chris: In what way is "2 separate queries" performing "multiple index combination in one query"?
womble
+1  A: 

If a query is taking too long to process, it is usually because of missing indexes, poor disk I/O, or some other bottleneck. A table with 120,000 rows isn't a lot of data, and the query really shouldn't take that long. I really would check disk I/O.

Answer 1 above is a way to speed up query 1. To speed up query 2 you may need to create an aggregate table that gets updated with every hit, or that gets updated in a nightly batch run, so that you only have to add in the day's hits that have not yet been aggregated. An index on the date range should make this relatively quick. One way to lay this out is sketched below.
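
A rough sketch of such an aggregate table; all table and column names below are hypothetical except hits, which is from the question:

-- One row per (campaign, IP) pair ever seen; the primary key makes
-- duplicate inserts a no-op via INSERT IGNORE. Seed it once from
-- the full hits table before the first nightly run.
CREATE TABLE campaign_ips (
  campaign_id int(11) NOT NULL,
  ip_address varchar(15) NOT NULL,
  PRIMARY KEY (campaign_id, ip_address)
);

-- Nightly batch: fold in yesterday's hits. We store the pairs
-- themselves rather than per-day counts, because distinct counts
-- are not additive across days. An index on created_at would speed
-- up this date-range filter.
INSERT IGNORE INTO campaign_ips (campaign_id, ip_address)
SELECT DISTINCT campaign_id, ip_address
FROM hits
WHERE created_at >= CURDATE() - INTERVAL 1 DAY
  AND created_at < CURDATE();

-- The expensive COUNT(DISTINCT ...) then becomes a cheap index scan:
SELECT COUNT(*) FROM campaign_ips WHERE campaign_id = 30;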

You should also run EXPLAIN against your query and see what indexes it is using, if any. What storage engine are you using for MySQL? This can also have an impact. If you are using the MyISAM storage engine and doing inserts and reads concurrently, that can be a big performance hit.

Make sure your table stats are up to date by running ANALYZE TABLE against the larger tables on a regular basis. This helps the query engine select the optimal query plan. Both EXPLAIN and ANALYZE TABLE are sketched below.
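
For instance (a sketch, using the hits table from the question):

-- See which index, if any, the optimizer picks, and whether
-- "Using filesort" shows up in the Extra column.
EXPLAIN SELECT * FROM hits WHERE campaign_id = 30 ORDER BY id DESC LIMIT 10;

-- Refresh the index statistics the optimizer relies on.
ANALYZE TABLE hits;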

mxc
I am using MyISAM; what do you suggest I change it to?
Garrett
Also to note, I have this in my my.cnf file: skip-external-locking and skip-locking
Garrett
+4  A: 

It seems your campaign_id index has low selectivity, i.e. there are lots of records with this value.

Ordering that many records takes a lot of time.

Try to use an INDEX SCAN on the PRIMARY KEY for ordering. Wrapping campaign_id in IFNULL below makes the condition opaque to the optimizer, so it cannot use the campaign_id index and instead walks the primary key in descending order, filtering rows as it goes:

/* Edited, as MySQL does not use live feed from the derived source with ORDER BY */
SELECT *
FROM hits
WHERE IFNULL(campaign_id, campaign_id) = 30
ORDER BY id DESC
LIMIT 10;

As for your second query, there is not much that can be done, as you need a complete scan of everything with campaign_id = 30 anyway, be it a TABLE SCAN or an INDEX SCAN.

In fact, the TABLE SCAN can be even faster:

SELECT count(DISTINCT(ip_address)) AS count_distinct_ip_address
FROM `hits`
WHERE IFNULL(campaign_id, campaign_id) = 30;

If it is not, you may create an index on (campaign_id, ip_address) and use a trick to imitate INDEX GROUP BY on this index:

CREATE INDEX ix_hits_campaign_ip ON hits (campaign_id, ip_address);

SELECT SUM(cnt)
FROM (
  SELECT CASE WHEN @r = ip_address THEN 0 ELSE 1 END AS cnt,
         @r := ip_address
  FROM
    (SELECT @r := '') r,
    (
      SELECT ip_address
      FROM hits
      WHERE campaign_id = 30
      ORDER BY ip_address
    ) i
) o;

The trick here is simple: we don't need the actual rows, just a count, so there is no need to scan for the actual values. An index scan will suffice.

Unfortunately, despite what MySQL documentation says here on loose index scans, they do not actually work on composite indices. That's why we need to imitate an INDEX SCAN WITH GROUP BY.

We do it by forcing MySQL to use an INDEX RANGE SCAN that retrieves all records with campaign_id = 30 sorted by ip_address. Then we count the DISTINCT ip_address values using a session variable @r, initialized to an empty string in the first subquery.

In the first field we emit 0 when the previous ip_address (stored in the variable) equals the current one; otherwise we emit 1. In the second field we assign the current value of ip_address to the variable.

Finally we retrieve the SUM of the first field, which of course gives us COUNT(DISTINCT ip_address). For the sorted sequence 1.1.1.1, 1.1.1.1, 2.2.2.2, for example, the first field yields 1, 0, 1, and the SUM is 2.

Quassnoi
These queries took longer than the queries above; I am just going to have to mess around some more.
Garrett
What percent of your rows has campaign_id = 30?
Quassnoi
See updated post; I fixed the queries a little.
Quassnoi
I appreciate it! It took 0.00 seconds compared to 0.03 so I can see some performance improvement for sure.
Garrett
Care to explain the trick in your last query in more detail? I'd be interested in how it works.
Tomalak
See updated post.
Quassnoi
A: 

Just a guess.

SELECT * FROM hits WHERE (campaign_id = 30 AND id > 0) ORDER BY id DESC LIMIT 10;

Hopefully, MySQL will merge the indexes. Good luck.
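
A quick way to check whether the merge actually happens (a sketch):

EXPLAIN SELECT * FROM hits
WHERE (campaign_id = 30 AND id > 0)
ORDER BY id DESC LIMIT 10;
-- Look for index_merge in the type column; if it isn't there, the
-- optimizer picked a single index (or none) and the guess didn't pay off.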

+1  A: 

You need to use EXPLAIN to find out how MySQL is executing your queries. You need to do it with production or production-like data, but obviously shouldn't do it on a production system (you need to use identical software on development and production for this exercise, of course). The numbers above suggest it's doing a full table scan; this is likely because there either aren't any indexes it could use, or it's choosing not to use them because they have low cardinality, etc.

You then need to evaluate what indexes could be added to improve it, try adding them, test again, and then QA the change by checking that adding the index won't break anything else in your application and doesn't regress performance elsewhere. You will want to analyse the space and performance impact; again, this can be done with production-like data on your test system (performance testing needs to be done on production-spec hardware, of course).

Once you're sure adding the indexes is the right thing to do, you can roll those changes into a software release as you normally would. Beware of ALTER TABLE on large tables, though: it can take some time and will block writes to the table (120k rows is probably not a large table, however). Be sure you know how long it will take and what impact it will have on production before rolling the changes out. A quick way to gauge this is sketched below.
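
For a rough idea of how long the ALTER might block, check the table's size first (a sketch):

-- Row count and data length give a feel for how long a table
-- rebuild (which ALTER TABLE ... ADD INDEX typically performs
-- on MyISAM) will take.
SHOW TABLE STATUS LIKE 'hits';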

MarkR