Remember to take advantage of Boolean short-circuit evaluation:
SELECT COUNT(*)
FROM messages
join emails ON emails.id = messages.emailid
WHERE ownership = 32 AND message LIKE '%word%'
This filters by ownership before it evaluates the LIKE predicate. Always put your cheaper expressions on the left.
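One caveat: as far as I know, MySQL doesn't promise to evaluate WHERE conditions in the order you write them, so it's worth confirming the plan with EXPLAIN and, where you can, backing the cheap condition with an index. A minimal sketch, assuming ownership is a column of messages and that no such index exists yet (the index name is made up for illustration):

```sql
-- Hypothetical index; assumes ownership lives in the messages table.
CREATE INDEX messages_ownership ON messages (ownership);

-- With the index in place, the optimizer can narrow the rows by ownership
-- first and apply the LIKE predicate only to the rows that remain.
EXPLAIN SELECT COUNT(*)
FROM messages
JOIN emails ON emails.id = messages.emailid
WHERE messages.ownership = 32 AND message LIKE '%word%';
```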
Also, I agree with @Martin Smith and @MJB that you should consider using MySQL's FULLTEXT indexing to make this faster.
Re your comment and additional information, here's some analysis:
explain SELECT COUNT(*) FROM ticket WHERE category IN (1)\G
id: 1
select_type: SIMPLE
table: ticket
type: ref
possible_keys: category
key: category
key_len: 4
ref: const
rows: 1
Extra: Using index
The note "Using index" is a good thing to see because it means it can satisfy the query just by reading the index data structure, not even touching the data of the table. This is certain to run very fast.
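To see what the covering index buys you, compare against a query that needs columns the index doesn't hold. This is only a sketch against the same ticket table; it assumes the table has columns beyond id and category, which isn't confirmed anywhere above:

```sql
-- Covered: the category index alone is enough to count the matches.
EXPLAIN SELECT COUNT(*) FROM ticket WHERE category IN (1);

-- Not covered (assuming ticket has other columns): MySQL can still use the
-- category index to find the rows, but it then has to read each row from
-- the table, so Extra should no longer say "Using index".
EXPLAIN SELECT * FROM ticket WHERE category IN (1);
```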
explain SELECT COUNT(*) FROM ticket_subject WHERE subject LIKE '%about%'\G
id: 1
select_type: SIMPLE
table: ticket_subject
type: ALL
possible_keys: NULL <---- no possible keys
key: NULL
key_len: NULL
ref: NULL
rows: 1
Extra: Using where
This shows that there are no possible keys that can benefit the wildcard LIKE predicate. It uses the condition in the WHERE clause, but it has to evaluate it by running a table-scan.
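For contrast, a pattern with no leading wildcard can use an ordinary index, because the index is sorted by the start of the string. A minimal sketch, assuming subject is a string column with no index of its own yet; the index name and the prefix length of 50 are arbitrary choices for illustration:

```sql
-- Hypothetical index on the first 50 characters of subject.
CREATE INDEX ticket_subject_subject ON ticket_subject (subject(50));

-- Anchored at the start: the optimizer can treat this as a range scan
-- on the index.
EXPLAIN SELECT COUNT(*) FROM ticket_subject WHERE subject LIKE 'about%';

-- Leading wildcard: the index order is no help, so this is still a scan.
EXPLAIN SELECT COUNT(*) FROM ticket_subject WHERE subject LIKE '%about%';
```

Of course the anchored pattern answers a different question (subjects that start with the word), so it only helps when that's what you actually need.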
explain SELECT COUNT(*) FROM ticket LEFT JOIN ticket_subject
ON (ticket_subject.ticketid = ticket.id)
WHERE category IN (1)
AND ticket_subject.subject LIKE '%about%'\G
id: 1
select_type: SIMPLE
table: ticket
type: ref
possible_keys: PRIMARY,category
key: category
key_len: 4
ref: const
rows: 1
Extra: Using index
id: 1
select_type: SIMPLE
table: ticket_subject
type: ref
possible_keys: ticketid
key: ticketid
key_len: 4
ref: test.ticket.id
rows: 1
Extra: Using where
Likewise, accessing the ticket table is quick, but that's spoiled by the LIKE condition. No index can help it, so it has to be evaluated against every ticket_subject row the join reaches.
ALTER TABLE ticket_subject ENGINE=MyISAM;
CREATE FULLTEXT INDEX ticket_subject_fulltext ON ticket_subject(subject);
explain SELECT COUNT(*) FROM ticket JOIN ticket_subject
ON (ticket_subject.ticketid = ticket.id)
WHERE category IN (1) AND MATCH(ticket_subject.subject) AGAINST('about')\G
id: 1
select_type: SIMPLE
table: ticket
type: ref
possible_keys: PRIMARY,category
key: category
key_len: 4
ref: const
rows: 1
Extra: Using index
id: 1
select_type: SIMPLE
table: ticket_subject
type: fulltext
possible_keys: ticketid,ticket_subject_fulltext
key: ticket_subject_fulltext <---- now it uses an index
key_len: 0
ref:
rows: 1
Extra: Using where
You're never going to make LIKE with a leading wildcard perform well. See my presentation Practical Full-Text Search in MySQL.
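A few things to watch for when you try full-text search on real data. On MySQL 5.6 and later, InnoDB supports FULLTEXT indexes too, so the switch to MyISAM may not be necessary. Also, "about" is in MySQL's default stopword lists, so with default settings a search for that exact word may match nothing even though the index is used. Here's a hedged sketch of a boolean-mode search, assuming the ticket_subject_fulltext index already exists; the search term 'refund' is made up for illustration:

```sql
-- '+' requires the term; the trailing '*' also matches words that start
-- with it, such as "refunds" or "refunded".
SELECT COUNT(*)
FROM ticket
JOIN ticket_subject ON ticket_subject.ticketid = ticket.id
WHERE ticket.category IN (1)
  AND MATCH(ticket_subject.subject) AGAINST('+refund*' IN BOOLEAN MODE);
```

Boolean mode also avoids the natural-language rule in MyISAM that ignores words appearing in half or more of the rows, which can be surprising on small test tables.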
Re your comment: Okay, I've done some experiments on a dataset of similar size (the Users and Badges tables in the Stack Overflow data dump :-). Here's what I found:
select count(*) from users
where reputation > 50000
+----------+
| count(*) |
+----------+
| 37 |
+----------+
1 row in set (0.00 sec)
That's really fast, because I have an index on the reputation column.
id: 1
select_type: SIMPLE
table: users
type: range
possible_keys: users_reputation_userid_displayname
key: users_reputation_userid_displayname
key_len: 4
ref: NULL
rows: 37
Extra: Using where; Using index
select count(*) from badges
where badges.creationdate like '%06-24%'
+----------+
| count(*) |
+----------+
| 1319 |
+----------+
1 row in set, 1 warning (0.63 sec)
That's as expected, since the table has 700k rows, and it has to do a table-scan. Now let's do the join:
select count(*) from users join badges using (userid)
where users.reputation > 50000 and badges.creationdate like '%06-24%'
+----------+
| count(*) |
+----------+
| 19 |
+----------+
1 row in set, 1 warning (0.03 sec)
That doesn't seem so bad. Here's the explain report:
id: 1
select_type: SIMPLE
table: users
type: range
possible_keys: PRIMARY,users_reputation_userid_displayname
key: users_reputation_userid_displayname
key_len: 4
ref: NULL
rows: 37
Extra: Using where; Using index
id: 1
select_type: SIMPLE
table: badges
type: ref
possible_keys: badges_userid
key: badges_userid
key_len: 8
ref: testpattern.users.UserId
rows: 1
Extra: Using where
This does seem like it's using indexes intelligently for the join, and it helps that I have a compound index including userid and reputation. Remember that MySQL generally uses only one index per table in a given query, so it's important to define the right compound indexes for the queries you need to run.
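Here's a sketch of what index definitions like these look like. The column lists are inferred from the index names in the explain report, so treat them as an assumption rather than the exact definitions used for the timings above:

```sql
-- Assumed column order: reputation first so the range condition can use it,
-- with userid included so the join column comes straight from the index.
CREATE INDEX users_reputation_userid_displayname
    ON users (reputation, userid, displayname);

-- Assumed single-column index that the join uses to look up badges per user.
CREATE INDEX badges_userid ON badges (userid);
```

The order of columns matters: a condition on reputation can use the first index, but a condition on displayname alone could not.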
Re your comment: OK, I've tried this where reputation > 5000, and where reputation > 500, and where reputation > 50. These should match a much larger set of users.
select count(*) from users join badges using (userid)
where users.reputation > 5000 and badges.creationdate like '%06-24%'
+----------+
| count(*) |
+----------+
| 194 |
+----------+
1 row in set, 1 warning (0.27 sec)
select count(*) from users join badges using (userid)
where users.reputation > 500 and badges.creationdate like '%06-24%'
+----------+
| count(*) |
+----------+
| 624 |
+----------+
1 row in set, 1 warning (0.93 sec)
select count(*) from users join badges using (userid)
where users.reputation > 50 and badges.creationdate like '%06-24%'
+----------+
| count(*) |
+----------+
| 1067 |
+----------+
1 row in set, 1 warning (1.72 sec)
The explain report is the same in all cases, but if the query finds more matching rows in the Users table, then it naturally has to evaluate the LIKE predicate against a lot more matching rows in the Badges table.
It's true that there is some cost to doing a join. It's a little surprising that it's so dramatically expensive. But this can be mitigated if you use indexes.
I know you said you have a query that can't use an index, but perhaps it's time to consider creating a redundant column with some transformed version of the data of your original column, so you can index it. In the example above, I might create a column creationdate_day and populate it from DAYOFYEAR(creationdate).
Here's what I mean:
ALTER TABLE Badges ADD COLUMN creationdate_day SMALLINT;
UPDATE Badges SET creationdate_day = DAYOFYEAR(creationdate);
CREATE INDEX badge_creationdate_day ON Badges(creationdate_day);
select count(*) from users join badges using (userid)
where users.reputation > 50 and badges.creationdate_day = dayofyear('2010-06-24')
+----------+
| count(*) |
+----------+
| 1067 |
+----------+
1 row in set, 1 warning (0.01 sec) <---- not too shabby!
Here's the explain report:
id: 1
select_type: SIMPLE
table: badges
type: ref
possible_keys: badges_userid,badge_creationdate_day
key: badge_creationdate_day <---- here is our new index
key_len: 3
ref: const
rows: 1318
Extra: Using where
id: 1
select_type: SIMPLE
table: users
type: eq_ref
possible_keys: PRIMARY,users_reputation_userid_displayname
key: PRIMARY
key_len: 8
ref: testpattern.badges.UserId
rows: 1
Extra: Using where
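If you're on MySQL 5.7 or later, a stored generated column is one way to keep a redundant column like this in sync without a separate UPDATE. This is a variant of the approach above, not something tested here. The column and index names are made up, and it swaps DAYOFYEAR for month * 100 + day so that June 24 gets the same value in leap years as in other years (DAYOFYEAR gives 175 for 2010-06-24 but 176 for the same date in a leap year):

```sql
-- Hypothetical column: holds 624 for any June 24, regardless of year.
ALTER TABLE badges
    ADD COLUMN creationdate_md SMALLINT
        AS (MONTH(creationdate) * 100 + DAY(creationdate)) STORED,
    ADD INDEX badge_creationdate_md (creationdate_md);

-- Same question as the LIKE '%06-24%' query, but answerable from the index.
SELECT COUNT(*)
FROM users JOIN badges USING (userid)
WHERE users.reputation > 50 AND badges.creationdate_md = 624;
```

MySQL recomputes the stored value whenever creationdate changes, so the index never drifts out of date.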