I have the following tables (removed columns that aren't used for my examples):

CREATE TABLE `person` (
  `id` int(11) NOT NULL,
  `name` varchar(1024) NOT NULL,
  `sortname` varchar(1024) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `sortname` (`sortname`(255)),
  KEY `name` (`name`(255))
);

CREATE TABLE `personalias` (
  `id` int(11) NOT NULL,
  `person` int(11) NOT NULL,
  `name` varchar(1024) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `person` (`person`),
  KEY `name` (`name`(255))
);

Currently, I'm using this query which works just fine:

select p.* from person p where name = 'John Mayer' or sortname = 'John Mayer';

mysql> explain select p.* from person p where name = 'John Mayer' or sortname = 'John Mayer';
+----+-------------+-------+-------------+---------------+---------------+---------+------+------+----------------------------------------------+
| id | select_type | table | type        | possible_keys | key           | key_len | ref  | rows | Extra                                        |
+----+-------------+-------+-------------+---------------+---------------+---------+------+------+----------------------------------------------+
|  1 | SIMPLE      | p     | index_merge | name,sortname | name,sortname | 767,767 | NULL |    3 | Using sort_union(name,sortname); Using where | 
+----+-------------+-------+-------------+---------------+---------------+---------+------+------+----------------------------------------------+
1 row in set (0.00 sec)

Now I'd like to extend this query to also consider aliases.

First, I've tried using a join:

select p.* from person p join personalias a on p.id = a.person where p.name = 'John Mayer' or p.sortname = 'John Mayer' or a.name = 'John Mayer';

mysql> explain select p.* from person p join personalias a on p.id = a.person where p.name = 'John Mayer' or p.sortname = 'John Mayer' or a.name = 'John Mayer';
+----+-------------+-------+--------+-----------------------+---------+---------+-------------------+-------+-----------------+
| id | select_type | table | type   | possible_keys         | key     | key_len | ref               | rows  | Extra           |
+----+-------------+-------+--------+-----------------------+---------+---------+-------------------+-------+-----------------+
|  1 | SIMPLE      | a     | ALL    | ref,name              | NULL    | NULL    | NULL              | 87401 | Using temporary | 
|  1 | SIMPLE      | p     | eq_ref | PRIMARY,name,sortname | PRIMARY | 4       | musicbrainz.a.ref |     1 | Using where     | 
+----+-------------+-------+--------+-----------------------+---------+---------+-------------------+-------+-----------------+
2 rows in set (0.00 sec)

This looks bad: no index used on personalias, 87401 rows scanned, and Using temporary. Using temporary only appears when I add DISTINCT, but since an alias might be the same as the name, I can't really drop it.

Next, I've tried to replace the join with a subquery:

select p.* from person p where p.name = 'John Mayer' or p.sortname = 'John Mayer' or p.id in (select person from personalias a where a.name = 'John Mayer');

mysql> explain select p.* from person p where p.name = 'John Mayer' or p.sortname = 'John Mayer' or p.id in (select person from personalias a where a.name = 'John Mayer');
+----+--------------------+-------+----------------+------------------+--------+---------+------+--------+-------------+
| id | select_type        | table | type           | possible_keys    | key    | key_len | ref  | rows   | Extra       |
+----+--------------------+-------+----------------+------------------+--------+---------+------+--------+-------------+
|  1 | PRIMARY            | p     | ALL            | name,sortname    | NULL   | NULL    | NULL | 540309 | Using where | 
|  2 | DEPENDENT SUBQUERY | a     | index_subquery | person,name      | person | 4       | func |      1 | Using where | 
+----+--------------------+-------+----------------+------------------+--------+---------+------+--------+-------------+
2 rows in set (0.00 sec)

Again, this looks pretty bad: no index used on person, 540309 rows scanned. Interestingly, the two parts run as separate queries (select p.* from person ... or p.id in (4711,12345) and select id from personalias a where a.name = 'John Mayer') both work extremely well.

Why doesn't MySQL use the indexes for either of my queries? What else could I do? Currently, the best option seems to be fetching the person ids for matching aliases first and adding them statically as an in(...) to the second query. There certainly has to be a way to do this with a single query, but I'm out of ideas. Could I somehow force MySQL into using another (better) query plan?

+3  A: 

In the first query, there is only one table.

MySQL uses the index merge: it takes the row pointers from two indexes and unions them.

Your second query introduces another table. MySQL cannot merge in an index from another table, since its row pointers refer to different records.

You need to emulate the index merge yourself with a UNION:

SELECT  p.*
FROM    (
        SELECT  id
        FROM    person p
        WHERE   p.name = 'John Mayer'
                OR p.sortname = 'John Mayer'
        UNION
        SELECT  person
        FROM    personalias a
        WHERE   a.name = 'John Mayer'
        ) q
JOIN    person p
ON      p.id = q.id
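
To check whether the rewrite actually changes the plan, the same statement can be run under EXPLAIN. This is only a sketch, assuming the schema from the question; the output depends on the MySQL version and on the data, so none is shown here. The thing to look for is that the rows for p and a inside the derived table no longer show type ALL.

```sql
-- Same query as above, prefixed with EXPLAIN to inspect the plan.
-- Assumes the person / personalias tables from the question exist.
EXPLAIN
SELECT  p.*
FROM    (
        SELECT  id
        FROM    person p
        WHERE   p.name = 'John Mayer'
                OR p.sortname = 'John Mayer'
        UNION
        SELECT  person
        FROM    personalias a
        WHERE   a.name = 'John Mayer'
        ) q
JOIN    person p
ON      p.id = q.id;
```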

If your tables are MyISAM, include id as a trailing column in the indexes:

CREATE INDEX ix_person_name_id ON person (name, id);
CREATE INDEX ix_person_sortname_id ON person (sortname, id);
CREATE INDEX ix_personalias_name_person ON personalias (name, person);

Also note that for queries like this, it may be better to use a FULLTEXT index:

CREATE FULLTEXT INDEX fx_person_name_sortname ON person (name, sortname);

SELECT  p.*
FROM    (
        SELECT  id
        FROM    person p
        WHERE   MATCH (name, sortname) AGAINST ('"John Mayer"' IN BOOLEAN MODE)
        UNION
        SELECT  person
        FROM    personalias a
        WHERE   a.name = 'John Mayer'
        ) q
JOIN    person p
ON      p.id = q.id
Quassnoi
+1: nice answer!
RedFilter
+1 thanks for your answer, works very well. Too bad I can't accept two answers.
sfussenegger
+1  A: 

Try:

SELECT p.* from person p 
WHERE  p.name = 'John Mayer' or p.sortname = 'John Mayer' 

UNION

SELECT p.* from person p, personalias a 
WHERE  p.id = a.person and a.name = 'John Mayer'

UNION will take care of distinctness.
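
A minimal sketch of that deduplication, using two hypothetical scratch tables (demo_person and demo_personalias are made up for this example and are not the real schema). The alias repeats the person's name, so both branches match the same person:

```sql
-- Hypothetical scratch tables, for illustration only.
-- Regular tables are used because MySQL cannot reference a TEMPORARY
-- table twice in one statement.
CREATE TABLE demo_person (
  id       INT NOT NULL PRIMARY KEY,
  name     VARCHAR(255) NOT NULL,
  sortname VARCHAR(255) NOT NULL
);
CREATE TABLE demo_personalias (
  id     INT NOT NULL PRIMARY KEY,
  person INT NOT NULL,
  name   VARCHAR(255) NOT NULL
);

INSERT INTO demo_person VALUES (1, 'John Mayer', 'Mayer, John');
-- The alias is identical to the name, so person 1 matches in both branches.
INSERT INTO demo_personalias VALUES (10, 1, 'John Mayer');

-- UNION removes the duplicate: person 1 is returned once.
SELECT p.* FROM demo_person p
WHERE  p.name = 'John Mayer' OR p.sortname = 'John Mayer'
UNION
SELECT p.* FROM demo_person p, demo_personalias a
WHERE  p.id = a.person AND a.name = 'John Mayer';

-- UNION ALL keeps duplicates: person 1 is returned twice.
SELECT p.* FROM demo_person p
WHERE  p.name = 'John Mayer' OR p.sortname = 'John Mayer'
UNION ALL
SELECT p.* FROM demo_person p, demo_personalias a
WHERE  p.id = a.person AND a.name = 'John Mayer';

DROP TABLE demo_personalias;
DROP TABLE demo_person;
```

If duplicates are known to be impossible, UNION ALL avoids the deduplication step, but here an alias can equal the name, so plain UNION is the safer choice.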

DVK
+1 thanks, works like a charm
sfussenegger