views:

33

answers:

1

I'm running the following query in MYSQL

select distinct straight_join
   cu.entryid entryid,
   t0.tokpos starting_position,
   t3.tokpos ending_position,
   t0.idxsent idxsent,
   'TOKENS_44_340' tablename
from
   TOKENS_44_340 t0,
   constraints_appraisal cu,
   TOKENS_44_340 t1,
   TOKENS_44_340 t2,
   TOKENS_44_340 t3
where
t0.token_surface = cu.token_0
and (cu.pos_0 is null OR t0.penntag like concat(cu.pos_0,'%'))
and t1.token_surface = cu.token_1
and (cu.pos_1 is null OR t1.penntag like concat(cu.pos_1,'%'))
and t2.token_surface = cu.token_2
and (cu.pos_2 is null OR t2.penntag like concat(cu.pos_2,'%'))
and t3.token_surface = cu.token_3
and (cu.pos_3 is null OR t3.penntag like concat(cu.pos_3,'%'))
and t0.tokpos = t1.tokpos - 1
and t1.tokpos = t2.tokpos - 1
and t2.tokpos = t3.tokpos - 1
and cu.token_4 is null
and cu.token_5 is null
and cu.token_6 is null
and cu.token_7 is null
and cu.token_8 is null
and cu.token_9 is null;

MySQL gives me the following query plan for this query:

+----+-------------+-------+------+---------------------------------------+------------------------+---------+------+------+-----------------+
| id | select_type | table | type | possible_keys                         | key                    | key_len | ref  | rows | Extra           |
+----+-------------+-------+------+---------------------------------------+------------------------+---------+------+------+-----------------+
|  1 | SIMPLE      | t0    | ALL  | PRIMARY,TOKENS_44_340_index_44,tokpos | NULL                   | NULL    | NULL |   49 | Using temporary | 
|  1 | SIMPLE      | cu    | ALL  | NULL                                  | NULL                   | NULL    | NULL | 7907 | Using where     | 
|  1 | SIMPLE      | t1    | ref  | PRIMARY,TOKENS_44_340_index_44,tokpos | TOKENS_44_340_index_44 | 399     | func |    4 | Using where     | 
|  1 | SIMPLE      | t2    | ref  | PRIMARY,TOKENS_44_340_index_44,tokpos | TOKENS_44_340_index_44 | 399     | func |    4 | Using where     | 
|  1 | SIMPLE      | t3    | ref  | TOKENS_44_340_index_44                | TOKENS_44_340_index_44 | 399     | func |    4 | Using where     | 
+----+-------------+-------+------+---------------------------------------+------------------------+---------+------+------+-----------------+
5 rows in set (0.00 sec)

As you can see, MySQL doesn't even so much as acknowledge the existance of my index token_0 on constraints_appraisal(token_0). Any idea why it's ignoring my index, and what I can do about it? I'm running mysql 5.0.51a-24+lenny4 for Debian stable.

P.S. I know I could make this query run faster by removing the straight_join constraint and let it use the token_surface index on t0, but it still wouldn't be as fast as it could be using the token_0 index on constraints_appraisal. I added the straight_join so that I could make my specific issue appear more clearly, and I plan remove it when I have the index working properly.

mysql> describe TOKENS_44_340;
+---------------+--------------+------+-----+---------+-------+
| Field         | Type         | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+-------+
| tokPos        | int(11)      | NO   | PRI | NULL    |       | 
| linePos       | int(11)      | YES  |     | NULL    |       | 
| EOLs          | int(11)      | YES  |     | NULL    |       | 
| idxsent       | int(11)      | YES  |     | NULL    |       | 
| possent       | int(11)      | YES  |     | NULL    |       | 
| brilltag      | int(11)      | YES  |     | NULL    |       | 
| token_surface | varchar(132) | YES  | MUL | NULL    |       | 
| wordLen       | int(11)      | YES  |     | NULL    |       | 
| capitalized   | int(11)      | YES  |     | NULL    |       | 
| wordType      | int(11)      | YES  |     | NULL    |       | 
| numDigit      | int(11)      | YES  |     | NULL    |       | 
| numPunc       | int(11)      | YES  |     | NULL    |       | 
| numAlpha      | int(11)      | YES  |     | NULL    |       | 
| maxRep        | int(11)      | YES  |     | NULL    |       | 
| pre1          | varchar(132) | YES  |     | NULL    |       | 
| pre2          | varchar(132) | YES  |     | NULL    |       | 
| pre3          | varchar(132) | YES  |     | NULL    |       | 
| pre4          | varchar(132) | YES  |     | NULL    |       | 
| suf1          | varchar(132) | YES  |     | NULL    |       | 
| suf2          | varchar(132) | YES  |     | NULL    |       | 
| suf3          | varchar(132) | YES  |     | NULL    |       | 
| suf4          | varchar(132) | YES  |     | NULL    |       | 
| dep_gov       | int(11)      | YES  | MUL | NULL    |       | 
| dep_rel       | varchar(20)  | YES  | MUL | NULL    |       | 
| penntag       | varchar(30)  | YES  |     | NULL    |       | 
+---------------+--------------+------+-----+---------+-------+
25 rows in set (0.04 sec)

mysql> describe constraints_appraisal;
+---------------+--------------+------+-----+---------+-------+
| Field         | Type         | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+-------+
| entryid       | int(11)      | NO   | PRI | 0       |       | 
| context       | varchar(50)  | YES  |     | NULL    |       | 
| syntax        | int(11)      | YES  |     | NULL    |       | 
| token_0       | varchar(50)  | YES  | MUL | NULL    |       | 
| pos_0         | varchar(50)  | YES  |     | NULL    |       | 
| porter_0      | varchar(50)  | YES  | MUL | NULL    |       | 
| token_1       | varchar(50)  | YES  |     | NULL    |       | 
| pos_1         | varchar(50)  | YES  |     | NULL    |       | 
| porter_1      | varchar(50)  | YES  |     | NULL    |       | 
| token_2       | varchar(50)  | YES  |     | NULL    |       | 
| pos_2         | varchar(50)  | YES  |     | NULL    |       | 
| porter_2      | varchar(50)  | YES  |     | NULL    |       | 
| token_3       | varchar(50)  | YES  |     | NULL    |       | 
| pos_3         | varchar(50)  | YES  |     | NULL    |       | 
| porter_3      | varchar(50)  | YES  |     | NULL    |       | 
| token_4       | varchar(50)  | YES  |     | NULL    |       | 
| pos_4         | varchar(50)  | YES  |     | NULL    |       | 
| porter_4      | varchar(50)  | YES  |     | NULL    |       | 
| token_5       | varchar(50)  | YES  |     | NULL    |       | 
| pos_5         | varchar(50)  | YES  |     | NULL    |       | 
| porter_5      | varchar(50)  | YES  |     | NULL    |       | 
| token_6       | varchar(50)  | YES  |     | NULL    |       | 
| pos_6         | varchar(50)  | YES  |     | NULL    |       | 
| porter_6      | varchar(50)  | YES  |     | NULL    |       | 
| token_7       | varchar(50)  | YES  |     | NULL    |       | 
| pos_7         | varchar(50)  | YES  |     | NULL    |       | 
| porter_7      | varchar(50)  | YES  |     | NULL    |       | 
| token_8       | varchar(50)  | YES  |     | NULL    |       | 
| pos_8         | varchar(50)  | YES  |     | NULL    |       | 
| porter_8      | varchar(50)  | YES  |     | NULL    |       | 
| token_9       | varchar(50)  | YES  |     | NULL    |       | 
| pos_9         | varchar(50)  | YES  |     | NULL    |       | 
| porter_9      | varchar(50)  | YES  |     | NULL    |       | 
| token_surface | varchar(200) | YES  |     | NULL    |       | 
| fileid        | varchar(100) | YES  |     | NULL    |       | 
+---------------+--------------+------+-----+---------+-------+
35 rows in set (0.06 sec)

mysql> show index from constraints_appraisal;
+-----------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table                 | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| constraints_appraisal |          0 | PRIMARY  |            1 | entryid     | A         |        7907 |     NULL | NULL   |      | BTREE      |         | 
| constraints_appraisal |          1 | token_0  |            1 | token_0     | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         | 
| constraints_appraisal |          1 | porter_0 |            1 | porter_0    | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         | 
+-----------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.05 sec)
A: 

It appears that the problem is that the two tables are in different character sets.

  • TOKENS_44_340 is in utf8
  • constraints_appraisal is in latin1
Ken Bloom