views:

84

answers:

2

This is my tables, the AUTO_INCREMENT shows the size of each:

tbl_clientes:

CREATE TABLE `tbl_clientes` (
  `int_clientes_id_pk` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `str_clientes_documento` varchar(255) DEFAULT NULL,
  `str_clientes_nome_original` char(255) DEFAULT NULL,
  PRIMARY KEY (`int_clientes_id_pk`),
  UNIQUE KEY `str_clientes_documento` (`str_clientes_documento`),
  KEY `str_clientes_nome_original` (`str_clientes_nome_original`),
  KEY `nome_original_cliente_id` (`str_clientes_nome_original`,`int_clientes_id_pk`),
  KEY `cliente_id_nome_original` (`int_clientes_id_pk`,`str_clientes_nome_original`)
) ENGINE=MyISAM AUTO_INCREMENT=2815520 DEFAULT CHARSET=utf8

tbl_clienteEnderecos:

CREATE TABLE `tbl_clienteEnderecos` (
  `int_clienteEnderecos_id_pk` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `int_clienteEnderecos_cliente_id_fk` bigint(20) unsigned NOT NULL,
  `str_clienteEnderecos_endereco` varchar(255) NOT NULL,
  `str_clienteEnderecos_cep` varchar(255) DEFAULT NULL,
  `str_clienteEnderecos_numero` varchar(255) DEFAULT NULL,
  `str_clienteEnderecos_complemento` varchar(255) DEFAULT NULL,
  `str_clienteEnderecos_bairro` varchar(255) DEFAULT NULL,
  `str_clienteEnderecos_cidade` varchar(255) DEFAULT NULL,
  `str_clienteEnderecos_uf` varchar(2) DEFAULT NULL,
  `int_clienteEnderecos_correspondencia` tinyint(1) NOT NULL DEFAULT '0',
  `int_clienteEnderecos_tipo` int(11) NOT NULL DEFAULT '1',
  PRIMARY KEY (`int_clienteEnderecos_id_pk`),
  KEY `int_clienteEnderecos_cliente_id_fk` (`int_clienteEnderecos_cliente_id_fk`),
  KEY `str_clienteEnderecos_cidade` (`str_clienteEnderecos_cidade`),
  KEY `str_clienteEnderecos_uf` (`str_clienteEnderecos_uf`),
  KEY `uf_cidade` (`str_clienteEnderecos_uf`,`str_clienteEnderecos_cidade`)
) ENGINE=MyISAM AUTO_INCREMENT=1542038 DEFAULT CHARSET=utf8

Then I run this query to search, it will be fast, is using indexes:

EXPLAIN
SELECT * FROM tbl_clientes LEFT JOIN tbl_clienteEnderecos ON int_clienteEnderecos_cliente_id_fk = int_clientes_id_pk
GROUP BY str_clientes_nome_original, int_clientes_id_pk
ORDER BY str_clientes_nome_original, int_clientes_id_pk
LIMIT 0,20

The result of EXPAIN is:

| id | select_type | table                | type  | possible_keys                      | key                                | key_len | ref                                               | rows | Extra |
+----+-------------+----------------------+-------+------------------------------------+------------------------------------+---------+---------------------------------------------------+------+-------+
|  1 | SIMPLE      | tbl_clientes         | index | NULL                               | nome_original_cliente_id           | 774     | NULL                                              |   20 |       |
|  1 | SIMPLE      | tbl_clienteEnderecos | ref   | int_clienteEnderecos_cliente_id_fk | int_clienteEnderecos_cliente_id_fk | 8       | mydb.tbl_clientes.int_clientes_id_pk |    1 |       |
+----+-------------+----------------------+-------+------------------------------------+------------------------------------+---------+---------------------------------------------------+------+-------+

All right, but I need to filter by tbl_clienteEnderecos.str_clienteEnderecos_uf. It breaks all indexes, use temporary table and filesort (no index). Here's the query:

EXPLAIN
SELECT * FROM tbl_clientes LEFT JOIN tbl_clienteEnderecos ON int_clienteEnderecos_cliente_id_fk = int_clientes_id_pk
WHERE str_clienteEnderecos_uf = "SP"
GROUP BY str_clientes_nome_original, int_clientes_id_pk
ORDER BY str_clientes_nome_original, int_clientes_id_pk
LIMIT 0,20

Look, this is the output of EXPLAIN:

| id | select_type | table                | type   | possible_keys                                                        | key       | key_len | ref                                                                       | rows   | Extra                                        |
+----+-------------+----------------------+--------+----------------------------------------------------------------------+-----------+---------+---------------------------------------------------------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | tbl_clienteEnderecos | ref    | int_clienteEnderecos_cliente_id_fk,str_clienteEnderecos_uf,uf_cidade | uf_cidade | 9       | const                                                                     | 670654 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | tbl_clientes         | eq_ref | PRIMARY,cliente_id_nome_original                                     | PRIMARY   | 8       | mydb.tbl_clienteEnderecos.int_clienteEnderecos_cliente_id_fk |      1 |                                              |
+----+-------------+----------------------+--------+----------------------------------------------------------------------+-----------+---------+---------------------------------------------------------------------------+--------+----------------------------------------------+

With this Using where; Using temporary; Using filesort it can't be fast. I've tried a lot of things, how optimize this query?

Is it time to switch to NoSQL/MongoDB?

A: 

MySQL will typically not use an index if it will not help narrow the results down enough. It appears that "SP" occurs in roughly 670654 rows. Since this is about 1/3 of your total rows, it is more efficient to read it in disk order.

You can try an index to tbl_clienteEnderecos:

KEY `test` (`str_clienteEnderecos_uf `, `int_clienteEnderecos_cliente_id_fk`)

This might be enough to get it to use the index.

What is the difference between these two columns? They look like they should be the same.

int_clienteEnderecos_id_pk
int_clienteEnderecos_cliente_id_fk

------ edit below -----

I understand what the names of the columns imply. I was just curious if the two values should be identical. If they are, it would simplify a few things and have them be joined on the primary key of the tables. I am not sure about the specific meaning of the tables involved, so I don't know if there is a 1-1 or 1-0 relationship between them or a one to many relationship.

I suggest trying to retrieve just the primary key of the tables that you want. For instance, instead of select * try:

EXPLAIN 
SELECT int_clienteEnerecos_id_pk, int_clientes_id_pk 
FROM tbl_clientes 
LEFT JOIN tbl_clienteEnderecos ON int_clienteEnderecos_cliente_id_fk = int_clientes_id_pk
WHERE str_clienteEnderecos_uf = "SP"
GROUP BY str_clientes_nome_original, int_clientes_id_pk
ORDER BY str_clientes_nome_original, int_clientes_id_pk
LIMIT 0,20

If this works out the way I hope it will, you sell see "from index" in the Extra column. If you need additional fields returned, you can either make another round trip to fetch them, or add them to your index. Or use a nested query to fetch them based on the results of the query above.

Also, why are you grouping by and ordering by the same thing? Are you expecting multiple matches of the foreign key?

Cheers, Jacob

TheJacobTaylor
int_clienteEnderecos_id_pk is PRIMARY KEY of tbl_clienteEnderecos table.int_clienteEnderecos_cliente_id_fk is FOREING KEY of tbl_clientes table on tbl_clienteEnderecos table.
Daniel Koch
I've tried your suggestion, to use this KEY (`str_clienteEnderecos_uf `, `int_clienteEnderecos_cliente_id_fk`). Not working, same result: Using where; Using temporary; Using filesort.
Daniel Koch
The short answer is that your filter is basically useless. You don't filter out enough rows for the query to use the indexes. Adding another suggestion to my answer above.
TheJacobTaylor
Hi, I'm getting the same slow result. tbl_clientes can be translated to "customers" table, while tbl_clienteEnderecos can be trabslated as "customer_addresses". int_clienteEnderecos_id_pk is id field, while int_clienteEnderecos_cliente_id_fk is the foreing key: customer_id.
Daniel Koch
Sorry to hear that. In terms of the naming, here is where I was headed with the duplicate column question. If each customer_addresses row has exactly one customer_id, why don't you just use the customer_id as the primary key for the address table? If there can be more than one customer_id associated with an address then you should put the customer_addresses key as a foreign key in the customers table. Not sure how to help on optimizing the query.
TheJacobTaylor
A: 

I'd suggest giving the following a try; the subquery might use the key better than the join in this context. Take care, though; I couldn't swear on a stack of K & R's that the query is the same as your original.

SELECT *,
       (SELECT *
            FROM tbl_clienteEnderecos
            WHERE int_clienteEnderecos_cliente_id_fk = int_clientes_id_pk AND
                  str_clienteEnderecos_uf = "SP") AS T2
    FROM tbl_clientes
    GROUP BY str_clientes_nome_original, int_clientes_id_pk
    HAVING T2.int_clienteEnderecos_id_pk IS NOT NULL
    ORDER BY str_clientes_nome_original, int_clientes_id_pk
    LIMIT 0, 20
Brian Hooper
Hi Brian, this query doesn't work: Query : SELECT *, (SELECT * FROM tbl_clienteEnderecos WHERE int_clienteEnderecos_cliente_id_fk = int_c... Error Code : 1241Operand should contain 1 column(s). Thank you.
Daniel Koch
Hmmm. Yes, you're right. I'll have a think about it.
Brian Hooper
I've thought about it, and been unable to come up with any modification to this query that doesn't turn it into what you were trying anyway. I'm sorry to have taken up your time.
Brian Hooper
No problem, thanks a lot anyway.
Daniel Koch