views:

236

answers:

9

I have the following query:

SELECT DISTINCT c.id
FROM clients AS c
LEFT JOIN client_project AS cp ON (cp.client_id = c.id)
WHERE cp.project_id = 1
    AND c.active_flag = 1
ORDER BY c.client_name

If I remove the order by, the query takes 0.005 seconds. With the order by, the query takes 1.8-1.9 seconds. I have an index on client_name.

What else would improve the speed?

Edit: c.id is primary key, but there could be multiple records for it in client_project and therefore it may result in more than one record for each id. Also, removing the distinct makes 0.1 second difference in the query.

Addition: Here is my clients table:

CREATE TABLE IF NOT EXISTS `clients` (
  `id` int(11) NOT NULL auto_increment,
...
  `organization` varchar(255) character set utf8 collate utf8_bin NOT NULL,
  `client_name` varchar(255) character set utf8 collate utf8_bin NOT NULL,
  `active_flag` tinyint(1) NOT NULL,
...
  PRIMARY KEY  (`id`),
  KEY `active_flag` (`active_flag`),
...
  KEY `organization` (`organization`),
  KEY `client_name` (`client_name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Using MySQL 5.0

A: 

Is c.id an identity column? If it is, I don't think you'll need the DISTINCT in there since each c.id will be unique.

EDIT

So c.id may have multiple entries in cp even if cp.project_id = 1?

EDIT

Just curios as to why you want to order by client name when you are not selecting it.

Eppz
-1: this would be correct if there was no JOIN, but it is quite possible there is more than one client_project per client
Greg
A: 

Is c.id a primary key? If so, you shouldn't do a DISTINCT on it because it's already distinct, and forcing the DISTINCT may cause it to sort by id then sort by client_name.

Paul Tomblin
A: 

why are you ordering by client name if you are not even returning it?

Also do you need the distinct?

Why are you doing a left join if your where clause will make it a inner join anyway

By having this WHERE cp.project_id = 1 instead of AND cp.project_id = 1 before the WHERE clause it is a INNEr JOIN anyway

SQLMenace
A: 

Some optimizations are DB vendor neutral while others are DB vendor specific. Here's a couple of things to try.

  • Remove the DISTINCT as is suggested elsewhere.
  • Consider using an inner join. I realize that it may not be a viable option in your situation.

Also, run an execution plan to get a better picture of what is going on in terms of what parts of the query are taking up the most time and why. See the EXPLAIN keyword for more details.

Glenn
By having this WHERE cp.project_id = 1 instead of AND cp.project_id = 1 before the WHERE clause it is a INNEr JOIN anyway
SQLMenace
@SQLMenace: that's true if your query optimizer/planner is intelligent. I know for a fact that in many situations like this, MSSQL will do a cartesian join and then filter the results. IMHO, you should always put restrictions on the join in the join condition just in case.
rmeador
+2  A: 

looking at your edits

try using EXISTS in that case

SELECT  c.id
FROM clients AS c
WHERE EXISTS (SELECT * FROM  client_project AS cp  
              WHERE cp.client_id = c.id and cp.project_id = 1)
AND c.active_flag = 1
SQLMenace
You're right, but that still doesn't explain why the order by makes things so slow ...
IronGoofy
because the table order is in the order of your PK, so if you order by client_name it has do do a bookmark lookup to do the sort order what does the plan look like?
SQLMenace
+1  A: 

Probably there are indices on clients.id and clients.active_flag, so there is no need for the optimizer to go to the full table (or the additional index) unless you want to sort on it.

Check the optimizer plan, I think in mySQL it is explain .

An index on client_name, id may help (or it may not - check the plan).

A couple of additonal questions/ideas/remarks that may help ...

  • Why order by name if all you get from the select is the id
  • Why do a left join if you have a where clause of "cp.project_id", so clients without a project won't be returned anyways
  • As to the other posters (paul, eppz), "distinct" may be needed for clients with more than one project. So another idea would be to do something like

    select id from clients c where exists (select * from client_project cp where c.id = cp.client_id)

IronGoofy
I'm not sure if this is true in this case, but if cp.project_id is a unique field, then the distinct would not be required since it clearly states cp.project_id = 1 in the where clause.
Eppz
okay, you're right .. it's a sensible assumption that project_id would be unique, and then the distinct isn't necessary.
IronGoofy
+1  A: 

Try adding this key to client_projects:

KEY(client_name, id, active_flag)
too much php
A: 

You'll need to force usage of index on client_name:

SELECT id
FROM (
  SELECT c.id,
    (
    SELECT 1
    FROM client_projects cp
    WHERE cp.client_id = c.id
      AND cp.project_id = 1
    LIMIT 1
    ) e
FROM clients c
FORCE INDEX (client_name)
WHERE c.active_flag = 1
ORDER BY
  client_name
) co
WHERE e IS NOT NULL
Quassnoi
A: 

I don't have a solution for you, but I do have an explanation.

MySQL only uses a single index per table. You have two tables, and the indexes used in those are the Primary Key of one (WHERE cp.project_id = 1) and the join is forcing the use of the second table index to efficiently join.

Using ORDER BY after that it therefore becomes impossible for MySQL to use an index to order. Adding more indexes will not help. EXPLAIN will show you which indexes MySQL has chosen to use for each table. Forcing an index will cause the other parts of the query to slow down.

mluebke