tags:

views:

128

answers:

5

This query is really slow. Takes between 9 and 10 seconds...

SELECT DISTINCT a.*
FROM addresses a
LEFT JOIN contacts c
ON c.id = a.contact_id
LEFT JOIN organizations o
ON o.id = a.organization_id
ORDER BY c.last_name, c.first_name, o.name
LIMIT 0, 24

If I comment out the ORDER BY clause the query runs much faster -- about 5 milliseconds. But I need the ORDER BY to support paging of the search results. And the users need the addresses to be sorted by contact and organization.


Table structure

addresses
---------
id int NOT NULL
contact_id int       # could be NULL
organization_id int  # could be NULL

contacts
--------
id int NOT NULL
first_name varchar(255)
last_name varchar(255)

organizations
-------------
id int NOT NULL
name varchar(255)

They're all InnoDB tables.

I have these indexes on the contacts table:

  KEY `idx_contacts_first_name` (`first_name`),
  KEY `idx_contacts_last_name` (`last_name`),
  KEY `idx_contacts_first_name_last_name` (`first_name`,`last_name`)

And on the organizations table:

  KEY `idx_organization_name` (`name`)


Amount of data

Addresses:     22,271
Contacts:      17,906
Organizations:  8,246


DESCRIBE output

mysql> DESCRIBE
    -> SELECT DISTINCT a.*
    -> FROM addresses a
    -> LEFT JOIN contacts c
    -> ON c.id = a.contact_id
    -> LEFT JOIN organizations o
    -> ON o.id = a.organization_id
    -> ORDER BY c.last_name, c.first_name, o.name
    -> LIMIT 0, 24;
+----+-------------+-------+--------+---------------+---------+---------+--------------------------------------------+-------+---------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                                        | rows  | Extra                           |
+----+-------------+-------+--------+---------------+---------+---------+--------------------------------------------+-------+---------------------------------+
|  1 | SIMPLE      | a     | ALL    | NULL          | NULL    | NULL    | NULL                                       | 22387 | Using temporary; Using filesort | 
|  1 | SIMPLE      | c     | eq_ref | PRIMARY       | PRIMARY | 4       | contactdb_v2_development.a.contact_id      |     1 | Distinct                        | 
|  1 | SIMPLE      | o     | eq_ref | PRIMARY       | PRIMARY | 4       | contactdb_v2_development.a.organization_id |     1 | Distinct                        | 
+----+-------------+-------+--------+---------------+---------+---------+--------------------------------------------+-------+---------------------------------+
3 rows in set (0.00 sec)
+1  A: 

If you're not too resource-constrained on the server side and this thing isn't going to scale up too far, you don't have a lot of data so you could simply do your ordering and paging at that level.

Mike Burton
What do you mean by "do your ordering and paging at that level."? Are you proposing to do the sorting in application code after retrieving the data from the DB?
sleske
Yes. For a ~20,000 row data set that's probably not going to change that much, it would make sense to pull the data, sort it, and cache it.
Mike Burton
Yes, I think I will use something like this. It may seem weird to get all the records and offer them for paging, but not doing so would lead to painful UI complexities.
Ethan
+1  A: 

Try adding this index:

idx_contacts_last_name_first_name (last_name,first_name)

BTW: you can delete idx_contacts_first_name since it is duplicative and if you add this index you can delete idx_contacts_last_name.

Matt Wrock
+1  A: 

Try changing your SQL to something like the following:

SELECT a.column1, a.column2, ...
FROM addresses a
LEFT JOIN contacts c
ON c.id = a.contact_id
LEFT JOIN organizations o
ON o.id = a.organization_id
GROUP BY a.column1, a.column2, ...
ORDER BY c.last_name, c.first_name, o.name
LIMIT 0, 24

I've found GROUP BY to be much faster than DISTINCT in general, though I don't why that would be.

Kenny Evitt
+2  A: 

I tried your example, with similar amounts of data, and on my lowly laptop (Pentium M 1,7 GHz) the query takes less than a second (on first run, later runs even less).

Did you just by chance forget the PK on the id column? You don't mention it, so just asking... if you forget that, performance will obviously be horrible - not to mention that every DBA will cringe at tables without a PK.

Otherwise, try this:

DESCRIBE <your query>

This will give you MySQL's query plan. Post that (edit your question), and it should be clearer what's taking so long.

On further thought:

The query will always have problematic performance, because you are asking the database to read and sort all addresses and display them. The ORDER BY means it has to read everything before giving anything back, so it'll always be slow. What is even the point of diplaying the entire database like this? Will users page through several thousand records?

Consider e.g. allowing a search query. With a WHERE condition the query will be much faster.

sleske
Checked PKs. They're set. I pasted the DESCRIBE output. The application does provide a search form (more precisely, a filtering form). The all-Addresses listing is the default state when no filtering parameters are entered. It would be awkward, though not impossible, to have something other than all Addresses be the default state. I'm considering that option, though I'm puzzled that it runs faster on your machine. If I could show "all Addresses" that would be the least confusing for the users.
Ethan
A: 

Let's see.

  • Addresses: 22,271
  • Contacts: 17,906
  • Organizations: 8,246

addresses a LEFT JOIN contacts c gives about 20,000 * 20,000 ~ 400 million comparisons for about 20,000 results

LEFT JOIN organizations gives about 10,000 * 20,000 ~ 200 million comparisons for about 20,000 results

which we sort mainly on contact rows, then discard all but 24 of them. Seems the distinctness of the addresses is of minimal importance.

Since we're mostly sorting by contacts, how about we do a subselect on contacts, keeping somewhat more (say, by a factor of about 4) than we need:

SELECT * FROM contacts ORDER BY last_name, first_name LIMIT 100

Then Join those to their addresses keeping the top hundred or so

     SELECT a.* 
       FROM (SELECT * FROM contacts ORDER BY last_name, first_name LIMIT 0, 100) AS c
  LEFT JOIN addresses a
         ON c.id = a.contact_id
      LIMIT 0, 100

Then join these to the organizations

 SELECT * 
   FROM (
        SELECT * 
          FROM (SELECT * FROM contacts ORDER BY last_name, first_name LIMIT 0, 100) AS c
     LEFT JOIN addresses a
            ON c.id = a.contact_id
         LIMIT 0, 100
         ) AS ca LEFT JOIN organizations o
      ON o.id = ca.organization_id
ORDER BY ca.last_name, ca.first_name, o.name
   LIMIT 0, 24

I'm sure the syntax is screwed up, but I'm equally sure the principle of cutting down the results set at each stage points an instructive way. I have probably made a couple of trade offs too, such that the result closely approximates the 10 seconds answer, but gets there much quicker.

Ewan Todd