views:

607

answers:

3

I've been furiously googling trying to figure this out, with surprisingly little luck; I would guess this is a common issue.

I have 5 tables: orders, addresses, notes, transactions, line_items, and shipments.

transactions, addresses and notes all have indexed order_id fields - line_items and shipments have indexed transaction_id fields.

The best single-query performance I've gotten is completely untenable - above 30 secs sometimes. The great and frustrating irony is that I can do this with a big block of PHP code in under 1. For example, I'll iterate through all the notes to match against a given search, saving all of the order_ids in an array. Then I'll do the same for all of the other tables. Then I'll append a massive IN ( ... ) statement on my final query of the orders table. This works fine, but I know I can do better.

The most obvious routes don't work; simply LEFT JOINing all of these tables to the original orders table and GROUPing BY the order.id takes too long - about 9 secs.

For the life of me, I can't see how my janky PHP solution is more efficient that mysql doing all of these calculations internally.

I've rewritten this so many times, I can hardly recall all the different things I've tried... I think this was my first attempt:

SELECT o.id FROM orders o
LEFT JOIN addresses a ON a.order_id = o.id
LEFT JOIN notes n ON (n.parent_id = o.id AND n.type = "parts")
LEFT JOIN transactions t ON t.order_id = o.id
LEFT JOIN line_items li ON li.transaction_id = t.id
LEFT JOIN shipments s ON s.transaction_id = t.id
WHERE 0 = 0
AND ((a.`email` LIKE "%Lachman%" || a.`contact_name` LIKE "%Lachman%" || a.`company_name` LIKE "%Lachman%" || a.`address1` LIKE "%Lachman%" || a.`address2` LIKE "%Lachman%" || a.`country` LIKE "%Lachman%" || a.`city` LIKE "%Lachman%" || a.`region` LIKE "%Lachman%" || a.`postal_code` LIKE "%Lachman%" || n.`note` LIKE "%Lachman%" || t.`g_order_number` LIKE "%Lachman%" || t.`pp_txn_id` LIKE "%Lachman%" || t.`fm_invoice_num` LIKE "%Lachman%" || t.`ebay_item_id` LIKE "%Lachman%" || t.`ebay_buyer_id` LIKE "%Lachman%" || t.`ebay_transaction_id` LIKE "%Lachman%" || t.`ebay_order_id` LIKE "%Lachman%" || li.`partnum` LIKE "%Lachman%" || li.`part_id` LIKE "%Lachman%" || li.`desc` LIKE "%Lachman%" || li.`source` LIKE "%Lachman%" || s.`tracking` LIKE "%Lachman%" || s.`carrier` LIKE "%Lachman%"))
GROUP BY o.id
ORDER BY `created` DESC

2 results 9.6895699501 seconds

I'm not sure how accurate the formatting will be on this, but I'll also attached the EXPLAINation:

id  select_type table type possible_keys key key_len ref rows Extra
1   SIMPLE o ALL NULL NULL NULL NULL 2840 Using temporary; Using filesort
1   SIMPLE a ref order_id order_id 5 apple_components.o.id 1  
1   SIMPLE n ref parent_id,type type 22 const 314  
1   SIMPLE t ref order_id order_id 5 apple_components.o.id 1  
1   SIMPLE li ref transaction_id transaction_id 4 apple_components.t.id 1  
1   SIMPLE s ref transaction_id transaction_id 4 apple_components.t.id 1 Using where

Many, many thanks.

[Edit: for reference, here is the PHP solution that takes ~0.02s -- how can I do this in straight mysql!?]

if ($s['s']) {
    $search_fields = array(
        'a' => array('email', 'contact_name', 'company_name', 'address1', 'address2', 'country', 'city', 'region', 'postal_code'),
        'n' => array('note'),
        't' => array('g_order_number', 'pp_txn_id', 'fm_invoice_num', 'ebay_item_id', 'ebay_buyer_id', 'ebay_transaction_id', 'ebay_order_id'),
        'li' => array('partnum', 'part_id', 'desc', 'source'),
        's' => array('tracking', 'carrier')
    );
    $search_clauses = array();
    foreach ($search_fields as $table => $fields) {
        $the_fields = array();
        foreach ($fields as $field) $the_fields[] = $table.'.`'.$field.'`';
        $clauses = array();
        foreach (explode(' ', $s['s']) as $term) $clauses[] = 'CONCAT_WS(" ", '.implode(', ', $the_fields).') LIKE "%'.$term.'%"';
        $search_clauses[$table] = $clauses;
    }

    $order_ids = array();
    $results = mysql_query('SELECT order_id FROM addresses a WHERE '.implode(' AND ', $search_clauses['a']));
    while ($result = mysql_fetch_assoc($results)) $order_ids[] = $result['order_id'];
    $results = mysql_query('SELECT parent_id FROM notes n WHERE type = "orders" AND '.implode(' AND ', $search_clauses['n']));
    while ($result = mysql_fetch_assoc($results)) $order_ids[] = $result['parent_id'];
    $results = mysql_query('SELECT order_id FROM transactions t WHERE '.implode(' AND ', $search_clauses['t']));
    while ($result = mysql_fetch_assoc($results)) $order_ids[] = $result['order_id'];

    $transaction_ids = array();
    $results = mysql_query('SELECT transaction_id FROM line_items li WHERE '.implode(' AND ', $search_clauses['li']));
    while ($result = mysql_fetch_assoc($results)) $transaction_ids[] = $result['transaction_id'];
    $results = mysql_query('SELECT transaction_id FROM shipments s WHERE '.implode(' AND ', $search_clauses['s']));
    while ($result = mysql_fetch_assoc($results)) $transaction_ids[] = $result['transaction_id'];
    if (count($transaction_ids)) {
        $results = mysql_query('SELECT order_id FROM transactions WHERE id IN ('.implode(', ', $transaction_ids).')');
        while ($result = mysql_fetch_assoc($results)) if (!empty($result['order_id'])) $order_ids[] = $result['order_id'];
    }
}
$query = 'SELECT id FROM orders WHERE id IN ('.implode(', ', $order_ids).')';

2009-10-07: Looking at this again; still haven't found a better solution. The suggestion in the comments to add "FORCE INDEX (PRIMARY)" after "orders o" consistently knocked off a couple of seconds -- but I never really understood why. Also I've since realized there's a limitation in my PHP solution in that searches with multiple terms are only matched within a table instead of across tables.

+2  A: 

The first line of your EXPLAIN jumps out of at me. Do you have your o.id field set as a primary unique key?

Ensuring your keys/indexes are set up properly can reduce your query time by huuuuuge magnitudes (transforming server-crashes into 1-second responses)

Also, I would simplify the comparison logic by doing the LIKE against a CONCAT:

WHERE CONCAT(
  a.email,
  a.contactname,
  ....
) LIKE "%lachman%"
rooskie
I'm not sure I would do that CONCAT. Anytime you do LIKE against a computed value, the server has to go through all the rows one by one. If you do LIKE against the direct rows in the table (and you have some kind of full-text index in the database), the server can check the index instead. (though if there isn't a fulltext index, you're stuck going through all the rows since there's a % at the beginning of the string.)
Jason S
In other words, you've simplified the text of the query, but you've made the actual database search more complicated.
Jason S
Yes, but I'm under the assumption that the LIKE will be nearly instantaneous, it's the joins causing slowdown.
rooskie
I agree the string comparison is cheap compared to database lookup, but maybe I'm missing something; how would a CONCAT speed up a JOIN operation?
Jason S
To clarify: if you do a CONCAT, the server has to basically go through all the orders and grab all the data from all these tables and *then* decide, Oh, that row doesn't match. If you keep the comparisons separate, the query planner can say, Oh, this user doesn't need all the rows from the "addresses" table, I just need to find the ones that match "%Lachman%" and then that can narrow things down really quickly.
Jason S
orders.id is indeed the primary unique key. I tried the CONCAT method just for the sake of it. It's odd: using CONCAT(...) and LIKE "%...%" took ~0.03s, but returned an incomplete recordset (not the right # of matches); but when I use CONCAT_WS(" ", ...), it works fine, but takes the same ~9.6s. I would have figured wrapping the search term with %s would make those expressions functionally equivalent.
JKS
(must have been a NULL value in there -- using IFNULL(), the CONCAT method returns the right recordset, but still takes the exact same ~9.6s.)
JKS
Just to follow up on a couple things:I think I was unclear in stating the intent of the CONCAT( ... ) LIKE .... That was purely for coding style - I think it would be inconsequential (for better or worse) in optimization. I would also amend exactly what you said - CONCAT_WS( ' ', fields ... ) - not only to handle NULL values, but more importantly to prevent the instance of one field ending in 'lach' and the next beginning with 'man'. That being said, try adding 'FORCE INDEX (PRIMARY)' after 'orders o'
rooskie
Using "FORCE INDEX (PRIMARY)" knocks it down to a consistent ~4.9s. hrmmm...
JKS
+1  A: 

Here's your current WHERE clause simplified:

WHERE a.email LIKE "%Lachman%" 
   OR a.contact_name LIKE "%Lachman%" 
   OR a.company_name LIKE "%Lachman%" 
   OR a.address1 LIKE "%Lachman%" 
   OR a.address2 LIKE "%Lachman%" 
   OR a.country LIKE "%Lachman%" 
   OR a.city LIKE "%Lachman%" 
   OR a.region LIKE "%Lachman%" 
   OR a.postal_code LIKE "%Lachman%" 
   OR n.note LIKE "%Lachman%" 
   OR t.g_order_number LIKE "%Lachman%" 
   OR t.pp_txn_id LIKE "%Lachman%" 
   OR t.fm_invoice_num LIKE "%Lachman%" 
   OR t.ebay_item_id LIKE "%Lachman%" 
   OR t.ebay_buyer_id LIKE "%Lachman%" 
   OR t.ebay_transaction_id LIKE "%Lachman%" 
   OR t.ebay_order_id LIKE "%Lachman%" 
   OR li.partnum LIKE "%Lachman%" 
   OR li.part_id LIKE "%Lachman%" 
   OR li.desc LIKE "%Lachman%" 
   OR li.source LIKE "%Lachman%" 
   OR s.tracking LIKE "%Lachman%" 
   OR s.carrier LIKE "%Lachman%"

You need to take serious look at what columns you are looking in - here's my list of those that shouldn't be in the WHERE clause:

  • country
  • city
  • region
  • postalcode
  • pp-txn-id
  • ebay-item-id
  • ebay-transaction-id
  • ebay-order-id
  • partnum
  • part_id
OMG Ponies
Curiously, if I limit the query to just one column per table, it takes an identical amount of time.If I limit it to one column on just two tables, it still takes ~9.6s.But strangest of all, if I change that to just one table/one column, it immediately jumps two orders of magnitude to 0.0934s, irrespective of which table/column I choose.
JKS
+1  A: 

If you're really doing a lot of queries for user-specified strings that are subsets of some of your fields, I would consider looking at creating a full-text index, which MySQL supports for MyISAM tables.

Jason S