views:

99

answers:

1

About the system: - There are tutors who create classes and packs - A tags based search approach is being followed.Tag relations are created when new tutors register and when tutors create packs (this makes tutors and packs searcheable). For details please check the section How tags work in this system? below.

Following is the concerned query Can anybody help me suggest an approach using temporary tables. We have indexed all the relevant fields and it looks like this is the least time possible with this approach:-

 SELECT SUM(DISTINCT( t.tag LIKE "%Dictatorship%"
                      OR tt.tag LIKE "%Dictatorship%"
                      OR ttt.tag LIKE "%Dictatorship%" )) AS key_1_total_matches
       ,
       SUM(DISTINCT( t.tag LIKE "%democracy%"
                      OR tt.tag LIKE "%democracy%"
                      OR ttt.tag LIKE "%democracy%" ))    AS key_2_total_matches
       ,
       COUNT(DISTINCT( od.id_od ))                        AS
       tutor_popularity,
       CASE
         WHEN ( IF(( wc.id_wc > 0 ), ( wc.wc_api_status = 1
                                       AND wc.wc_type = 0
                                       AND wc.class_date > '2010-06-01 22:00:56'
                                       AND wccp.status = 1
                                       AND ( wccp.country_code = 'IE'
                                              OR wccp.country_code IN ( 'INT' )
                                           ) ), 0)
              ) THEN 1
         ELSE 0
       END                                                AS 'classes_published'
       ,
       CASE
         WHEN ( IF(( lp.id_lp > 0 ), ( lp.id_status = 1
                                       AND lp.published = 1
                                       AND lpcp.status = 1
                                       AND ( lpcp.country_code = 'IE'
                                              OR lpcp.country_code IN ( 'INT' )
                                           ) ), 0)
              ) THEN 1
         ELSE 0
       END                                                AS 'packs_published',
       td . *,
       u . *
FROM   tutor_details AS td
       JOIN users AS u
         ON u.id_user = td.id_user
       LEFT JOIN learning_packs_tag_relations AS lptagrels
         ON td.id_tutor = lptagrels.id_tutor
       LEFT JOIN learning_packs AS lp
         ON lptagrels.id_lp = lp.id_lp
       LEFT JOIN learning_packs_categories AS lpc
         ON lpc.id_lp_cat = lp.id_lp_cat
       LEFT JOIN learning_packs_categories AS lpcp
         ON lpcp.id_lp_cat = lpc.id_parent
       LEFT JOIN learning_pack_content AS lpct
         ON ( lp.id_lp = lpct.id_lp )
       LEFT JOIN webclasses_tag_relations AS wtagrels
         ON td.id_tutor = wtagrels.id_tutor
       LEFT JOIN webclasses AS wc
         ON wtagrels.id_wc = wc.id_wc
       LEFT JOIN learning_packs_categories AS wcc
         ON wcc.id_lp_cat = wc.id_wp_cat
       LEFT JOIN learning_packs_categories AS wccp
         ON wccp.id_lp_cat = wcc.id_parent
       LEFT JOIN order_details AS od
         ON td.id_tutor = od.id_author
       LEFT JOIN orders AS o
         ON od.id_order = o.id_order
       LEFT JOIN tutors_tag_relations AS ttagrels
         ON td.id_tutor = ttagrels.id_tutor
       LEFT JOIN tags AS t
         ON t.id_tag = ttagrels.id_tag
       LEFT JOIN tags AS tt
         ON tt.id_tag = lptagrels.id_tag
       LEFT JOIN tags AS ttt
         ON ttt.id_tag = wtagrels.id_tag
WHERE  ( u.country = 'IE'
          OR u.country IN ( 'INT' ) )
       AND CASE
             WHEN ( ( tt.id_tag = lptagrels.id_tag )
                    AND ( lp.id_lp > 0 ) ) THEN lp.id_status = 1
                                                AND lp.published = 1
                                                AND lpcp.status = 1
                                                AND ( lpcp.country_code = 'IE'
                                                       OR lpcp.country_code IN (
                                                          'INT'
                                                          ) )
             ELSE 1
           END
       AND CASE
             WHEN ( ( ttt.id_tag = wtagrels.id_tag )
                    AND ( wc.id_wc > 0 ) ) THEN wc.wc_api_status = 1
                                                AND wc.wc_type = 0
                                                AND
             wc.class_date > '2010-06-01 22:00:56'
                                                AND wccp.status = 1
                                                AND ( wccp.country_code = 'IE'
                                                       OR wccp.country_code IN (
                                                          'INT'
                                                          ) )
             ELSE 1
           END
       AND CASE
             WHEN ( od.id_od > 0 ) THEN od.id_author = td.id_tutor
                                        AND o.order_status = 'paid'
                                        AND CASE
             WHEN ( od.id_wc > 0 ) THEN od.can_attend_class = 1
             ELSE 1
                                            END
             ELSE 1
           END
       AND ( t.tag LIKE "%Dictatorship%"
              OR t.tag LIKE "%democracy%"
              OR tt.tag LIKE "%Dictatorship%"
              OR tt.tag LIKE "%democracy%"
              OR ttt.tag LIKE "%Dictatorship%"
              OR ttt.tag LIKE "%democracy%" )
GROUP  BY td.id_tutor
HAVING key_1_total_matches = 1
       AND key_2_total_matches = 1
ORDER  BY tutor_popularity DESC,
          u.surname ASC,
          u.name ASC
LIMIT  0, 20  

The problem

The results returned by the above query are correct (AND logic working as per expectation), but the time taken by the query rises alarmingly for heavier data and for the current data I have it is like 10 seconds as against normal query timings of the order of 0.005 - 0.0002 seconds, which makes it totally unusable.

Somebody suggested in my previous question to do the following:-

  • create a temporary table and insert here all relevant data that might end up in the final result set
  • run several updates on this table, joining the required tables one at a time instead of all of them at the same time
  • finally perform a query on this temporary table to extract the end result

All this was done in a stored procedure, the end result has passed unit tests, and is blazing fast.

I have never worked with temporary tables till now. Only if I could get some hints, kind of schematic representations so that I can start with...

Is there something faulty with the query? What can be the reason behind 10+ seconds of execution time?

How tags work in this system?

  • When a tutor registers, tags are entered and tag relations are created with respect to tutor's details like name, surname etc.
  • When a Tutors create packs, again tags are entered and tag relations are created with respect to pack's details like pack name, description etc.
  • tag relations for tutors stored in tutors_tag_relations and those for packs stored in learning_packs_tag_relations. All individual tags are stored in tags table.

The explain query output:- Please see this screenshot -

http://www.test.examvillage.com/Explain_query_improved.jpg

+1  A: 

Temporary tables are not a silver bullet. The fundamental problem with your queries lies with patterns like this:

   t.tag LIKE "%Dictatorship%"
OR tt.tag LIKE "%Dictatorship%"
OR ttt.tag LIKE "%Dictatorship%"

Wildcarding the left side of a LIKE comparison guarantees that an index can not be used. Effectively, you're table scanning all three tables involved...

You need to leverage Full Text Searching, either MySQL's native FTS or 3rd party stuff like Sphinx. All the FTS I've known include a scoring/rank value indicating the strength of the match - you can read the MySQL documentation for the algorithm details. But the score/rank is not the same as what you've got: SUM(DISTINCT LIKE...), you could get the same using something like:

  SELECT t.id_tag, 
         COUNT(*) AS num_matches 
    FROM TABGS
   WHERE MATCH(tag) AGAINST ('Dictatorship')
GROUP BY t.id_tag
OMG Ponies
Your answer helped me in reducing the complexity of my query. So, a thumbs up to you. Now, prior to executing the main search query, I am collecting all the id_tags which match the search keywords. Then in the search query I am no more joining the tags table. Please check my new question and see if you can help. I am near to getting a solution - http://stackoverflow.com/questions/3030022/mysql-help-me-alter-this-search-query-to-get-desired-results
sandeepan