views:

86

answers:

5

I have a search implemented on my site, it runs the following queries:

SELECT COUNT(mov_id) AS total_things 
FROM content
    WHERE con_status = 1 AND con_incomplete = 0 AND con_type = 1 
            AND ((con_title) LIKE ('%search keyword%') 
            OR soundex(con_title) LIKE soundex('search keyword')
            OR MATCH (con_title) AGAINST ('search keyword'));

+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-------------+
| id | select_type | table  | type | possible_keys | key      | key_len | ref               | rows  | Extra       |
+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-------------+
|  1 | SIMPLE      | movies | ref  | con_type      | con_type | 12      | const,const,const | 11804 | Using where |
+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-------------+

64058 Queries

Total time: 200817, Average time: 3.13492459958163

Taking 2 to 25 seconds to complete

Rows analyzed 1882 - 12104


SELECT 
            con_id, 
            con_title, 
            con_desc,
            MATCH (con_title) AGAINST ('search keyword') AS relevancy 
    FROM content
    WHERE con_status = 1 AND con_incomplete = 0 AND con_type = 1 
            AND ((con_title) LIKE ('%search keyword%') 
            OR soundex(con_title) LIKE soundex('search keyword')
            OR MATCH (con_title) AGAINST ('search keyword')) 
    ORDER BY relevancy DESC 
    LIMIT 0, 24;

+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-----------------------------+
| id | select_type | table  | type | possible_keys | key      | key_len | ref               | rows  | Extra                       |
+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-----------------------------+
|  1 | SIMPLE      | movies | ref  | con_type      | con_type | 12      | const,const,const | 11803 | Using where; Using filesort |
+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-----------------------------+

78321 Queries

Total time: 200657, Average time: 2.56198209930925

Taking 2 to 16 seconds to complete

Rows analyzed 0 - 15752

This basically works like a ghetto "fuzzy search" to ignore typos people might make.

Unfortunately, its very slow (even if I remove soundex() or FULLTEXT searching. How to improve search speeds in this situation?

+2  A: 

The part of the WHERE clause that hurts is the first % after LIKE. To speed it up, you could normalize the keywords, moving them to a separate table:

table moviekeywords: movieid, keyword
table movies:        movieid, ...

This allows you to search through the moviekeywords table using an = condition, or at least like 'humphrey%'. Both variants can be made expremely fast with an index.

Andomar
Im searching titles, not keywords. SO having it in a separate table would be silly.
Yegor
Dang. Was going to suggest that. +1 for TFGITW.
DVK
Yegor - does not matter. You are not likiely to search from 3rd character of a word.
DVK
Yes, but if I have stuff like "the terminator" in the DB, and someone searches for "terminator" it wont be picked up.
Yegor
@Yegor: The keywords table is supposed to contain one row for each word. For "The Terminator", it would contain two rows, "The" and "Terminator"
Andomar
A: 

As long as you keep using soundex and LIKE(%nnn%) you will be running a full scan of all of an intermediate result. To illustrate this: If you omitted your other predicates (on con_status, con_incomplete AND con_type columns) you would always be running a full table scan.

I suggest dropping or scaling back your fuzzy predicates. For example, just running LIKE('nnn%') will be MUCH faster than %nnn% (if that column is indexed) but of course your search results will not be as fuzzy. Perhaps make soundex an advanced search option that does not always run.

If you can't compromise on any of those issues then at least make sure that your con_status, con_incomplete AND con_type columns are all indexed.

Paul Sasik
A: 

I made my "fuzzy search" a fallback option if COUNT on the original stricter query returns no results. My results have been pretty fast so far using

 SOUNDS LIKE ('blah')
Greg
A: 

Think about Andomar's solution again - most keyword searches allow you to specify multiple keywords. You can't do that with your current query. And there's no problem with "The Terminator" - for that, you'd just add one keyword, "Terminator".

And with an index on the keyword column, it will be fast.

blackanchorage
A: 

So it looks like you only have around 15,000 rows. If you don't expect your table to grow past a hundred thousand entries or so, maybe you should just keep all the titles in memory and avoid hitting the database until you know which entries you want.

That is, at startup and at periodic intervals, just query all the titles out of the database, split each one into words, and keep a mapping of words to row keys. This should take less than 1MB of memory, accessing it should be quite fast, and most importantly you can add whatever fuzzy matching or heuristic scoring mechanisms you like (without modifying your schema).

Just a thought.

Jason Orendorff
Would I keep thus in a memory table? Im not sure what you mean by "keep in memory"
Yegor