views:

90

answers:

5

How can I optimize this MySQL query:

SELECT * 
FROM messages 
WHERE toid='".$mid."' 
  AND fromid='".$fid."' 
  OR (toid='".$fid."' AND fromid='".$mid."')  
  AND subject != 'something' 
order by id ASC LIMIT 0,5

I need to show the messages between two users on one page. This query is taking too much time and too many server resources. Can it be done some other way?

Thanks.

+1  A: 

As zerkms points out, you can't do an index for the subject in this case. Check his suggestion in the comments.

Add one index for both ids:

CREATE INDEX ids_index ON messages (fromid, toid);

and split the query into two.
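
A minimal sketch of the split, assuming a simplified `messages` table (the column types and sample rows are invented for illustration) and using Python's built-in `sqlite3` as a stand-in for MySQL. The `conversation` helper is hypothetical. Each branch of the `UNION ALL` has a plain equality condition on both ids, which is the shape a two-column index can serve; whether MySQL actually picks the index should still be checked with `EXPLAIN`.

```python
import sqlite3

# Hypothetical stand-in for the `messages` table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, toid INTEGER, fromid INTEGER, subject TEXT)"
)
conn.execute("CREATE INDEX ids_index ON messages (fromid, toid)")
conn.executemany(
    "INSERT INTO messages (toid, fromid, subject) VALUES (?, ?, ?)",
    [
        (1, 2, "hello"),      # id 1: from user 2 to user 1
        (2, 1, "re: hello"),  # id 2: from user 1 to user 2
        (1, 3, "other"),      # id 3: a different conversation
        (2, 1, "something"),  # id 4: filtered out by the subject condition
        (1, 2, "bye"),        # id 5: from user 2 to user 1
    ],
)


def conversation(conn, mid, fid, limit=5):
    """Fetch messages between mid and fid, one SELECT branch per direction."""
    sql = (
        "SELECT id, toid, fromid, subject FROM messages "
        "WHERE toid = ? AND fromid = ? AND subject != 'something' "
        "UNION ALL "
        "SELECT id, toid, fromid, subject FROM messages "
        "WHERE toid = ? AND fromid = ? AND subject != 'something' "
        "ORDER BY id ASC LIMIT ?"
    )
    # Placeholders replace the string concatenation used in the question.
    return conn.execute(sql, (mid, fid, fid, mid, limit)).fetchall()


rows = conversation(conn, 1, 2)
ids = [row[0] for row in rows]
print(ids)
```

`UNION ALL` is safe here only while the two users differ; if `mid` equals `fid`, both branches return the same rows and plain `UNION` would be needed to drop the duplicates.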

Yar
The subject part of an index will not help because of the != comparison, and the OR may be the reason the query ends up as a full scan whatever indexes exist. My proposal is to split this query in two: one with toid='".$mid."' AND fromid='".$fid."' and one with toid='".$fid."' AND fromid='".$mid."'
zerkms
And of course separate single-column indexes (instead of one combined index) will not work either.
zerkms
If the MySQL optimizer cannot use the "index merge" optimization, it will fall back to a full scan. Looking at the query, I don't think this is a case where index merge will be applied (more details here: http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html). Anyway, the best way to get the answer is to try it and pray that the merge is used.
zerkms
@zerkms, on the empirical "try and pray": I've found that MySQL does surprising, non-intuitive things with indexes. Anyway, I'll leave my answer as is for now, and hopefully with your comments it's helpful to somebody at some point.
Yar
+1  A: 

Add an index on the pair (toid,fromid). That will allow it to use the index to find the relevant messages and speed up your query. Note that indexes on the individual columns would potentially still leave a lot of messages to be scanned, i.e., those messages to/from one of the individuals to someone else. Using the pair will limit the messages found by the index scan to just those between the two individuals.

tvanfosson
I bet that the OR part will turn off any index optimizations.
zerkms
I agree with zerkms. Your query will be better off using a UNION than an OR.
uji
I don't know about MySQL, but creating the index on a test DB in SQL Server took my query over 100K rows with 12 matches from 26 seconds to less than 1 second. The query plan used two index seeks, a merge join, then an inner join against a key lookup to grab the subject. It basically used the same plan for an index on all three fields so I don't think (for SQL Server anyway) the additional size of including the subject in the index was worth it.
tvanfosson
+2  A: 

As written, the query possibly does not express the actual intent. It seems likely that the desired condition is

WHERE ( ( toid='".$mid."' AND fromid='".$fid."' )
   OR   (toid='".$fid."'  AND fromid='".$mid."') 
      )
  AND subject != 'something' 

But, as written, the query applies the subject condition only to the second (toid and fromid) clause, because AND binds more tightly than OR.
Note that in the above, the inner parentheses are extraneous; nevertheless, it is often a good idea to include them to show the intended expression more explicitly.
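
To make the difference concrete, here is a small sketch using Python's built-in `sqlite3` as a stand-in for MySQL (AND binds more tightly than OR in both). The table layout and the four sample rows are assumptions made up for the illustration.

```python
import sqlite3

# Assumed, simplified version of the `messages` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, toid INTEGER, fromid INTEGER, subject TEXT)"
)
conn.executemany(
    "INSERT INTO messages (toid, fromid, subject) VALUES (?, ?, ?)",
    [
        (1, 2, "something"),  # id 1: first direction, subject to be excluded
        (2, 1, "something"),  # id 2: second direction, subject to be excluded
        (1, 2, "hello"),      # id 3
        (2, 1, "hi"),         # id 4
    ],
)
params = {"mid": 1, "fid": 2}

# The WHERE clause exactly as posted in the question.
as_written = (
    "SELECT id FROM messages "
    "WHERE toid = :mid AND fromid = :fid "
    "OR (toid = :fid AND fromid = :mid) "
    "AND subject != 'something' ORDER BY id"
)
# The WHERE clause with the OR grouped before the subject test.
as_intended = (
    "SELECT id FROM messages "
    "WHERE ((toid = :mid AND fromid = :fid) "
    "OR (toid = :fid AND fromid = :mid)) "
    "AND subject != 'something' ORDER BY id"
)

written_ids = [r[0] for r in conn.execute(as_written, params)]
intended_ids = [r[0] for r in conn.execute(as_intended, params)]
print(written_ids)   # id 1 slips through: its branch has no subject test
print(intended_ids)
```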

In either case, this query is a "hard[er]" query to resolve, owing to the OR clause and to the NOT EQUAL predicate. The OR clause typically causes the server to merge the results from two subqueries (although other strategies are possible). The NOT EQUAL predicate cannot be resolved by an index lookup (however, a covering index does help in some cases, for it saves the trip to the main table / other indexes for assessing whether the row at hand satisfies the predicate).

Independently of this possible logical problem, adding indexes with multiple keys would help the situation. I'd like to suggest the following:

  • toid, fromid, subject
  • toid, fromid

The point of the index that also includes the subject is to allow the query to be resolved with a partial scan of the index rather than having to look up the subject in the table. This index would be used as a covering index for this query.

Beware, however, that adding indexes decreases performance for INSERT, UPDATE and DELETE operations.

Edit: on the usability of the (toid,fromid, subject) index
First off, it is acknowledged that we need only one of the suggested indexes, i.e. if we have the (toid, fromid, subject) index, the (toid, fromid) one would be redundant (albeit possibly more efficient if subject were a relatively long column).
That said, the fact that the query uses a NOT EQUAL predicate on subject doesn't necessarily exclude the use of the subject data in the (toid, fromid, subject) index. The reason is that the [not equal] condition on subject can be resolved within the index (not requiring a match/merge or a lookup, i.e. akin to some "covering" logic).

mjv
The second one is completely redundant, because it is already covered by the leftmost part of the first index.
zerkms
"the subject is to allow the query to be resolved with a partial scan" - wrong. It's impossible to use a != comparison with a B-Tree, simply because it is not a range comparison.
zerkms
@zerkms you are correct, I should make it more explicit that only one of these indexes would be necessary (and under which conditions). I'll edit accordingly.
mjv
hehe, actually in our case the second (shorter) index is preferred, just because "subject" part is useless...
zerkms
@zerkms: on the partial scan, I beg to differ. A plausible query plan would be to [partially] scan the (toid, fromid, subject) index for all values of [mid, fid, *] and to take only the entries where the subject doesn't match 'something' (and to OR this with a similar query for [fid, mid, *]).
mjv
The subject part with != will never be used :-) so the last part is useless, and there is no use at all for the last part of the index.
zerkms
+1  A: 

Given that you hopefully have some logic to prevent a user sending a message to herself, perhaps this:

WHERE toid IN ($mid, $fid) AND fromid IN ($mid, $fid) AND subject <> 'something'

Put an index on (toid, fromid) and that should be pretty OK, I think.
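
A quick sketch of why the "no messages to herself" caveat matters, again using Python's built-in `sqlite3` as a stand-in for MySQL with an assumed table layout and made-up rows: the `IN` form also matches rows where sender and recipient are the same user, while the explicit two-direction form does not.

```python
import sqlite3

# Assumed, simplified version of the `messages` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, toid INTEGER, fromid INTEGER, subject TEXT)"
)
conn.execute("CREATE INDEX to_from_index ON messages (toid, fromid)")
conn.executemany(
    "INSERT INTO messages (toid, fromid, subject) VALUES (?, ?, ?)",
    [
        (1, 2, "a"),             # id 1: between users 1 and 2
        (2, 1, "b"),             # id 2: between users 1 and 2
        (1, 1, "note to self"),  # id 3: user 1 writing to herself
        (3, 1, "x"),             # id 4: unrelated conversation
    ],
)
params = {"mid": 1, "fid": 2}

in_form = (
    "SELECT id FROM messages "
    "WHERE toid IN (:mid, :fid) AND fromid IN (:mid, :fid) "
    "AND subject <> 'something' ORDER BY id"
)
or_form = (
    "SELECT id FROM messages "
    "WHERE ((toid = :mid AND fromid = :fid) "
    "OR (toid = :fid AND fromid = :mid)) "
    "AND subject <> 'something' ORDER BY id"
)

in_ids = [r[0] for r in conn.execute(in_form, params)]
or_ids = [r[0] for r in conn.execute(or_form, params)]
print(in_ids)  # includes the self-addressed message
print(or_ids)
```

If self-addressed rows can exist, adding `AND toid <> fromid` to the `IN` form would bring the two results back in line.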

nickf
A: 

Thanks! Index on the pair (toid,fromid) is the solution.

Sergio