tags:

views:

405

answers:

7

I have to JOIN to large tables in a MySQL query and it takes really long - approximately 180 seconds. Are there any tips for optimizing a merge?

My table has 10 fields. I am only using 4 in the query - all strings. Table has about 600,000 rows and the result should have about 50 rows.

The four used rows are: Title, Variables, Location, Date

Here is my query:

SELECT DISTINCT t1.Title, t1.Variables FROM `MyTABLE` t1 JOIN `MyTABLE` t2  
USING (Title,  Variables) 
WHERE (t1.Location, t1.Date) = ('Location1', 'Date1') 
AND (t2.Location, t2.Date) = ('Location2', 'Date2')
+2  A: 

Yes. Create appropriate indexes based upon the queries being run against the tables involved.

Mitch Wheat
In my query, I already have indexes on the fields used in WHERE logic. Is there something beyond that I can do?
Brian
Read the output of `EXPLAIN` and add indexes based on that.
Lukáš Lalinský
+1  A: 

Make sure the fields you are matching on are indexed. Matching numeric values is also a quicker than strings.

But wouldn't it be simpler to just write

SELECT DISTINCT 
  Title, 
  Variables 
FROM `MyTABLE`
WHERE 
  Location = 'Location1' AND Date = 'Date1' 
  OR
  Location = 'Location2' AND Date = 'Date2'
Benoit Vidis
almost...you have to ensure there are two rows coming back...but indeed I oversaw that. Reffng you in my anser.
Roland Bouman
+1  A: 

Can you prepend your SQL statement with "EXPLAIN" and then re-run it, it's likely because of missing indices on the columns you're joining on.

Also trying using STRAIGHT_JOIN and mention the table that is slower in size in the left, and the bigger one on the right to hint MySQL to choose the first table.

mjsabby
Also, see which table is smaller (this is being naive, but still) and then use a STRAIGHT_JOIN to tell MySQL to read the tables in order (left to right)For example:EXPLAIN SELECT tb1.X FROM tb2 STRAIGHT_JOIN tb1 WHERE ....Also by default an INNER JOIN (which is a cartesian product) takes place, which may be what you want, but you could potentially see if you could have an OUTER JOIN
mjsabby
mjsabby, he is using a self JOIN. by definition, they are exactly the same size. As for INNER JOIN being a cartesian product: this is nonsense. If you have a proper index that can be used to resolve the join operation (in this case, one having (Title, Variables)) MySQL will certainly not compute a cartesian product, but it will use a nested loop join. Finally, an outer join would most likely make things worse if anything at all.
Roland Bouman
I missed the JOIN there.
mjsabby
A: 

Without the description of the tables and the query there's little we can do to help.

There are several things that can determine the speed of a join.

  • The database engine: Are you using InnoDB or MyISAM? Or maybe any other engine? Some are faster at lookups than others, which affects joins.
  • Indexes: Are the appropriate match-columns indexed?
  • Partition indexes: Maybe you can partition the table by indexes to make it even faster?

Also, look at EXPLAIN query which will look at all the steps that mysql takes to execute it. It could help you tremendously.

Tor Valamo
A: 

Try to use composite index on columns in where clause and try to put all other columns in select in Included Columns, this will save traditional look up cost.

Nitin Midha
+4  A: 

Like others pointed out, you need proper indexes. For this particular query, you can benefit from indexes like:

(Location, Date) or (Date, Location) (for the WHERE clause) and (Title, Variables) or (Variables, Title) (for the join condition, ON clause)

It would be helpful to know exactly the size (that is, datatype) of the location, Date, Title, and Variables columns, as a large index is likely to be slower than a small one.

Finally, just a tip: I would not use fancy comparison constructs like you do. The

USING (Title,  Variables) 

is probably ok, but I would certainly check if

(t1.Location, t1.Date) = ('Location1', 'Date1') 

and

(t2.Location, t2.Forecast_date) = ('Location2', 'Date2')

are behaving like you expect. SO I would definitely run EXPLAIN on it, and compare the output with a "regular" old fashioned comparison, like so:

    t1.Location      = 'Location1'
AND t1.Date          = 'Date1'
AND t2.Location      = 'Location2'
AND t2.Forecast_date = 'Date2'

You may argue that logically, it is the same and it shouldn't matter - you'd be right. But then again, MySQL's optimizer isn't very smart, and there is always a possibility of bugs, especially with features that aren't used a lot. I think this is such a feature. So i would at least try to EXPLAIN and see if these alternate notations are evaluated the same.

But what BenoKrapo pointed out, would it not be easier to do something like this:

SELECT Title, Variables 
FROM   MyTABLE
WHERE  Location = 'Location1' AND Date = 'Date1' 
OR     Location = 'Location2' AND Date = 'Date2'
GROUP BY Title, Variables
HAVING COUNT(*) >= 2

EDIT: I changed HAVING COUNT(*) = 2 to HAVING COUNT(*) >= 2. See comments (thanks again, BenoKrapo)

EDIT: days after posting this answer, I found this post from Mark Callaghan, MySQL Architect for Facebook: http://www.facebook.com/note.php?note_id=243134480932 Essentially, he describes how similar-but-different 'smart' comparisons deliver abysmal performance due to MySQL optimizer bug. So my point is, try to unfancy your syntax when you suffer, you might have hit a bug.

Roland Bouman
Thanks for the quote. Indeed, I had missed the cardinality constraint coming from the join. But the Having count(*) should be greater or equal to 2, not equal to.
Benoit Vidis
BenoKrapo: yup - you're right. Corrected that, thanks!
Roland Bouman
A: 

This might be cheating a little bit, but I actually found it easier to JOIN the two queries together in PHP after the query. This only works because I am selecting two distinct variables.

$query = "SELECT DISTINCT Title, Variables FROM 
MyTABLE WHERE Location='Location1' AND Variable='Variable1'";

$result = mysql_result($query);
while ($row = mysql_array_assoc($result)) {
    $Title = $row['Title'];
    $Variables = $row['Variables'];
    $Array_result1[$Title] = $Variables;
}


$query = "SELECT DISTINCT Title, Variables FROM 
MyTABLE WHERE Location='Location2' AND Variable='Variable2'";

$result = mysql_result($query);
while ($row = mysql_array_assoc($result)) {
    $Title = $row['Title'];
    $Variables = $row['Variables'];
    $Array_result2[$Title] = $Variables;
}

$Array_result = array_intersect($Array_result1, $Array_result2);

I Liked the idea of only using one MySQL query to merge the two queries, but this is so much faster.

Brian