views:

85

answers:

8

Hi there,

So i have a query that looks like this:

SELECT col1, col2, col3 ...
FROM action_6_members m
LEFT JOIN action_6_5pts f ON f.member_id = m.id
LEFT JOIN action_6_10pts t ON t.member_id = m.id
LEFT JOIN action_6_weekly w ON w.member_id = m.id
WHERE `draw_id` = '1' ORDER BY m.id DESC LIMIT 0, 20;

now this is doing a massive join (3.5 million * 40 thousand * 20 thousand)

so my idea was to:

do SELECT * FROM action_6_members WHEREdraw_id= '1' ORDER BY id DESC LIMIT 0, 20;

then loop over that using php to build $in = "IN(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)";

then run
select * from action_6_5pts where member_id in $in
select * from action_6_10pts where member_id in $in
select * from action_6_weekly where member_id in $in

then smush them all together using php,

this means that, although i am using four different queries i am only selecting 20 rows from each, instead of doing the join on all.

Will i notice a significant performance bonus?


Update
So, the general consensus is, 'DONT DO THAT!'

here is the general overview of the app

it receives a code,

the code is either a 5pt, 10pt, or a weekly code,

all three code types are in seperate tables. the three tables have code, and member_id

the member_id links to the id in the action_6_members table.

when a code is claimed the data is filled in in the action_6_members table.

the id of that member is then filled in in the table for the code that was claimed.

the above query selects the first twenty members.

So my question is then.

What can I do to improve this?

as currently everything is timing out before the queries complete.

action_6_members

CREATE TABLE `action_6_members` (
  `id` int(11) NOT NULL auto_increment,
  `draw_id` int(11) NOT NULL,
  `mobile` varchar(255) NOT NULL,
  `fly_buys` varchar(255) NOT NULL,
  `signup_date` datetime NOT NULL,
  `club` int(11) NOT NULL default '0' COMMENT '1 = yes, 2 = no',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=1337 DEFAULT CHARSET=latin1

action_6_ 5 and 10pts

CREATE TABLE `action_6_5pts` (
  `code` varchar(255) NOT NULL,
  `member_id` int(11) NOT NULL,
  PRIMARY KEY  (`code`),
  KEY `member_id` (`member_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

action_6_weekly

CREATE TABLE `action_6_weekly` (
  `id` int(11) NOT NULL auto_increment,
  `code` varchar(255) NOT NULL,
  `member_id` int(11) NOT NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `id` (`id`),
  KEY `member_id` (`member_id`)
) ENGINE=MyISAM AUTO_INCREMENT=3250001 DEFAULT CHARSET=latin1


Update 2: explain the query

id select_type table type possible_keys key       key_len ref  rows   Extra  
1  SIMPLE      m     ALL  \N            \N        \N      \N   1390   Using temporary; Using filesort  
1  SIMPLE      f     ALL  member_id     \N      \N      \N   36000  
1  SIMPLE      t     ALL  member_id     \N      \N      \N   18000  Using where  
1  SIMPLE      w     ref  member_id     member_id 4    m.id 525820 Using where  

Just had this through: Latest load data from DB 7.26, 4.60, 2.45

1.0 is the normal max load...Anything above means it's had to 'burst' and call on additional processes to handle. i.e. 7.26 means load is 7 x the max of the blade server and has had to call on others to help out

so currently this query is more than a monster, its eating monsters as snacks...

+1  A: 

Multiple trips, back & forth, between the application and the database? No, that won't provide a performance bonus compared to a single query.

OMG Ponies
It depends on the query. Using LEFT JOIN is a horrible performance hit. In this case the 4 queries are probably faster.
Cfreak
OMG Ponies
And for his specific case, given the number of rows he's got, I disagree. In general you are correct, and certainly correct if we were talking about an INNER join. But with LEFT JOIN the database engine is going to search through an exponential number of rows. Getting the few rows he actually needs off an indexed column is almost certain to be faster. However I did put in my answer if he could not change the table structure (Which would probably be the best), or to an INNER JOIN then he could UNION those 4 queries together to avoid trips back tot he application.
Cfreak
A: 

don't do that. the database is very fast in joining tables and select the relevant rows - much faster as if you do a lot of single queries.

oezi
Not in this case. LEFT JOIN is much slower than the queries he proposes.
Cfreak
+1  A: 

You don't have to use PHP to do this, you can do it in one query with subqueries, or a multi-query stored procedure.

To see which one is faster, benchmark them.

Borealid
mind pointing me to a tutorial on stored procedures?
Hailwood
+7  A: 

As a general rule, if your SQL query can fully model what you want to do, then it is likely to be faster than splitting it up into pieces glued together in PHP (or any other language), within certain bounds.

Those bounds are:

  1. There must not be strange pathological behavior hiding in MySQL for this case.
  2. You must have reasonable indexes on all the necessary columns.
  3. There is no (or no likely) case which you can only reasonably detect/handle in PHP in which you would want to abort the query mid-way through.
  4. Your result set is not pathologically huge (e.g. it fits in memory and doesn't exceed the size of max_allowed_packet in my.cnf).

Now, this doesn't address whether your SQL (or proposed alternative implementation in PHP) is optimal for what you're doing, but that can only be addressed given more information about what your app does and the end point you're actually trying to reach. It might be fine, it might not.


In a quick glance at your update with the table structure, nothing jumps out at me as the likely cause of a large performance problem, but:

  • Don't use MyISAM unless you've established you need it. InnoDB is your friend, especially if the tables have a decent amount of write traffic. MyISAM's full-table locks can really bite you. Having FOREIGN KEYS for referential integrity would also be nice.
  • action_6_weekly has id as the PRIMARY KEY, and a UNIQUE KEY on... id. This is redundant. PRIMARY KEY is effectively a superset of UNIQUE KEY, you don't need to create a separate UNIQUE KEY.
  • The output of EXPLAIN on your query would be interesting.
Nicholas Knight
+1 For InnoDB, you should use InnoDB and define foreign key constraints or at least add indexes to your tables, this will speed up your queries.
wimvds
A: 

You are not going to know if or how much of a performance gain this approach will give you until you try it out. In my experience, changing this sort of query to discrete ones is not something you can predict. What you are looking for is a tipping point in MySQL where creating internal tables past a certain size is a killer. Once you know where that point is in your installation, then you can play games with query splicing and post-processing.

staticsan
A: 

You should be using the in clause with the Join, instead of using the Limit. The Limit is running after the join, instead of being part of the query.

Benjamin Anderson
how so? remembering that the limit changes eg, 0, 20 21,40 41,60
Hailwood
Limit is essentially a cursor, meaning it happens after the join is done. It doesn't limit the join, it limits the selection. You should have a subquery or other clause to restrict the join.
Benjamin Anderson
In other words, you four individual queries will not provide the same results, since your where clauses won't the the same. The individual queries, in this cause will be faster because your restricting the domain of the results. You're join doesn't have any restrictions, you're selecting all results from every table in the join, then ordering it, then restriction the set with the Limit.
Benjamin Anderson
+1  A: 

Oddly enough I'm going to disagree with the consensus, well at least in part.

First of all you should never ever use LEFT JOIN. It's tempting but it's almost always a bad idea. I'm assuming in your case that the action_6_5pts, action_6_10pts and action_weekly tables may not contain all member ids. (I'm guessing at your data, so if each table is guaranteed to contain all member ids then take LEFT out of your query and you should be fine.)

I suspect there's probably a better way you could lay out your data in the first place. Generally it's a good idea to combine the same types of data into a single table. I don't want to take guesses at your data so I'll give you a pseudo example. I've seen lots of times where people take similar data and split it up across multiple tables (smaller tables are better right?). Not always. For example if you were building an invoice system it might be tempting to think of splitting up the invoices by months into separate tables. So you create invoice_Jan2010, invoice_Feb2010 ... etc. But what if you want to search? The same customer probably isn't in all the months so it's hard to get a list with just that customer, without using LEFT JOIN. Yuck. We don't like LEFT JOIN! It's slow!

The better way to approach it is to have a single invoice table with a date (indexed!) and each customer ID. Any JOINs are guaranteed to find the invoice unless none exist for the customer (which wouldn't matter)

Now in your case maybe you could make 5pts and 10pts flags in a single table and weekly is date? I'm making assumptions, without knowing more it's hard to give you the "right" answer.

Now I said I disagree with the consensus. If you don't change your data, generally if you have a very large table like you say, splitting into 4 queries using IN statements is a better idea than LEFT JOIN. If you want to speed it up you can combine all 4 into 1 using UNION. That should still be faster than LEFT JOIN.

You can also prove it easily. Take your query and put the keyword EXPLAIN in front of it and execute it on Mysql directly (using one of the tools: command line, Mysql GUI or even phpmyadmin). That will give you a breakdown of how it plans to JOIN the tables together.

The explanation of the output is too long for this answer but in general each row of output will tell you how many rows the query will join. Smaller the better. It will also tell you how it's going to JOIN. "Using temporary" or "Using filesort" is what you want to avoid if possible (though if you sort it will come up so be prepared). There's also a column for what key the rows will be joined using. If that column is empty you should try to create an index to make it work better.

Hope that helps! Good luck!

Cfreak
basically the 3 tables contain every single possible code, so, if the code has not been claimed then the member_id is 0;
Hailwood
I don't understand why a LEFT JOIN is inherently bad. Isn't it the same thing as a RIGHT JOIN, just with the table order reversed? Or are you saying that you should avoid using any OUTER JOIN?
Lèse majesté
Yes. OP used LEFT JOIN in his example but it would apply to any OUTER JOIN, especially on very large tables. The problem (at least in MySQL) is that the database has to search through all the rows in the joined tables to ensure there are no matches. An INNER join on an indexed column can quickly get the rows that actually match.
Cfreak
A: 

I may be going mad, but I can't see an index on the table action_6_members for the field you are filtering on draw_id in your original query.

This means the query is going to have to scan through all the data in the action_6_members table and then join to the others.

Adding an index to the draw_id column will probably help here.

You could create a combined key of (draw_id,id), but that probably wouldn't gain you much unless you weren't pulling any data from the action_6_members table (if you're not, then the multi-field index can be used instead of reading through the data table)

Hope that helps...

Dave Rix