Hi,

I have the following scenario: In a MySQL database, I have 2 MyISAM tables, one with 4.2 million rows, and another with 320 million rows. The following is the schema for the tables:

Table1 (4.2M rows)

F1 INTEGER UNSIGNED NOT NULL PRIMARY KEY
f2 varchar(40)
f3 varchar(40)
f4 varchar(40)
f5 varchar(40)
f6 smallint(6)
f7 smallint(6)
f8 varchar(40)
f9 varchar(40)
f10 smallint(6)
f11 varchar(10)
f12 tinyint(4)
f13 smallint(6)
f14 text

Table2 (320M rows)

F1 INTEGER UNSIGNED NOT NULL PRIMARY KEY
f2 INTEGER UNSIGNED NOT NULL

Table2 is in a different database, but I am using a stored procedure that queries both tables. The relation between them is that each Table1.F1 value may match up to approximately 100 rows on Table2.F1 (a foreign key), and Table2.f2 is returned for those matched keys. I have an index IX1(f2(15), f3(10)) on Table1, and indexes IX2(F1, f2) and IX3(f2) on Table2.
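Expressed as DDL, the indexes described above would look roughly like this (a sketch reconstructed from the description; the prefix lengths are as stated, everything else is assumed):

CREATE INDEX IX1 ON DB1.Table1 (f2(15), f3(10));  -- prefix index over the two varchar columns
CREATE INDEX IX2 ON DB2.Table2 (F1, f2);          -- composite index covering the join and returned column
CREATE INDEX IX3 ON DB2.Table2 (f2);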

The queries I am running are the following:

SELECT g.F1
FROM DB1.Table1 g 
INNER JOIN DB2.Table2 gp ON g.F1 = gp.F1 
WHERE (gp.f2 = 452677825) AND
(g.f2 = 'A string value') LIMIT 0,56

This query is sometimes very fast (<1s), but changing the string value that g.f2 is compared to leads to queries that take over 11 seconds, and sometimes as long as 30 seconds. I cannot understand why this is so. The following is the output of EXPLAIN on the SELECT that is executed.

1, 'SIMPLE', 'g', 'ref', 'PRIMARY,IX1', 'IX1', '17', 'const', 901, 'Using where'
1, 'SIMPLE', 'gp', 'ref', 'IX3,IX2', 'IX2', '8', 'DB1.g.F1,const', 1, 'Using index'

which seems to be quite a good execution plan. The row estimate in the top row of the EXPLAIN output goes to 2000 at most, so I do not see why this should take more than a fraction of a second to return results. I also ran the profiler on the query and noticed that the queries spend 99.9% of their time in the "Sending data" stage. Can anyone please explain why this is so, and what can be done to optimise the query?
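For anyone wanting to reproduce the measurement, per-stage timings like the one above come from MySQL's session profiler, along these lines (the query ID is whatever SHOW PROFILES reports for your run):

SET profiling = 1;          -- enable profiling for this session
-- ... run the slow SELECT here ...
SHOW PROFILES;              -- lists recent statements with their Query_ID and total duration
SHOW PROFILE FOR QUERY 1;   -- per-stage breakdown, including the "Sending data" stage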

Thanks in advance, Tim

+1  A: 

I'm not an expert in this area, but here are a few thoughts:

The query being slower when you change the g.f2 value is because of caching. MySQL saves the result of each query in its query cache (until the cache is full), so a repeated query is answered from the cache, while a query with a new value has to do the actual work and takes longer. You shouldn't optimize based on cached timings. (See How to measure accurately)
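To time the query itself rather than a cache hit, one option (assuming the query cache is enabled on the server) is the SQL_NO_CACHE hint:

SELECT SQL_NO_CACHE g.F1            -- bypass the query cache for this statement only
FROM DB1.Table1 g
INNER JOIN DB2.Table2 gp ON g.F1 = gp.F1
WHERE gp.f2 = 452677825
  AND g.f2 = 'A string value'
LIMIT 0,56;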

I can't tell from the information given whether the g or gp predicate has greater specificity in the WHERE clause (it seems like gp?), but you may want to try a subquery instead, as sketched below. (See How to force the inner query to execute first)
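One possible shape for that rewrite, assuming gp is the more selective side, is a derived table that materialises the Table2 lookup first (just a sketch; Tim's actual rewrite appears in his comment below):

SELECT g.F1
FROM (SELECT gp.F1 FROM DB2.Table2 gp WHERE gp.f2 = 452677825) AS A
INNER JOIN DB1.Table1 g ON g.F1 = A.F1
WHERE g.f2 = 'A string value'
LIMIT 0,56;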

Regarding profiling, it's possible you're hitting a physical limit, such as exceeding available RAM (swapping is disastrous for performance), that would not be obvious from EXPLAIN; it's also possible that EXPLAIN is simply wrong in this case.

banzaimonkey
Hi, thanks for your comment. I have now changed the above query to use IN instead of a join, as you suggested. The query now looks as follows:

SELECT A.F1
FROM (SELECT g.F1 FROM DB1.Table1 g WHERE g.f2 = 'abc') AS A
WHERE A.F1 IN (SELECT gp.F1 FROM DB2.Table2 gp WHERE gp.f2 = 452677825)
LIMIT 0,56

and the query runs a lot faster (~1s, 2s max). My mission is to try and reduce this even further if possible!
Tim
A: 

If you are able to, you may want to try tweaking your my.cnf; the setting to play with is key_buffer_size. MyISAM indexes are stored in .MYI files, so if you locate these and total up the file sizes (e.g. ls -lh /var/lib/mysql/dbname/*.MYI) you can roughly estimate how big the key buffer needs to be to fit all of your indexes. The MySQL docs do recommend not exceeding 25% of system memory, though.
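For reference, a minimal sketch of checking and raising the buffer from within MySQL (the 2 GB figure is only an example; size it to your .MYI total and the 25% guideline):

SHOW VARIABLES LIKE 'key_buffer_size';   -- current size in bytes

-- Alternatively, estimate the total MyISAM index size from the data dictionary:
SELECT SUM(index_length) AS total_index_bytes
FROM information_schema.TABLES
WHERE engine = 'MyISAM';

SET GLOBAL key_buffer_size = 2147483648; -- 2 GB; also set it in my.cnf so it survives restarts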

ejrowley