ansaurus

Question

MySQL MyISAM table performance... painfully, painfully slow

Answer 1

A:

One possible issue is that MySQL uses only one index per query and maybe you don't have a single index with those columns -- or MySQL's query optimizer isn't picking it. What does EXPLAIN SELECT &c tell you here?

Alex Martelli 2009-05-09 07:10:16

Answer 2

+1 A:

The easy way to figure out what MySQL is doing with your query is to have it explain the query to you. Run this and have a look at the output:

EXPLAIN SELECT userdata.column1, pagegroup.name
FROM userdata
INNER JOIN page USING( pageid )
INNER JOIN pagegroup USING( pagegroupid )

MySQL will tell you in which order it processes the queries and what indexes it uses. The fact that you created indexes does not mean that MySQL actually uses them.

See also Optimizing queries with EXPLAIN

EDIT

The output of your EXPLAIN looks fine. It does a full table scan on the userdata table, but that is normal since you want to return all rows in it. The best way to optimize this is to rethink your application. Do you really need to return all 372K rows?

Sander Marechal 2009-05-09 07:11:56

I've revised my question and added the result from explain command. Appears to be using correct indices but still 128 seconds just to execute.

Salman A 2009-05-09 07:39:16

I have updated my response as well.

Sander Marechal 2009-05-09 07:50:08

Answer 3

A:

I would start with breaking the query up, to figure out if there is one slow and one fast part, or if both are slow (sorry, I'm no fan of the USING syntax, so I'm going to use ON):

SELECT 
  u.userdata, p.pageid
FROM
  userdata u
  INNER JOIN page p ON u.pageid = p.pageid

SELECT 
  p.pageid, g.pagegroupid
FROM
  page 
  INNER JOIN pagegroup g ON p.pagegroupid = g.pagegroupid

What does that give you? Running these with EXPLAIN EXTENDED will provide additional hints.

Tomalak 2009-05-09 07:45:55

I've posted the output of the two queries. Explain extended returns similar queries in different syntax.

Salman A 2009-05-09 08:33:48

Looks like the second query is the trouble maker. Please include the info what indexes you have in place.

Tomalak 2009-05-09 09:31:48

primary keys + all keys used in joins are indexed. For the three tables I have indexes on pagegroup.pagegroupid (PK), page.pageid (PK), page.pagegroupid (INDEX), userdata.userdataid (PK), userdata.pageid (INDEX), userdata.column1 (INDEX)

Salman A 2009-05-09 09:58:08

Answer 4

A:

Looks like you're doing a join on all rows on userdata and then trying to select everything. That is every page in a pagegroup with userdata. Where's the WHERE clause? There's no LIMIT, how many results did you want? Why don't you get your row count down on userdata row in your explain result, that should speed up the query. Heh.

apphacker 2009-05-09 07:53:40

I need a dump of selected columns from userdata along with pagegroup.name for cross reference. I believe it should work fast enough if there was no "mediumtext" column in the page table.

Salman A 2009-05-09 08:17:51

Maybe you want to start recording this information in a log as it comes in instead of using SQL, maybe think about something other than SQL for this data, like non-normalized berkeley db or something.

apphacker 2009-05-09 10:26:21

Answer 5

+2 A:

What's the data type and purpose of columnX in the userdata table? It should be noted that any text data type (i.e excluding char, varchar) forces any temporary tables to be created on disk. Now since you're doing a straight join without conditions, grouping or ordering, it probably won't need any temporary tables, except for aggregating the final result.

I think it would also be very helpful if you show us how your indexes are created. One thing to remember is that while InnoDB concatenates the primary key of the table to each index, MyISAM does not. This means that if you index column name and search for it with LIKE, but still want to get the id of the page group; Then the query would still need to visit the table to get the id instead of being able to retrieve it from the index.

What this means, in your case, if I understand your comment to apphacker correctly, is to get the name of each users pagegroups. The query optimizer would want to use the index for the join, but for each result it would also need to visit the table to retrieve the page group name. If your datatype on name is not bigger than a moderate varchar, i.e. no text, you could also create an index (id, name) which would enable the query to fetch the name directly from the index.

As a final try, you point out that the whole query would probably be faster if the mediumtext was not in the page table.

This column is excluded from the query you are running I presume?
You could also try to separate the page data from the page "configuration", i.e. which group it belongs to. You'd then probably have something like:
- Pages
  - pageId
  - pageGroupId
- PageData
  - pageId
  - data

This would hopefully enable you to join quicker since no column in Pages take up much space. Then, when you needed to display a certain page, you join with the PageData table on the pageId-column to fetch the data needed to display a particular page.

PatrikAkerstrand 2009-05-09 08:53:36

Answer: #1 - Yes, i am not SELECTing the data column; #2 - Yes thats a workaround which "should" work, another possibility is to de-normalize the table a tad bit and add pagegroupid into userdata but the question is if there is something wrong in the table structure or the query.

Salman A 2009-05-09 09:02:32

most of the columns are varchar 100s including pagesource.name and userdata.field2

Salman A 2009-05-09 09:09:07

It worked after I moved the "data" column into a separate table "pagetemp" that relates 1-to-1 with the page table. None of the indexes work otherwise.

Salman A 2009-05-09 10:40:34

Answer 6

+1 A:

I'm assuming the userdata table is very large and does not fit in memory. MySQL would have to read the entire table from harddisk, even if it needs only two small columns.

You can try to eliminate the need for scanning the entire table by defining an index that contains everything the query needs. That way, the index is not a way to facilitate a search into the main table, but it's a shorthand version of the table itself. MySQL only has to read the shorthand table from disk.

The index could look like this:

column1, pageid

This has to be non-clustered, or it would be part of the big table, defeating its purpose. See this page for an idea on how MySQL decides which index to cluster. The easiest way seems to make sure you have a primary key on pageid, which will be clustered, so the secondary column1+pageid index will be non-clustered.

Andomar 2009-05-09 09:08:22

I tried creating a two column index (pageid-sourceid) on the page table in order to create a short-circuit between userdata and pagesource which reduced execution time but not much.

Salman A 2009-05-09 10:47:44

Your comment above says performance got better with a 1:1 table. That really seems to suggest the index on (pageid,sourceid) was clustered in some way. Oh well... problem solved I guess.

Andomar 2009-05-10 00:14:46

ansaurus

tags:

views:

answers:

MySQL MyISAM table performance... painfully, painfully slow

Edit #1

Edit #2

Moral of the story

related questions