I have a warehouse table that looks like this:

CREATE TABLE Warehouse (
  id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  eventId BIGINT(20) UNSIGNED NOT NULL,
  groupId BIGINT(20) NOT NULL,
  activityId BIGINT(20) UNSIGNED NOT NULL,
  -- ... many more id columns
  txtProperty1 VARCHAR(255),
  txtProperty2 VARCHAR(255),
  txtProperty3 VARCHAR(255),
  txtProperty4 VARCHAR(255),
  txtProperty5 VARCHAR(255),
  -- ... many more of these
  PRIMARY KEY (id),
  KEY WInvestmentDetail_idx01 (groupId)
  -- ... several more indices
) ENGINE=INNODB;

Now, the following query spends about 0.8s in query time and 0.2s in fetch time, for a total of about one second. The query returns ~67,000 rows.

SELECT eventId
FROM Warehouse
WHERE accountId IN (10, 8, 13, 9, 7, 6, 12, 11)
  AND scenarioId IS NULL
  AND insertDate BETWEEN DATE '2002-01-01' AND DATE '2011-12-31'
ORDER BY insertDate;

Adding more ids to the select clause doesn't really change the performance at all.

SELECT eventId, groupId, activityId, insertDate
FROM Warehouse
WHERE accountId IN (10, 8, 13, 9, 7, 6, 12, 11)
  AND scenarioId IS NULL
  AND insertDate BETWEEN DATE '2002-01-01' AND DATE '2011-12-31'
ORDER BY insertDate;

However, adding a "property" column does change it: 1.8s query time and 0.6s fetch time.

SELECT eventId, txtProperty1
FROM Warehouse
WHERE accountId IN (10, 8, 13, 9, 7, 6, 12, 11)
  AND scenarioId IS NULL
  AND insertDate BETWEEN DATE '2002-01-01' AND DATE '2011-12-31'
ORDER BY insertDate;

Now to really blow your socks off. Instead of txtProperty1, using txtProperty2 changes the times to 24s query, 0.8s fetch!

SELECT eventId, txtProperty2
FROM Warehouse
WHERE accountId IN (10, 8, 13, 9, 7, 6, 12, 11)
  AND scenarioId IS NULL
  AND insertDate BETWEEN DATE '2002-01-01' AND DATE '2011-12-31'
ORDER BY insertDate;

The two columns are pretty much identical in the kind of data they hold: mostly non-null, and neither is indexed (not that that should make a difference anyway). To be sure the table itself is healthy, I ran ANALYZE/OPTIMIZE against it.
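
That is (give or take the exact options):

ANALYZE TABLE Warehouse;
OPTIMIZE TABLE Warehouse;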

This is really mystifying to me. I can see how adding columns to the select clause might slightly increase fetch time, but it should not change query time, certainly not this significantly. I would appreciate any ideas as to what is causing this slowdown.

EDIT - More data points

SELECT * actually outperforms txtProperty2 - 0.8s query, 8.4s fetch. Too bad I can't use it, because the fetch time is (understandably) too long.

A: 

I'll admit that this is a bit of a guess, but I'll give it a shot.

You have id -- the first field -- as the primary key. I'm not 100% sure how MySQL handles lookups against a clustered index, but it is reasonable to suspect that, for any given ID, there is some "pointer" to the record with that ID.

It is relatively easy to find the beginnings of fields when all prior fields have fixed width. All your BIGINT(20) fields have a defined size that makes it easy for the db engine to find the field given a pointer to the start of the record; it's a simple calculation. Likewise, the start of the first VARCHAR(255) field is easy to find. After that, though, because the fields are VARCHAR fields, the db engine must take the data into account to find the start of the next field, which is much slower than simply calculating where that field should be. So, for any fields after txtProperty1, you will have this issue.

What would happen if you changed all the VARCHAR(255) fields to CHAR(255) fields? It is very possible that your query will be much faster, albeit at the cost of using the maximum storage for each CHAR(255) field regardless of the data it actually contains.
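
If you want to experiment, something like this (untested, and note that it rebuilds the table) would convert the first two columns:

ALTER TABLE Warehouse
  MODIFY txtProperty1 CHAR(255),  -- CHAR always occupies the full 255 bytes
  MODIFY txtProperty2 CHAR(255);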

Andrew
Sorry, no dice. Changing the first 5 properties to CHAR(255) actually made the query run in 98s query, 1.5s fetch. However, testing in this area has led me to another odd discovery: selecting txtProperty8 incurs only the same small penalty as txtProperty1 (about 2 seconds), while txtProperty7 is somewhere in between (around 5 seconds). This whole thing is very, very strange.
Monkey Boson
A: 

Fragmented tablespace? Try a null alter table:

ALTER TABLE tbl_name ENGINE=INNODB;
igelkott
Sorry, it didn't work. Does this accomplish the same thing as OPTIMIZE in InnoDB?
Monkey Boson
I don't think so. Still think there might be some sort of tablespace error to explain the dramatic differences between nearly identical columns.
igelkott
A: 

The MySQL documentation for the InnoDB engine suggests that if your varchar data doesn't fit on the page (i.e. a node of the B-tree structure), the information will be stored on overflow pages. So on your wide Warehouse table, it may be that txtProperty1 is stored on-page and txtProperty2 is off-page, thus requiring additional I/O to retrieve.
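
If it helps, you can check which row format the table is using (Row_format in the output is the relevant column; the table name here matches yours):

SHOW TABLE STATUS LIKE 'Warehouse';

-- With the REDUNDANT/COMPACT row formats, InnoDB keeps a 768-byte prefix of an
-- overflowing column on the page; with DYNAMIC/COMPRESSED, the whole value can
-- be moved off-page, leaving only a 20-byte pointer behind.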

Not too sure why SELECT * does better; it may be able to take advantage of reading the data sequentially, rather than picking its way around the disk.

richaux
This scenario is entirely possible given my data. I'm a little surprised at the 2s -> 24s increase in retrieval time, though. Any ideas on how I can ameliorate the query time?
Monkey Boson
I don't have any practical experience of this: there appear to be two potential ways of getting more data on-page. a) You could try changing the page size by setting KEY_BLOCK_SIZE, or b) do you have any flexibility around the datatype sizes, e.g. do the numerics need to be BIGINT (would an UNSIGNED INT or MEDIUMINT do?), and/or can the VARCHARs be just VARCHAR(100)?
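A hypothetical sketch of b), assuming the stored values actually fit the smaller types (the columns picked here are just examples):
ALTER TABLE Warehouse
  MODIFY accountId INT UNSIGNED,     -- was BIGINT: 4 bytes per value instead of 8
  MODIFY txtProperty1 VARCHAR(100);  -- assumes no stored value exceeds 100 characters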
richaux
Looks like `SHOW TABLE STATUS` will show you the current `KEY_BLOCK_SIZE`. What is that value? And what do the column sizes up to txtProperty1 add up to?
Harold L
A: 

Since I am a SQL Server user and not a MySQL guy, this is a long shot. In SQL Server the clustered index is the table. All the table data is stored in the clustered index. Additional indexes store redundant copies of the indexed data sorted in the appropriate sort order.

My reasoning is this. As you add more and more data to the query, the fetch time remains negligible. I presume this is because you are fetching all the data from the clustered index during the query phase and there is effectively nothing left to do during the fetch phase.

The reason SELECT * works the way it does is that your table is so wide. As long as you are just requesting the key and one or two additional columns, it is best to just get everything during the query. Once you ask for everything, it becomes cheaper to split the fetching between the two phases. I am guessing that if you add columns to your query one at a time, you will discover the boundary where the query analyzer switches from doing all of the fetching in the query phase to doing most of it in the fetch phase.
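
For example, reusing the filter from the question (time each variant and watch how the query/fetch split moves):

SELECT eventId, txtProperty1
FROM Warehouse
WHERE accountId IN (10, 8, 13, 9, 7, 6, 12, 11)
  AND scenarioId IS NULL
  AND insertDate BETWEEN DATE '2002-01-01' AND DATE '2011-12-31'
ORDER BY insertDate;

-- then eventId, txtProperty1, txtProperty2; then three properties; and so on,
-- adding one column per run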

J Edward Ellis
This sounds like the "covering index" technique mentioned by a few others. Is it still the case if neither txtProperty1 nor txtProperty2 is part of any index?
Monkey Boson
A: 

You should post the explain plans of the two queries so we can see what they are.

My guess is that the fast one is using a "Covering index", and the slow one isn't.

This means that the slow one must do 67,000 primary key lookups, which will be very inefficient if the table isn't all in memory (typically requiring 67k I/O operations if the table is arbitrarily large and each row is in its own page).

In MySQL, EXPLAIN will show "Using index" if a covering index is being used.
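
For example, with the first query from the question (the index below is hypothetical; a covering index just has to include every column the query touches):

EXPLAIN
SELECT eventId
FROM Warehouse
WHERE accountId IN (10, 8, 13, 9, 7, 6, 12, 11)
  AND scenarioId IS NULL
  AND insertDate BETWEEN DATE '2002-01-01' AND DATE '2011-12-31'
ORDER BY insertDate;

-- one possible covering index for that query:
CREATE INDEX Warehouse_covering_idx
  ON Warehouse (scenarioId, accountId, insertDate, eventId);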

MarkR
The EXPLAIN output is identical in both cases. Even though the items in the where clause are indexed, MySQL decides in both cases to perform a full table scan (probably because 67,000 rows represents a significant fraction of the entire table). In the last two queries I mentioned, neither can use the "covering index" technique because both contain columns that are not indexed.
Monkey Boson