views:

440

answers:

3

This question is related to this one.

I have a page table with the following structure:

CREATE TABLE  mydatabase.page (
  pageid int(10) unsigned NOT NULL auto_increment,
  sourceid int(10) unsigned default NULL,
  number int(10) unsigned default NULL,
  data mediumtext,
  processed int(10) unsigned default NULL,
  PRIMARY KEY  (pageid),
  KEY sourceid (sourceid)
) ENGINE=MyISAM AUTO_INCREMENT=9768 DEFAULT CHARSET=latin1;

The data column contains text whose size is around 80KB - 200KB per record. The total size of the data stored in the data column is around 1.5GB.

Executing this query takes 0.08 seconds:

select pageid from page

But executing this query takes around 130.0 seconds:

select sourceid from page

As you see, I've got a primary index on page.pageid and an index on page.sourceid. So should the second query be taking THAT long?

Edit #1

EXPLAIN returned

id select_type table type  possible_keys key      key_len ref rows Extra
1  SIMPLE      page  index               sourceid 5           9767 Using index

I'm sorry but profiling didn't work... MySQL (its 4.1.22) did not recognize SHOW PROFILE query.

SHOW INDEX returned

Table Non_unique Key_name  Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment
page  0          PRIMARY   1            pageid      A         9767                             BTREE 
page  1          sourceid  1            sourceid    A         3255                        YES  BTREE
A: 

How different are your sourceid fields? If you have only a few different sourceid values compared to the number of rows then you can try increasing the size of the index.

Sander Marechal
sourceid contains around 3500 distinct values. It refers to a source table that has around 3500 rows. Each source contains 0 to 700 pages.
Salman A
A: 

Did you try to enforce the use of the index? Like:

SELECT sourceid FROM page USE INDEX (sourceid_index)

Like sgehrig comments, check using EXPLAIN if the index is used? And share the result?

EXPLAIN select sourceid from page

It could also help to share the definiton of the indexes:

SHOW INDEX FROM page
Andomar
I've revised question and added the information you requested
Salman A
ahhh... select sql_no_cache sourceid from page use index (sourceid) worked and the query took 0.09 seconds. For some reason mysql is unable to figure out the index to use on its own. i now need a query that forces an index in cascaded join (pagedetail > page > source)
Salman A
Can you really confirm that only the combination of SQL_NO_CACHE and USE INDEX brings the expected speed benefit? Could it be that SQL_NO_CACHE is the determinant factor?
Stefan Gehrig
I use SQL_NO_CACHE for testing the *actual* performance of the query. If i dont use this keyword (which is what i normally do) the query runs slow for the first time, later the query results come from the cache which is always faster but not the solution to the problem. SQL_NO_CACHE forces mysql not to use/save-results-in cache returning the actual time the query takes in normal circumstances.
Salman A
Query works fine with or without SQL_NO_CACHE otherwise.
Salman A
That's exactly why I asked - I wanted to rule out the option that the query cache had some influence on the execution time. So obviously MySQL needs the index hint to make use of the covering index.
Stefan Gehrig
A: 

As MySQL 4.1.22 is fairly old (02 November 2006) I'd suspect that it doesn't support the notion of covering indexes for secondary keys. EXPLAIN shows that the query actually uses the index, so I'd assume that the additional time is needed to read all the result rows (instead of just returning the index content when using covering indexes) to extract the sourceid column.

Do you have the possibility to check the query on a more recent MySQL server version?

Stefan Gehrig