Dears,

I need your help optimizing the query below. Let us assume we have a web application for articles. The software uses two tables: one is the article table and the second is the users table. The article table holds the date the article was created, the id, the body, the title and the section. Let us assume that we have one section called "news" and there are one million articles belonging to the news section. In this case, how can the following query be optimized?

SELECT username,title FROM article,users 
WHERE article.auther_id=users.id AND section LIKE 'news' 
ORDER BY article.date DESC 
LIMIT 0,40

The table structures are:

CREATE TABLE `article` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 100 ) NOT NULL ,
`body` VARCHAR( 200 ) NOT NULL ,
`date` VARCHAR( 30 ) NOT NULL ,
`auther_id` INT NOT NULL ,
`section` VARCHAR( 30 ) NOT NULL
) ENGINE = MYISAM ;


CREATE TABLE `users` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`username` VARCHAR( 30 ) NOT NULL
) ENGINE = MYISAM ;

I tried to create one index consisting of the section and the date, but it is not optimal: if we have 2 million records and one million of them belong to one section, the DB will still scan one million rows.

A: 

Add an index on article.auther_id, in addition to the index you've already added on date and section.
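For instance (the index name is illustrative, and the (section, date) index is assumed to exist already):

```sql
-- Add an index on the join column.
ALTER TABLE article ADD INDEX ix_article_auther (auther_id);
```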

Scott Saunders
A: 

From looking at the query it seems you would need an index on section and an index on date (two separate indexes). Try adding these and then look at the explain plan to see if they work or if there are some other tweaks that might help.
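A sketch of that approach (index names are illustrative):

```sql
-- Two separate single-column indexes, as suggested above:
ALTER TABLE article ADD INDEX ix_section (section),
                    ADD INDEX ix_date (`date`);

-- Then check which index the optimizer actually picks:
EXPLAIN
SELECT   username, title
FROM     article, users
WHERE    article.auther_id = users.id AND section = 'news'
ORDER BY article.date DESC
LIMIT    0, 40;
```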

TskTsk
I'd be curious to see whether there is a difference between "section LIKE 'news'" and "section = 'news'" as well. Probably not, but I would check that. I usually avoid using LIKE unless I need wildcards.
Arthur Thomas
@Arthur: I think that would probably make a big difference. Good point.
Scott Saunders
Dropping the LIKE will make this considerably faster.
middaparka
In `MySQL`, there is no difference between a `LIKE` and an `=` against a string constant without wildcard characters: http://dev.mysql.com/doc/refman/5.0/en/range-access-single-part.html
Quassnoi
A: 

OK, first off, use the DATE datatype for dates, as this will be a lot faster than using a VARCHAR.

Secondly, you simply need to add the following keys to your create table syntax for the article table.

KEY auther_id (auther_id),
KEY section (section),

Somewhat irrelevantly, if you were using InnoDB you could additionally make auther_id a foreign key.

Also, if you don't need to use LIKE, don't - simply checking for section = 'news' will be a lot faster. (You could even use an ENUM for the section if there's a finite list.)
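Putting those suggestions together, the revised table might look something like this (the ENUM values beyond 'news' are hypothetical; adjust to the real list of sections):

```sql
CREATE TABLE `article` (
  `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `title` VARCHAR(100) NOT NULL,
  `body` VARCHAR(200) NOT NULL,
  `date` DATETIME NOT NULL,                             -- a proper temporal type
  `auther_id` INT NOT NULL,
  `section` ENUM('news', 'sports', 'opinion') NOT NULL, -- hypothetical values
  KEY auther_id (auther_id),
  KEY section (section)
) ENGINE = MYISAM;
```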

middaparka
I replaced 'LIKE' with '='. Changing the date to a MySQL date type needs some changes in the software; the current data holds the timestamp value generated by the PHP function time(). I am planning to change it in the future. Thanks a lot.
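Since the column holds the integer output of PHP's time(), the migration could be sketched as follows (untested; back up the table and verify the conversion on a copy first):

```sql
-- Add a real temporal column, fill it from the Unix timestamps
-- stored in the old VARCHAR column, then swap the columns.
ALTER TABLE article ADD COLUMN date_new DATETIME;
UPDATE article SET date_new = FROM_UNIXTIME(`date`);
ALTER TABLE article DROP COLUMN `date`;
ALTER TABLE article CHANGE date_new `date` DATETIME NOT NULL;
```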
usef_ksa
A: 

As others have said, indexes on date, section and auther_id. According to this:

http://www.mysqlperformanceblog.com/2008/08/22/multiple-column-index-vs-multiple-indexes/

it may well be beneficial to have one multiple-column index rather than three separate ones, like this:

...
key idx_combo (auther_id, section, date)
...

in your table definition.

davek
A: 

The best thing to do is to create a partitioned table based on date/section.

Then create locally partitioned indexes, so that performance will be much higher.
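Note that MySQL partitioning requires version 5.1+, and the partitioning column must be part of every unique key, so the primary key has to include the date. A rough sketch (table name and partition boundaries are illustrative):

```sql
CREATE TABLE article_part (
  `id` INT NOT NULL AUTO_INCREMENT,
  `title` VARCHAR(100) NOT NULL,
  `body` VARCHAR(200) NOT NULL,
  `date` DATETIME NOT NULL,
  `auther_id` INT NOT NULL,
  `section` VARCHAR(30) NOT NULL,
  PRIMARY KEY (`id`, `date`),              -- must include the partition key
  KEY ix_section_date (`section`, `date`)  -- indexes are local to each partition
) ENGINE = MYISAM
PARTITION BY RANGE (YEAR(`date`)) (
  PARTITION p2008 VALUES LESS THAN (2009),
  PARTITION p2009 VALUES LESS THAN (2010),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```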

Venkataramesh Kommoju
A: 

You need to create an index on (section, date).

Don't include auther_id as a leading column: article will be the leading table in the join, and no searching will be performed on that column.

Since there is a LIMIT 0, 40 in your query, MySQL will not have to scan the whole index. It will just pick the first 40 records.

Here's a test script to check:

CREATE TABLE `article` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 100 ) NOT NULL ,
`body` VARCHAR( 200 ) NOT NULL ,
`date` VARCHAR( 30 ) NOT NULL ,
`auther_id` INT NOT NULL ,
`section` VARCHAR( 30 ) NOT NULL
) ENGINE = MYISAM ;


CREATE TABLE `users` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`username` VARCHAR( 30 ) NOT NULL
) ENGINE = MYISAM ;

INSERT
INTO    article
SELECT  id,
        CONCAT('Title ', id),
        CONCAT('Body ', id),
        DATE_FORMAT('2009-12-18' - INTERVAL id MINUTE, '%Y-%m-%d %H:%i:%S'),
        (id - 1) % 500 + 1,
        'news'
FROM    t_source;

INSERT
INTO    users
SELECT  id, CONCAT('Username ', id)
FROM    t_source
LIMIT 500;

CREATE INDEX ix_article_section_date ON article (section, date);

SELECT  username,title
FROM    article
JOIN    users
ON      users.id = article.auther_id
WHERE   section = 'news'
ORDER BY
        article.date DESC
LIMIT 0, 40;

t_source is a dummy table with 1,000,000 rows in it.

The final query completes in 0.0018 s on my machine (instantly).

Here's the execution plan:

1, 'SIMPLE', 'article', 'range', 'ix_article_section_date', 'ix_article_section_date', '92', '', 999998, 'Using where'
1, 'SIMPLE', 'users', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.article.auther_id', 1, ''
Quassnoi
Thanks a lot, a very informative answer, especially this part: "Since there is a LIMIT 0, 40 in your query, MySQL will not have to scan the whole index. It will just pick the first 40 records." Thanks a lot to all of you; I appreciate your answers.
usef_ksa