views:

1087

answers:

8

I have 2 tables. 1 is music and 2 is listenTrack. listenTrack tracks the unique plays of each song. I am trying to get results for popular songs of the month. I'm getting my results but they are just taking too long. Below is my tables and query

430,000 rows

CREATE TABLE `listentrack` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `sessionId` varchar(50) NOT NULL,
    `url` varchar(50) NOT NULL,
    `date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    `ip` varchar(150) NOT NULL,
    `user_id` int(11) DEFAULT NULL,
     PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=731306 DEFAULT CHARSET=utf8

12500 rows

CREATE TABLE `music` (
   `music_id` int(11) NOT NULL AUTO_INCREMENT,
   `user_id` int(11) NOT NULL,
   `title` varchar(50) DEFAULT NULL,
   `artist` varchar(50) DEFAULT NULL,
   `description` varchar(255) DEFAULT NULL,
   `genre` int(4) DEFAULT NULL,
   `file` varchar(255) NOT NULL,
   `url` varchar(50) NOT NULL,
   `allow_download` int(2) NOT NULL DEFAULT '1',
   `plays` bigint(20) NOT NULL,
   `downloads` bigint(20) NOT NULL,
   `faved` bigint(20) NOT NULL,
   `dateadded` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
   PRIMARY KEY (`music_id`)
) ENGINE=MyISAM AUTO_INCREMENT=15146 DEFAULT CHARSET=utf8


SELECT COUNT(listenTrack.url) AS total, listenTrack.url 
FROM listenTrack
LEFT JOIN music ON music.url = listenTrack.url
WHERE DATEDIFF(DATE(date_created),'2009-08-15') = 0
GROUP BY listenTrack.url
ORDER BY total DESC
LIMIT 0,10

this query isn't very complex and the rows aren't too large, i don't think.

Is there any way to speed this up? Or can you suggest a better solution? This is going to be a cron job at the beggining of every month but I would also like to do by the day results as well.

Oh btw i am running this locally, over 4 min to run, but on prod it takes about 45 secs

+2  A: 

Try creating an index that will help with the join:

CREATE INDEX idx_url ON music (url);
VoteyDisciple
A: 

First thought, MyISAM is slow! Beside any performance improvements you could do with this query, try moving to InnoDB engine first.

Havenard
That is not correct. MyISAM is actually generally a faster engine than InnoDB. InnoDB handles high loads more gracefully, and offers better features, but it is not faster.
zombat
Its faster for simple stuff, as using .TXT files would. InnoDB is much faster when work gets heavy, and you just agreed with that.
Havenard
It is faster at managing multiple queries at once because of row-level locking, but that's not going to have any effect on speeding up a query that takes 45 seconds. Moving to InnoDB to optimize a single query is not a good idea.
zombat
You'll find the following advice in Zawodny's book, High Performance MySQL: InnoDB performs better when the write/read ratio is between 10% and 90%, thanks to more advanced locking. When your database is extremely write heavy, or extremely read heavy, locking becomes less of an issue, and MyISAM will be faster, as it is much lighter weight than innodb. Long story short: with the data presented here, you cannot conclusively say that innodb is the best choice for this application.
Frank Farmer
I always had bad performance experiences with MyISAM. Actually, before knowing it was a engine matter, I almost gave up using MySQL, I got crazy when I saw MSSQL and PGSQL getting substantial better performance with the same apps I originaly wrote to MySQL.
Havenard
+4  A: 

I'm more of a SQL Server guy but these concepts should apply.

I'd add indexes:

  1. On ListenTrack, add an index with url, and date_created
  2. On Music, add an index with url

These indexes should speed the query up tremendously (I originally had the table names mixed up - fixed in the latest edit).

Jeff Siver
I believe the index table names are flipped. I manged to do that the first time too and caught it right before posting.
TheJacobTaylor
TheJacobTaylor was right, I had the table names backwards. I have fixed this.
Jeff Siver
The question I have, is why did we both get them backwards? I looked through the post again, but did not see the trigger that I followed wrong.
TheJacobTaylor
My guess is: In the question introduction, they are introduced as music then listentrack, but the creation queries show listentrack first.
Rob Drimmie
+2  A: 

For the most part you should also index any column that is used in a JOIN. In your case, you should index both listentrack.url and music.url

@jeff s - An index music.date_created wouldnt help because you are running that through a function first so MySQL cannot use an index on that column. Often, you can rewrite a query so that the indexed referenced column is used statically like:

DATEDIFF(DATE(date_created),'2009-08-15') = 0

becomes

date_created >= '2009-08-15' and date_created < '2009-08-15'

This will filter down records that are from 2009-08-15 and allow any indexes on that column to be candidates. Note that MySQL might NOT use that index, it depends on other factors.

Your best bet is to make a dual index on listentrack(url, date_created) and then another index on music.url

These 2 indexes will cover this particular query.

Note that if you run EXPLAIN on this query you are still going to get a "using filesort" because it has to write the records to a temporary table on disk to do the ORDER BY.

In general you should always run your query under EXPLAIN to get an idea on how MySQL will execute the query and then go from there. See the EXPLAIN documentation:

http://dev.mysql.com/doc/refman/5.0/en/using-explain.html

Cody Caughlan
Brutal timing. :) Excellent answer. Beat me by seconds.
TheJacobTaylor
Thanks a lot sounds very useful. i will follow the your link and read up and try your example
kayem
+2  A: 

I think I might have missed the obvious before. Why are you joining the music table at all? You do not appear to be using the data in that table at all and you are performing a left join which is not required, right? I think this table being in the query will make it much slower and will not add any value. Take all references to music out, unless the url inclusion is required, in which case you need a right join to force it to not include a row without a matching value.


I would add new indexes, as the others mention. Specifically I would add: music url listentrack date_created,url

This will improve your join a ton.

Then I would look at the query, you are forcing the system to perform work on each row of the table. It would be better to rephrase the date restriction as a range.

Not sure of the syntax off the top of my head: where '2009-08-15 00:00:00' <= date_created < 2009-08-16 00:00:00

That should allow it to rapidly use the index to locate the appropriate records. The combined two key index on music should allow it to find the records based on the date and URL. You should experiment, they might be better off going in the other direction url,date_created on the index.

The explain plan for this query should say "using index" on the right hand column for both. That means that it will not have to hit the data in the table to calculate your sums.

I would also check the memory settings that you have configured for MySQL. It sounds like you do not have enough memory allocated. Be very careful on the differences between server based settings and thread based settings. The server with a 10MB cache is pretty small, a thread with a 10MB cache can use a lot of memory quickly.

Jacob

TheJacobTaylor
I am using data in that table but i was debuggin where the slowness come from. No join 1 sec query. Join after adding idx index on music.url ran at 7 secs and adding music.plays pushed it up to 10
kayem
Can you post the explain plan for the "real" query or the one with the index? How many records are we talking about for a day (order of magnitude)? How much memory are you using on the box? Index and data cashes, or just cache on InnoDB, are critical and easy to fix.
TheJacobTaylor
A: 

After you add indexes then you may want to explore adding a new column for the date_created to be a unix_timestamp, which will make math operations quicker.

I am not certain why you have the diff function though, as it appears you are looking for all rows that were updated on a particular date.

You may want to look at your query as it seems to have an error.

If you use unit tests then you can compare the results of your query and a query using a unix timestamp instead.

James Black
A: 

Why are you repeating the url in both tables?

Have listentrack hold a music_id instead, and join on that. Gets rid of the text search as well as the extra index.

Besides, it's arguably more correct. You're tracking the times that a particular track was listened to, not the url. What if the url changes?

kyoryu
A: 

you might want to add an index to the url field of both tables.

having said that, when i converted from mysql to sql server 2008, with the same queries and same database structures, the queries ran 1-3 orders of magnitude faster.

i think some of it had to do with the rdbms (mysql optimizers are not so good...) and some of it might have had to do with how the rdbms reserve system resources. although, the comparisons were made on production systems where only the db would run.

mson