I am new to MySQL and I have been pulling my hair out over this problem for days. I need to optimize this query so that it runs faster; right now it's taking over 5 seconds.

Here is the query:

SELECT SQL_NO_CACHE COUNT(*) as multiple, a.*,b.*  
FROM announcements as a  
INNER JOIN stores as s  
ON a.username=s.username
    WHERE s.username is not null AND s.state='NC' 
GROUP BY a.announcement_id
ORDER BY a.dt DESC LIMIT 0,10

Stores table consists of: store_id, username, name, state, city, zip, etc...

Announcements table consists of: announcement_id, msg, dt, username

The stores table has around 10,000 records and the announcements table has around 500,000 records.

What I'm trying to accomplish, in English: display the 10 most recent store announcements. What makes this complicated is that a store can have multiple entries in the stores table with the same username (one row per location). So if a chain store, let's say "Chipotle", sends an announcement, I want to display only one row for that announcement, with a note next to it that says "this store has multiple locations". That's why I'm using COUNT(*) and GROUP BY: if COUNT(*) > 1, I know there are multiple locations for the announcement.

The WHERE condition can be any state, city, or zip. I'm using SQL_NO_CACHE because announcements are updated frequently, so you rarely get the same results. Does that make sense?

I would really appreciate any suggestions of how to do this better. I know little about indexes, but I did create an index for the "username" field in both tables. Feel free to shred me apart here, I know I must be missing something.

Update --

DESC stores;

Field       Type            Null    Key     Default         Extra  
store_id    int(11)         NO      PRI     NULL            auto_increment  
username    varchar(20)     NO      MUL     NULL       
name        varchar(100)    NO              NULL       
street      varchar(100)    NO              NULL       
city        varchar(50)     NO              NULL       
state       varchar(2)      NO              NULL       
zip         varchar(15)     NO              NULL      

DESC announcements;

Field              Type           Null      Key     Default     Extra
dt                 datetime       NO                NULL     
username           varchar(20)    NO        MUL     NULL     
msg                varchar(200)   NO                NULL     
announcement_id    int(11)        NO        PRI     NULL        auto_increment

EXPLAIN output;

id  select_type     table   type    possible_keys   key       key_len     ref         rows     Extra
1   SIMPLE          a       index   username        PRIMARY   47          NULL        315001   Using temporary; Using filesort
1   SIMPLE          b       ref     username        username  62          a.username  1        Using where
A:

Try something like this:

SELECT SQL_NO_CACHE COUNT(*) as multiple, a.*,b.*   
FROM announcements as a   
INNER JOIN 
(
  SELECT username, COUNT(username) as multiple FROM stores
  WHERE username IS NOT NULL AND state = 'NC'
  GROUP BY username
 )  as s 
ON a.username=s.username 
ORDER BY a.dt DESC LIMIT 10 
John M
I think you meant to SELECT `multiple` and not COUNT(*) in the outer query, no?
Saggi Malachi
Also select `s.*`, not `b.*` because there's no table alias `b` in the query.
Bill Karwin
Thanks this was definitely faster.
Art Peterson
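
Folding both comments into the answer's query gives something like the sketch below. It has not been run against the asker's data. It selects `multiple` from the derived table instead of calling COUNT(*) in the outer query, and it drops `b.*` because no alias `b` exists. The derived table only exposes `username` and `multiple`, so any other store column would have to be added to the subquery. The IS NOT NULL check is left out on the assumption that `username` stays NOT NULL, as the DESC output shows.

```sql
SELECT SQL_NO_CACHE a.*, s.multiple
FROM announcements AS a
INNER JOIN
(
  -- one row per username with at least one store in the chosen state;
  -- multiple = number of that username's locations in that state
  SELECT username, COUNT(*) AS multiple
  FROM stores
  WHERE state = 'NC'
  GROUP BY username
) AS s
ON a.username = s.username
ORDER BY a.dt DESC
LIMIT 10;
```

A row with `multiple` greater than 1 is then an announcement from a store with several locations in that state.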
A: 

If you are ordering on the dt column but there is no index on that column, MySQL will have to do a (slow, expensive) sort of all of your result rows on that column every time you run the query.

Try adding an index on announcements.dt -- MySQL may be able to access the rows in order by using the index, and avoid the sorting step afterwards.

Ian Clelland
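
A minimal sketch of that suggestion follows. The index name `idx_announcements_dt` is an arbitrary choice, not something from the question. Re-running EXPLAIN afterwards shows whether the optimizer actually picks the index up; with a GROUP BY on a different column than the ORDER BY, it may still report "Using temporary; Using filesort".

```sql
-- add a secondary index on the sort column (index name is hypothetical)
ALTER TABLE announcements ADD INDEX idx_announcements_dt (dt);

-- check whether the plan changed
EXPLAIN
SELECT COUNT(*) AS multiple, a.*
FROM announcements AS a
INNER JOIN stores AS s ON a.username = s.username
WHERE s.state = 'NC'
GROUP BY a.announcement_id
ORDER BY a.dt DESC
LIMIT 10;
```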
A: 
  • Change the order of the tables in your JOIN. MySQL reads rows from the first table and then finds matching rows in the second table. If you always filter your results by fields in the stores table, then stores should be the leading table in your JOIN so that MySQL doesn't pick up and sort unnecessary rows from the announcements table.
    In the EXPLAIN output you pasted, it seems that only one store matched the query; switching the order of the tables would make MySQL look up only that store's rows in the announcements table.
  • Add an index on the dt column (an indexed integer column holding a Unix timestamp would be even better).
  • If possible, create an integer userID for each username and JOIN on that column (add an index on that one as well).
  • I'm not sure whether MySQL still has problems with this, but replacing COUNT(*) with COUNT(1) might be helpful.
Saggi Malachi
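
The first bullet can be sketched as follows. One caveat: for a plain INNER JOIN the MySQL optimizer is generally free to choose the join order itself, so swapping the tables in the FROM clause may not change the plan. STRAIGHT_JOIN is the MySQL keyword that forces the left table to be read first, and it is used here as one possible way to pin the order. Whether this is faster on the asker's data is untested.

```sql
SELECT SQL_NO_CACHE COUNT(*) AS multiple, a.*
FROM stores AS s
-- STRAIGHT_JOIN: read stores first, then look up matching announcements
STRAIGHT_JOIN announcements AS a
  ON a.username = s.username
WHERE s.state = 'NC'
GROUP BY a.announcement_id
ORDER BY a.dt DESC
LIMIT 10;
```

Comparing the EXPLAIN output of this version with the original shows which table is actually read first.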