I have a fairly simple process running that periodically pulls RSS feeds and updates articles in a MySQL database.
The articles table has about 130k rows right now. For each article found, the processor checks whether the article already exists. These queries almost always take around 300 milliseconds, and roughly one in every 10 to 20 takes more than 2 seconds.
SELECT id FROM `articles` WHERE (guid = 'http://example.com/feed.rss') LIMIT 1;
# Query_time: 2.754567 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0
I have an index on the guid column, but whenever a new article is encountered it's added to the articles table, which invalidates the query cache (right?).
Some of the other entries in the slow query log report 120+ rows examined.
Of course on my development machine, these queries take about 0.2 milliseconds.
The server is a virtual host from Engine Yard Solo (EC2) with 1.7GB of memory and whatever CPU EC2 ships with these days.
Any advice would be greatly appreciated.
Update
As it turns out the problem was between the chair and the keyboard.
I had an index on 'id', but was querying on 'guid'.
Adding an index on 'guid' brought the query time down to 0.2 ms each.
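For anyone who runs into the same thing, here is a rough sketch of how the missing index can be spotted and fixed. It assumes guid is a VARCHAR column; the index name idx_articles_guid is just an illustrative choice, not something from my actual schema.

```sql
-- List the indexes that actually exist on the table.
-- In my case this would have shown only the PRIMARY index on id.
SHOW INDEX FROM articles;

-- Ask MySQL how it plans to run the lookup.
-- With no usable index, the plan typically shows type = ALL and key = NULL
-- (a full table scan).
EXPLAIN SELECT id FROM `articles` WHERE (guid = 'http://example.com/feed.rss') LIMIT 1;

-- Add the missing index. The name idx_articles_guid is hypothetical.
-- Assumption: guid is VARCHAR. A TEXT column would need a prefix length
-- in the index definition.
ALTER TABLE articles ADD INDEX idx_articles_guid (guid);
```

Running the EXPLAIN again after the ALTER TABLE should show the new index in the key column instead of NULL.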
Thanks for all the helpful tips, everyone!