views:

158

answers:

4

I have a fairly simple process running that periodically pulls RSS feeds and updates articles in a MySQL database.

The articles table is filled to about 130k rows right now. For each article found, the processor checks to see if the article already exists. These queries almost always take 300 milliseconds, and about every 10 or 20 tries, they take more than 2 seconds.

SELECT id FROM `articles` WHERE (guid = 'http://example.com/feed.rss')  LIMIT 1;
# Query_time: 2.754567  Lock_time: 0.000000  Rows_sent: 0  Rows_examined: 0

I have an index on the guid column but whenever a new article is encountered, it's added to the articles table - invalidating the query cache (right?).

Some of the other fields in the slow query log report 120+ rows examined.

Of course on my development machine, these queries take about 0.2 milliseconds.

The server is a virtual host from Engine Yard Solo (EC2) with 1.7GB of memory and whatever CPU EC2 ships with these days.

Any advice would be greatly appreciated.

Update

As it turns out the problem was between the chair and the keyboard.

I had an index on 'id', but was querying on 'guid'.

Adding an index on 'guid' brought the query time down to 0.2 ms each.

Thanks for all the helpful tips everyone!

+4  A: 

Run:

EXPLAIN SELECT id FROM `articles` WHERE (guid = 'http://example.com/feed.rss')  LIMIT 1;

Notice the EXPLAIN in front. That'll tell you what MySQL is doing. Its hard to believe probing one row from an index could ever take 2.7s, unless your machine is seriously overloaded and/or thrashing. Considering the row counts of 0, I'm guessing MySQL did a full table scan to find nothing, which probably means you don't have the index you think you do.

To answer your other question, whenever you make any change to the articles table, all the query cache entries involving that table are invalidated.

derobert
+1  A: 

The log says that no rows were read or even examined, so the problem is not with your query but most likely with your server. EC2's Achilles' heel is its IO/s, perhaps MySQL had to load the index from disk but the server's disks were completely saturated.

If your index is small enough to fit in memory (make sure your my.cnf allocates enough memory to key_buffer (MyISAM) or innodb_buffer_pool_size (InnoDB)), you should be able to preload it using

SELECT guid FROM articles

Check out the EXPLAIN to make sure it says "Using Index." If it doesn't, this one should:

SELECT guid FROM articles FORCE INDEX (guid) WHERE LENGTH(guid) > 0

Alternatively, if guid isn't your PRIMARY KEY or UNIQUE, you may remove its index and create another indexed column used to retrieve records quickly at a fraction of the index size. The column guid_crc32 would be an INT UNSIGNED and would hold the CRC32 of guid

ALTER TABLE articles ADD COLUMN guid_crc32 INT UNSIGNED, ADD INDEX guid_crc32 (guid_crc32);
UPDATE articles SET guid_crc32 = CRC32(guid);

Your SELECT query would then look like this:

SELECT id FROM articles WHERE guid = 'http://example.com/feed.rss' AND guid_crc32 = CRC32('http://example.com/feed.rss') LIMIT 1;

The optimizer should use the index on guid_crc32, which should be both faster and smaller than searching through guid.

Josh Davis
I've just noticed the part where Luke says the same query executed over and over would eventually be slow every few dozen tries, which weakens my theory about loading the index from disk. Unless the server has so little memory reserved for index that it does swap it out every few dozen tries.
Josh Davis
A: 

if this table gets updated alot then mysql may not update the index-counts properly. try "CHECK TABLE articles" to update the index counts and see if your table is fine.

also try to see if doing EXPLAIN on your query give the same results on your dev and prod machines. if the results are different try OPTIMIZE TABLE.

Are these myisam or innodb tables?

Nir Levy
A: 

Assuming GUID is indexed and ID is your primary key, something is "wrong." In that scenario, it is an index only query. The index is being bumped from memory and the disks are busy, perhaps.

Depending on your update / insert / delete pattern, you database may be crying for an "optimize" command.

SQL Commands I'd like to see the output of:

show table status like 'articles';
explain SELECT id FROM `articles` WHERE (guid = 'http://example.com/feed.rss')  LIMIT 1;
explain articles;

System commands I'd like to see the output of (assuming Linux):

iostat 5 5

Tell us how much memory you have because 1.7mb is wrong, or something really exciting is happening.

Edit how much memory is available to your SQL server in my.cnf?

Autocracy