I'm having trouble diagnosing a problem on my Ubuntu Scalr/EC2 production environment.

Seemingly at random, database queries and/or Memcache queries take MUCH longer than they should. I've seen a simple select statement take 130ms and a Memcache fetch take 65ms! It can happen a handful of times per request, causing some requests to take twice as long as they should.

To diagnose the problem, I wrote a very simple script that just connects to the MySQL server and runs the same query 100 times, keeping track of the slowest run.

require 'mysql'

mysql = Mysql.init
mysql.real_connect('', '', '', '')

max = 0
100.times do
  start = Time.now
  mysql.query('select * from navigables limit 1')
  stop = Time.now

  total = stop - start
  max = total if total > max
end

puts "Max Time: #{max * 1000}"

mysql.close

This script consistently returned a really high max time, so I ruled out Rails as the source of the problem. I also wrote the same thing in Python to rule out Ruby, and indeed the Python version took inordinate amounts of time as well!
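A single max value hides how often the spikes occur and whether they cluster, so one possible refinement is to keep every sample and summarize the distribution. The following is only a sketch: the helper names (`sample_latencies`, `percentile`, `summarize`) and the threshold are made up for illustration, and the percentile uses the simple nearest-rank method.

```ruby
# Time the given block `runs` times; returns each duration in milliseconds.
# A monotonic clock is used so wall-clock adjustments cannot skew a sample.
def sample_latencies(runs)
  Array.new(runs) do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0
  end
end

# Nearest-rank percentile of a non-empty list of samples.
def percentile(samples, pct)
  sorted = samples.sort
  rank = (pct / 100.0 * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

# Summarize the samples and list which iterations exceeded threshold_ms
# (a hypothetical cutoff for "slow"; pick one that fits your baseline).
def summarize(samples, threshold_ms)
  {
    median: percentile(samples, 50),
    p99: percentile(samples, 99),
    max: samples.max,
    slow_indexes: samples.each_index.select { |i| samples[i] > threshold_ms }
  }
end
```

With the connection from the script above, this could be used as `samples = sample_latencies(100) { mysql.query('select * from navigables limit 1') }` followed by `p summarize(samples, 20)`. If the slow indexes turn out to be evenly spaced or bunched together, that pattern may hint at something periodic rather than truly random.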

Both MySQL and Memcache are on their own boxes, so I considered network latency, but pings and traceroutes look normal.
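Ping uses ICMP, and traceroute typically uses UDP or ICMP probes, so neither exercises the TCP path that the MySQL and Memcache clients actually use. One way to check that path separately is to time bare TCP connects to the service port. This is a rough sketch; the helper name `tcp_connect_times` and the 5-second timeout are assumptions, and it measures connection setup only, not query time.

```ruby
require 'socket'

# Open and immediately close a TCP connection `runs` times;
# returns each connect duration in milliseconds.
def tcp_connect_times(host, port, runs)
  Array.new(runs) do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    # Socket.tcp closes the socket when the block returns.
    Socket.tcp(host, port, connect_timeout: 5) { |_sock| }
    (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0
  end
end
```

For example, `tcp_connect_times('db-host', 3306, 100).max` (with `db-host` standing in for the real database address) shows whether connection setup alone ever spikes. If it does, the problem is likely below the application layer.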

Also, running the queries/fetches locally on the respective machines returns expected times, and I'm running the same gem versions on my staging machine without this issue.

I'm really stumped on this one... any thoughts on something I could try to diagnose this? Thanks

A: 

My only thought is that it might be disk I/O?

Toby Hede