I'm building something akin to Google Analytics and currently I'm doing real time database updates. Here's the workflow for my app:

  1. User makes a RESTful API request
  2. I find a record in a database, return it as JSON
  3. I record the request counter for the user in the database (i.e. if a user makes 2 API calls, I increment the user's request counter by 2).

#1 and #2 are really fast - they're just SELECTs. #3 is really slow because it's an UPDATE. In the real world, my database (MySQL) is NOT scaling; according to New Relic, #3 accounts for most of the request time - up to 70%!

My thinking is that I need to stop doing synchronous DB operations. In the short term I'm trying to reduce DB writes, so I'm considering a global hash (say, declared in environment.rb) that is accessible from my controllers and models, and that I can write to in lieu of the DB. Every so often a task would flush the accumulated updates to the DB.
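Concretely, I'm picturing something like this (all names are placeholders - StatisticsApi stands in for whatever model backs the statistics_api table; update_counters is the standard ActiveRecord counter increment, which generates the same COALESCE-style UPDATE I'm running now):

require 'thread'

# In environment.rb: a process-wide map of id => pending increment
REQUEST_COUNTS = Hash.new(0)
REQUEST_COUNTS_LOCK = Mutex.new

# In the controller, instead of the UPDATE:
REQUEST_COUNTS_LOCK.synchronize { REQUEST_COUNTS[record.id] += 1 }

# Periodic task (e.g. run from cron) that drains the hash:
def flush_request_counts
  pending = REQUEST_COUNTS_LOCK.synchronize do
    snapshot = REQUEST_COUNTS.dup
    REQUEST_COUNTS.clear
    snapshot
  end
  pending.each do |id, amount|
    StatisticsApi.update_counters(id, :count_request => amount)
  end
end

One thing I'm aware of: each server process would get its own hash, and counts accumulated since the last flush would be lost if a process dies.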

Questions:

  1. Does this sound reasonable? Any gotchas?
  2. Will I run into any concurrency problems?
  3. How does this compare with writing logs to the file system and importing later?
  4. Should I be using some message queuing system instead, like Starling? Any recommendations?

PS: Here's the offending query -- all columns of interest are indexed:

UPDATE statistics_api SET count_request = COALESCE(count_request, ?) + ? WHERE (id = ?)

A: 

Do it later with DelayedJob.
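For example, a rough sketch (the job struct and model name are placeholders - DelayedJob just needs an object that responds to perform):

class IncrementRequestCount < Struct.new(:stat_id, :amount)
  def perform
    # Runs in the worker process, off the request cycle
    StatisticsApi.update_counters(stat_id, :count_request => amount)
  end
end

# In the controller, instead of running the UPDATE inline:
Delayed::Job.enqueue(IncrementRequestCount.new(record.id, 1))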

Edit: If your DB is being hit so much that one UPDATE is noticeably slowing down your requests, maybe you should consider setting up a master-slave database architecture.

Matt Grande
+1  A: 

Hi Bilal,

Your hash solution sounds a bit too complex. This set of slides is an insightful and up-to-date resource that addresses your issue head-on:

They say the simplest thing would be:

Thread.new do
  MyModel.do_long_thing
end

But the Ruby mysql driver is blocking, so a mysql request in that thread could still block your request. You could use mysqlplus as a driver and get non-blocking requests, but now we're getting a pretty complex and specialized solution.

If you really just want this out of the request cycle, and can afford to tie up the server process while it runs, you can do something like:

class MyController < ApplicationController
  after_filter :do_jobs

  def index
    @job = Proc.new { MyModel.do_long_thing }
  end

  private

  # Runs after the action; executes whatever work the action deferred
  def do_jobs
    return unless @job
    @job.call
  end
end

I'd abstract it into ApplicationController more, but you get the idea. The proc defers updates until after the request.

If you are serious about asynchronous and background processes, you'll need to look at the various options out there and make a decision about what fits your needs. Matt Grande recommended DelayedJob - that's a very popular pick right now, but if your entire server is bogged down with database writes, I would not suggest it. If this is just one particularly slow update and your server is not overloaded, then maybe it's a good solution.

I currently use Workling with Starling in my most complex project. Workling has been pretty extensible, but Starling has been a little less than ideal. One of Workling's advantages is the ability to swap backends, so we can move off Starling if it becomes a large problem.
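For reference, a Workling worker looks roughly like this (a sketch only - the worker and model names are made up, and passing arguments in an options hash is Workling's convention):

class StatsWorker < Workling::Base
  def increment_count(options)
    StatisticsApi.update_counters(options[:id], :count_request => options[:amount])
  end
end

# Enqueued asynchronously via Workling's asynch_ prefix:
StatsWorker.asynch_increment_count(:id => record.id, :amount => 1)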

If your server is bogged with writes, you will need to look at scaling it up regardless of your asynchronous task approach.

Good luck! It sounds like your app is growing at an exciting pace :-)

mixonic
A: 

Hi, I just asked a similar question over on the EventMachine mailing list, and it was suggested that I try phat (http://www.mikeperham.com/2010/04/03/introducing-phat-an-asynchronous-rails-app/) for asynchronous database access.

Maybe you should try it out.

rmk