I've developed a web-based point-of-sale system for one of my clients in Ruby on Rails with a MySQL backend. They are growing so fast that they are ringing up close to 10,000 transactions per day corporate-wide. For this question, I will use the transactions table as an example. Currently, I store transactions.status as a string (i.e. 'pending', 'completed', 'incomplete') in an indexed varchar(255) field. In the beginning, looking up records by status was fine because there weren't many records to worry about. Over time, using the query analyzer, I have noticed that performance has worsened, and that varchar fields can really slow down queries over thousands of lookups. I've been thinking about converting these varchar fields to integer-based status fields, using a STATUS constant in the Transaction model, like so:

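class Transaction < ActiveRecord::Base
  # Maps each status name to the integer stored in transactions.status.
  STATUS = { :incomplete => 0, :pending => 1, :completed => 2 }

  # Class-level finder: returns the transactions with the given status.
  def self.expensive_query_by_status(status)
    find(:all,
         :select => "id, cashier, total, status",
         :conditions => { :status => STATUS[status.to_sym] })
  end
end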

Is this the best route for me to take? What do you suggest? I am already using proper indexes on the various lookup fields, and memcached for query caching wherever possible. The system currently runs in a distributed environment of three servers: the first for the application, the second for the database, and the third for caching (all in one datacenter and on the same VLAN).

A: 

You might want to start limiting your searches (if you're not doing that already); #find(:all) is pretty taxing at that scale. You might also want to think about which associations your Transaction model reaches for when it is rendered in your views, and perhaps eager load those to minimize extra requests to the database.

nowk
A: 

Have you tried the alternative on a representative database? From the example given, I'm a little sceptical that it's going to make much difference. If there are only three statuses, then a query by status may be better off not using an index at all.

Say "completed" comprises 80% of your table. With no other indexed column involved, the query is going to require more reads if the index is used than if it is not, so a query of that type is almost certainly going to get slower as the table grows. "incomplete" and "pending" queries would probably still benefit from an index, however; they'd only be affected as the total number of rows with those statuses grew.
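To put rough numbers on that, here is a toy cost model in plain Ruby. It is only a sketch under stated assumptions, not how MySQL's optimizer actually costs a query: it assumes a full scan reads every data page once, and that an index lookup may need up to one data page read per matching row. The table size, rows per page, and percentages are all hypothetical.

```ruby
# Toy cost model (an assumption for illustration, not MySQL's optimizer).

# A full table scan reads every data page once.
def full_scan_reads(total_rows, rows_per_page)
  (total_rows.to_f / rows_per_page).ceil
end

# Worst case for an index lookup: one data page read per matching row.
def index_lookup_reads(total_rows, match_percent)
  (total_rows * match_percent / 100.0).ceil
end

total_rows    = 1_000_000  # hypothetical table size
rows_per_page = 50         # hypothetical rows per data page

scan      = full_scan_reads(total_rows, rows_per_page)
completed = index_lookup_reads(total_rows, 80)  # "completed" is 80% of rows
pending   = index_lookup_reads(total_rows, 1)   # "pending" is 1% of rows

puts "full scan:       #{scan} page reads"
puts "completed index: #{completed} page reads"
puts "pending index:   #{pending} page reads"
```

Under these assumptions the "completed" lookup through the index costs far more reads than simply scanning the table, while the rare "pending" status still comes out ahead with the index. Real figures depend on caching, row size, and how the storage engine lays out data, so measure on a representative database.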

How often do you look at everything, complete and otherwise, without some more selective criterion? Could you partition the table in some (internal or external) way? For example, store completed transactions in a separate table, moving new ones there as they reach their final (?) state. I think internal database partitioning was introduced in MySQL 5.1 - looking at the documentation it seems that a RANGE partition might be appropriate.

All that said, I do think there's probably some benefit to moving away from storing statuses as strings. Storage and bandwidth considerations aside, you're a lot less likely to inadvertently misspell an integer or, better yet, a constant or symbol.
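As a minimal sketch of that last point, a small lookup module can turn a misspelled status into an immediate error instead of a query that silently matches nothing. The module and method names here (TransactionStatus, code_for, name_for) are hypothetical and not part of the original model; the integer codes mirror the STATUS hash from the question.

```ruby
# Hypothetical helper, not from the original post: maps status names to
# the integers stored in the database and fails loudly on unknown names.
module TransactionStatus
  CODES = { :incomplete => 0, :pending => 1, :completed => 2 }.freeze

  # Returns the integer code for a status given as a symbol or string.
  # Raises ArgumentError for anything that is not a known status.
  def self.code_for(status)
    CODES.fetch(status.to_sym) do
      raise ArgumentError, "unknown status: #{status.inspect}"
    end
  end

  # Returns the status symbol for a stored integer code, or nil if unknown.
  def self.name_for(code)
    CODES.key(code)
  end
end
```

With a plain STATUS[status.to_sym] lookup, a typo such as :complete returns nil, which would end up as a NULL comparison in the query; using fetch surfaces the mistake at the call site instead.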

Mike Woodhouse