views:

195

answers:

3

I'm writing a Rails app in which I have two models: a Machine model and a MachineUpdate model. The Machine model has many MachineUpdates. The MachineUpdate has a date/time field. I'm trying to retrieve all Machine records that have the following criteria:

  1. The Machine model has not had a MachineUpdate within the last 2 weeks, OR
  2. The Machine model has never had any updates.

Currently, I'm accomplishing #1 with a named scope:

named_scope :needs_updates,
            :include => :machine_updates,
            :conditions => ['machine_updates.date < ?', UPDATE_THRESHOLD.days.ago]

However, this only gets Machine models that have had at least one update. I also wanted to retrieve Machine models that have had no updates. How can I modify needs_updates so the items it returns fulfills that criteria as well?

+3  A: 

One solution is to introduce a counter_cache:

# add a machine_updates_count integer database column (with default 0)
# and add this to your Machine model:
counter_cache :machine_updates_count

and then add OR machine_updates_count = 0 to your SQL conditions.

However, you can also solve the problem without a counter cache by using a LEFT JOIN:

named_scope :needs_updates,
  :select => "machines.*, MAX(machine_updates.date) as last_update",
  :joins  => "LEFT JOIN machine_updates ON machine_updates.machine_id = machines.id",
  :group  => "machines.id",
  :having => ["last_update IS NULL OR last_update < ?", lambda{ UPDATE_THRESHOLD.seconds.ago }]

The left join is necessary so that you are sure you are looking at the most recent MachineUpdate (the one with MAX date).

Note also that you have to put your condition in a lambda so it is evaluated every time the query is run. Otherwise it will be evaluated only once (when your model is loaded on application boot-up), and you will not be able to find Machines that have come to need updates since your app started.

UPDATE:

This solution works in MySQL and SQLite, but not PostgreSQL. Postgres does not allow naming of columns in the SELECT clause that are not used in the GROUP BY clause (see discussion). I'm very unfamiliar with PostgreSQL, but I did get this to work as expected:

named_scope :needs_updates, lambda{
  cols = Machine.column_names.collect{ |c| "\"machines\".\"#{c}\"" }.join(",")
  {
    :select => cols,
    :group  => cols,
    :joins  => 'LEFT JOIN "machine_updates" ON "machine_updates"."machine_id" = "machines"."id"',
    :having => ['MAX("machine_updates"."date") IS NULL OR MAX("machine_updates"."date") < ?', UPDATE_THRESHOLD.days.ago]
  }
}
Alex Reisner
When I initially implemented this change, I got the following database error: `column "last_update" does not exist`. I took out the `last_update` stuff and replaced it with the `MAX(...)` line. I then got the following database error: `invalid input syntax for type timestamp: "--- !ruby/object:Proc {}`. Out of curiosity, I removed the lambda, which resulted in another database error: `column "machines.machine_name" must appear in the GROUP BY clause or be used in an aggregate function`.
mipadi
Also, as a side note, the lambda doesn't seem to be necessary; based on the logs, the timestamp sent through the SQL statement is updated appropriately whenever the page is refreshed.
mipadi
I'm pretty sure you do need the lambda. Without it, things will work in the development environment, but not in production. As for the errors: (1) what version of Rails are you using? (2) what kind of database are you using? (3) did you add any options to the named scope? I created a small mock application and that code worked on Rails 2.3.5 with SQLite3.
Alex Reisner
I'm using Rails 2.3.5 with PostgreSQL 8.1. No options were added, I just copied what you had. And the lambda results in a SQL syntax error (whereas not using it doesn't seem to cause problems).
mipadi
You need to use the lambda in regards to the date. If you do not, the first time you launch your app, `UPDATE_THRESHOLD.days.ago` will be evaluated and will be the same date until the next time you restart your app. The lambda will take care of reevaluating that condition as the requests come in.
theIV
The :group, :joins, and :having options are not necessary, see my answer below.
Daniel Beardsley
A: 

If you can make changes in the table, then you can use the :touch method of the belongs_to association.

For instance, add a datetime column to Machine named last_machine_update. Thereafter in the belongs_to of MachineUpdate, add :touch => :last_machine_update. This will cause that field to become updated with the last time you either added or modified a MachineUpdate connected to that Machine, thus removing the need for the named scope.

Otherwise I would probably do it like Alex proposes.

Jimmy Stenke
A: 

I just ran into a similar problem. It's actually pretty simple:

Machine.all(
  :include => :machine_updates,
  :conditions => "machine_updates.machine_id IS NULL OR machine_update.date < ?", UPDATE_THRESHOLD.days.ago])

If you were doing a named scope, just use lambdas to ensure that the date is re-calculated every time the named scope is called

named_scope :needs_updates, lambda { {
  :include => :machine_updates,
  :conditions => "machine_updates.machine_id IS NULL OR machine_update.date < ?", UPDATE_THRESHOLD.days.ago]
} }

If you want to avoid returning all of the MachineUpdate records in your query, then you need to use the :select option to only return the columns you want

named_scope :needs_updates, lambda { {
  :select => "machines.*",
  :conditions => "machine_updates.machine_id IS NULL OR machine_update.date < ?", UPDATE_THRESHOLD.days.ago]
} }
Daniel Beardsley