views:

90

answers:

1

I have a collection of geography related classes: State, Msa, County, City, etc. The classes all descend from a base Location class. The classes are related, for the most part, through a denormalized join table called geography. So I have...

class Geography < ActiveRecord::Base
  belongs_to :state
  belongs_to :county
  belongs_to :city
  # ... etc.
end

class Location < ActiveRecord::Base
end

class State < Location
  has_many :geographies
  has_many :msas, :through => :geographies, :uniq => true
  # ... etc.
end

class Msa < Location
  has_many :geographies
  has_many :states, :through => :geographies, :uniq => true
  # ... etc.
end

Now, when I run the following from the console:

>> msas = Msa.find(:all, :include=>"states", :conditions=>{"states_locations"=>{"id"=>"1"}})

I get back the correct number of results (13 in this case). However, running the SQL that this find call produces I get back 1,000s of results (again the geography table is a datamart of sorts which is why I'm using the :uniq option on the association).

SELECT          `locations`.`id` AS t0_r0, 
                `locations`.`parent_id` AS t0_r1, 
                `locations`.`type` AS t0_r2, 
                `locations`.`name` AS t0_r3, 
                `states_locations`.`id` AS t1_r0, 
                `states_locations`.`parent_id` AS t1_r1, 
                `states_locations`.`type` AS t1_r2, 
                `states_locations`.`name` AS t1_r3
FROM            `locations` 
LEFT OUTER JOIN `geography` 
ON              `locations`.`id` = `geography`.`msa_id`
LEFT OUTER JOIN `locations` states_locations 
ON              `states_locations`.`id` = `geography`.`state_id`
AND             `states_locations`.`type` = 'State'
WHERE           `states_locations`.`id` = '1'
AND             `locations`.`type` = 'Msa' 

I assume this means that Rails is loading 1,000s of records into memory AND THEN, in Ruby, cutting down the results to the distinct set of Msas (in this case); seems a bit inefficient. Furthermore, the following subsequent calls return varying results:

>> msas.first.states.size    # incorrect count
=> 192
>> msas.first.states.count   # correct count
=> 1 
>> msas.first.states         # incorrect number of State objects
=> [#<State id: 1, ... >, ..., #<State id: 1, ... >]
>> msas.first.reload.states
=> [#<State id: 1, ... >]    # correct number of State objects

My questions are:

  1. Why isn't Rails using DISTINCT in the query it produces from the find call? I'm guessing it's because I've asked it to :include => :states. Should I use :joins instead?
  2. Why is Rails returning non-unique results when calling msas.first.states? Shouldn't the association having a :uniq => true enforce uniqueness on the results?
  3. Why do I need to use the table alias that Rails uses for the states "version" of the locations table, i.e., :conditions => { :states_locations => { :id => 1 }}? Rails doesn't seem to understand :include => :states, :conditions => { :states => { :id => 1 }}. Is there a way to deterministically predict the table alias?

Any insights would be much appreciated.

Thanks in advance, Jason

+1  A: 

You got a lot of questions there, let me see if this will help ...

You are correct that rails will trigger a sql call to get all results, and then active_record should filter out the unique records.

If you want to avoid that you can do the following:

has_many :states, :through => :geographies, :select => "DISTINCT states.*"

This post has interesting analysis

Also with your line:

msas = Msa.find(:all, :include=>"states", :conditions=>{"states_locations"=>{"id"=>"1"}})

It is not returning unique results because you are not leveraging the relationships that you set up. You probably want to do something like:

@msas = State.find(state_id).msas

Good luck

Jonathan
Hi Jonathan, thanks for the response. I tried using the :select option, however, I found that :select is ignored when using :include. Also, to be clear, the second line of code you referenced *is* returning unique results *in Ruby* just not in SQL. I'll take a look at Josh's post and see if it provides any further insight. Thanks again.
Jason
Josh's post confirms that Rails does indeed use Ruby, not SQL, to "unique-ify" the results in a has_many :through => xxx, :uniq => true association. However, it does not provide insight into why msas.first.states returns non-unique results. One would think that since Rails is smart enough to unique-ify the original result set, albeit in Ruby, it would also be smart enough to unique-ify the associations of objects in the result set as well. Perhaps I need to dive into the associations code in Rails.
Jason