views:

311

answers:

3

I have a these 3 models:

class User < ActiveRecord::Base
  has_many :permissions, :dependent => :destroy
  has_many :roles, :through => :permissions
end

class Permission < ActiveRecord::Base
  belongs_to :role
  belongs_to :user
end
class Role < ActiveRecord::Base
  has_many :permissions, :dependent => :destroy
  has_many :users, :through => :permissions
end

I want to find a user and it's roles in one sql statement, but I can't seem to achieve this:

The following statement:

user = User.find_by_id(x, :include => :roles)

Gives me the following queries:

  User Load (1.2ms)   SELECT * FROM `users` WHERE (`users`.`id` = 1) LIMIT 1
  Permission Load (0.8ms)   SELECT `permissions`.* FROM `permissions` WHERE (`permissions`.user_id = 1) 
  Role Load (0.8ms)   SELECT * FROM `roles` WHERE (`roles`.`id` IN (2,1))

Not exactly ideal. How do I do this so that it does one sql query with joins and loads the user's roles into memory so saying:

user.roles

doesn't issue a new sql query

A: 

Including a model loads the datas. But makes a second query.
For what you want to do, you should use the :joins parameter.

user = User.find_by_id(x, :joins => :roles)
Damien MATHIEU
but user.roles makes a second query anyway so that doesn't help at all
brad
+2  A: 

As Damien pointed out, if you really want a single query every time you should use join.

But you might not want a single SQL call. Here's why (from here):

Optimized Eager Loading


Let’s take a look at this:

Post.find(:all, :include => [:comments])

Until Rails 2.0 we would see something like the following SQL query in the log:

SELECT `posts`.`id` AS t0_r0, `posts`.`title` AS t0_r1, `posts`.`body` AS t0_r2, `comments`.`id` AS t1_r0, `comments`.`body` AS t1_r1 FROM `posts` LEFT OUTER JOIN `comments` ON comments.post_id = posts.id

But now, in Rails 2.1, the same command will deliver different SQL queries. Actually at least 2, instead of 1. “And how can this be an improvement?” Let’s take a look at the generated SQL queries:

SELECT `posts`.`id`, `posts`.`title`, `posts`.`body` FROM `posts` 

SELECT `comments`.`id`, `comments`.`body` FROM `comments` WHERE (`comments`.post_id IN (130049073,226779025,269986261,921194568,972244995))

The :include keyword for Eager Loading was implemented to tackle the dreaded 1+N problem. This problem happens when you have associations, then you load the parent object and start loading one association at a time, thus the 1+N problem. If your parent object has 100 children, you would run 101 queries, which is not good. One way to try to optimize this is to join everything using an OUTER JOIN clause in the SQL, that way both the parent and children objects are loaded at once in a single query.

Seemed like a good idea and actually still is. But for some situations, the monster outer join becomes slower than many smaller queries. A lot of discussion has been going on and you can take a look at the details at the tickets 9640, 9497, 9560, L109.

The bottom line is: generally it seems better to split a monster join into smaller ones, as you’ve seen in the above example. This avoid the cartesian product overload problem. For the uninitiated, let’s run the outer join version of the query:

mysql> SELECT `posts`.`id` AS t0_r0, `posts`.`title` AS t0_r1, `posts`.`body` AS t0_r2, `comments`.`id` AS t1_r0, `comments`.`body` AS t1_r1 FROM `posts` LEFT OUTER JOIN `comments` ON comments.post_id = posts.id ;

+-----------+-----------------+--------+-----------+---------+
| t0_r0     | t0_r1           | t0_r2  | t1_r0     | t1_r1   |
+-----------+-----------------+--------+-----------+---------+
| 130049073 | Hello RailsConf | MyText |      NULL | NULL    | 
| 226779025 | Hello Brazil    | MyText | 816076421 | MyText5 | 
| 269986261 | Hello World     | MyText |  61594165 | MyText3 | 
| 269986261 | Hello World     | MyText | 734198955 | MyText1 | 
| 269986261 | Hello World     | MyText | 765025994 | MyText4 | 
| 269986261 | Hello World     | MyText | 777406191 | MyText2 | 
| 921194568 | Rails 2.1       | NULL   |      NULL | NULL    | 
| 972244995 | AkitaOnRails    | NULL   |      NULL | NULL    | 
+-----------+-----------------+--------+-----------+---------+
8 rows in set (0.00 sec)

Pay attention to this: do you see lots of duplications in the first 3 columns (t0_r0 up to t0_r2)? Those are the Post model columns, the remaining being each post’s comment columns. Notice that the “Hello World” post was repeated 4 times. That’s what a join does: the parent rows are repeated for each children. That particular post has 4 comments, so it was repeated 4 times.

The problem is that this hits Rails hard, because it will have to deal with several small and short-lived objects. The pain is felt in the Rails side, not that much on the MySQL side. Now, compare that to the smaller queries:

mysql> SELECT `posts`.`id`, `posts`.`title`, `posts`.`body` FROM `posts` ;
+-----------+-----------------+--------+
| id        | title           | body   |
+-----------+-----------------+--------+
| 130049073 | Hello RailsConf | MyText | 
| 226779025 | Hello Brazil    | MyText | 
| 269986261 | Hello World     | MyText | 
| 921194568 | Rails 2.1       | NULL   | 
| 972244995 | AkitaOnRails    | NULL   | 
+-----------+-----------------+--------+
5 rows in set (0.00 sec)

mysql> SELECT `comments`.`id`, `comments`.`body` FROM `comments` WHERE (`comments`.post_id IN (130049073,226779025,269986261,921194568,972244995));
+-----------+---------+
| id        | body    |
+-----------+---------+
|  61594165 | MyText3 | 
| 734198955 | MyText1 | 
| 765025994 | MyText4 | 
| 777406191 | MyText2 | 
| 816076421 | MyText5 | 
+-----------+---------+
5 rows in set (0.00 sec)

Actually I am cheating a little bit, I manually removed the created_at and updated_at fields from the all the above queries in order for you to understand it a little bit clearer. So, there you have it: the posts result set, separated and not duplicated, and the comments result set with the same size as before. The longer and more complex the result set, the more this matters because the more objects Rails would have to deal with. Allocating and deallocating several hundreds or thousands of small duplicated objects is never a good deal.

But this new feature is smart. Let’s say you want something like this:

>> Post.find(:all, :include => [:comments], :conditions => ["comments.created_at > ?", 1.week.ago.to_s(:db)])

In Rails 2.1, it will understand that there is a filtering condition for the ‘comments’ table, so it will not break it down into the small queries, but instead, it will generate the old outer join version, like this:

SELECT `posts`.`id` AS t0_r0, `posts`.`title` AS t0_r1, `posts`.`body` AS t0_r2, `posts`.`created_at` AS t0_r3, `posts`.`updated_at` AS t0_r4, `comments`.`id` AS t1_r0, `comments`.`post_id` AS t1_r1, `comments`.`body` AS t1_r2, `comments`.`created_at` AS t1_r3, `comments`.`updated_at` AS t1_r4 FROM `posts` LEFT OUTER JOIN `comments` ON comments.post_id = posts.id WHERE (comments.created_at > '2008-05-18 18:06:34')

So, nested joins, conditions, and so forth on join tables should still work fine. Overall it should speed up your queries. Some reported that because of more individual queries, MySQL seems to receive a stronger punch CPU-wise. Do you home work and make your stress tests and benchmarks to see what happens.

JRL
Thanks for the in depth explanation. I understand that large joins are expensive, however, my code that's finding this user is in a before filter to get current_user from a sessions, so I'm only ever choosing one user, so there's no massive join. I would think then that a join over an includes would be faster for finding a single user with multiple roles. Any suggestions on how to do that?
brad
You can always do a find_by_sql and put whatever SQL you want.
JRL
+3  A: 

Loading the Roles in a separate SQL query is actually an optimization called "Optimized Eager Loading".

Role Load (0.8ms)   SELECT * FROM `roles` WHERE (`roles`.`id` IN (2,1))

(It is doing this instead of loading each role separately, the N+1 problem.)

The Rails team found it was usually faster to use an IN query with the associations looked up previously instead of doing a big join.

A join will only happen in this query if you add conditions on one of the other tables. Rails will detect this and do the join.

For example:

User.all(:include => :roles, :conditions => "roles.name = 'Admin'")

See the original ticket, this previous Stack Overflow question, and Fabio Akita's blog post about Optimized Eager Loading.

Luke Francl