
I am running into some Rails 2.3.5 ActiveRecord behavior I do not understand. It appears that an object can have its association ids updated in inconsistent ways.

This is best explained with an example:

Create a Post model with the string attribute 'title' and a Comment model with the string attribute 'content'.

Here are the associations:

class Post < ActiveRecord::Base
  has_many :comments
end

class Comment < ActiveRecord::Base
  belongs_to :post
end

Scenario #1: In the following code I create one Post with an associated Comment, load a second Post object by finding the first, add a second Comment to the first Post object, and discover that the second Post object has the second Comment associated with it without any explicit assignment.

post1 = Post.new(:title => 'Post 1')
comment1 = Comment.new(:content => 'content 1')
post1.comments << comment1
post1.save
# Create a second Post object by finding the first
post2 = Post.find_by_title('Post 1')
# Add a new Comment to the first Post object
comment2 = Comment.new(:content => 'content 2')
post1.comments << comment2
# Note that both Comments are associated with both Post objects even
# though I never explicitly associated them with post2.
post1.comment_ids # => [12, 13]
post2.comment_ids # => [12, 13]

Scenario #2: Run the above commands again, but this time insert one extra command that, on the face of it, should not affect the results. The extra command is post2.comments, which runs after creating comment2 and before adding comment2 to post1.

post1 = Post.new(:title => 'Post 1A')
comment1 = Comment.new(:content => 'content 1A')
post1.comments << comment1
post1.save
# Create a second Post object by finding the first
post2 = Post.find_by_title('Post 1A')
# Add a new Comment to the first Post object
comment2 = Comment.new(:content => 'content 2A')
post2.comments # !! THIS IS THE EXTRA COMMAND !!
post1.comments << comment2
# This time post1 has both Comments, but post2 only has the first one.
post1.comment_ids # => [14, 15]
post2.comment_ids # => [14]

Note that there is only one Comment associated with post2 in this scenario, whereas in Scenario 1 there were two.

The Big Question: Why would running post2.comments before adding the new Comment to post1 make any difference to which Comments were associated with post2?

A:

This has to do with the way Active Record caches loaded associations and the way has_many associations are handled.

Unless the association is eager-loaded with an :include option during the find, Rails will not populate the association for the found records until it is needed. When the association is first accessed, the loaded records are memoized (cached on that particular object) to cut down on the number of SQL queries executed.
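To make the caching behavior concrete, here is a plain-Ruby sketch that models it without Rails. The names `CommentsTable`, `FakePost`, `add_comment` and `ids_for_post` are hypothetical. The sketch only assumes what the paragraph above describes: the collection is queried on first access and memoized afterwards, and `<<` on an already-saved owner writes the row immediately but updates only that object's cache. It is an illustration, not how ActiveRecord is implemented internally.

```ruby
# Plain-Ruby sketch of lazy loading plus memoization; this is NOT
# ActiveRecord code. CommentsTable and FakePost are hypothetical names.
class CommentsTable
  attr_reader :query_count

  def initialize
    @rows = []      # each row: { :id => Integer, :post_id => Integer }
    @next_id = 1
    @query_count = 0
  end

  # Stands in for INSERT INTO comments ...; returns the new id.
  def insert(post_id)
    id = @next_id
    @next_id += 1
    @rows << { :id => id, :post_id => post_id }
    id
  end

  # Stands in for SELECT * FROM comments WHERE comments.post_id = ?
  def ids_for_post(post_id)
    @query_count += 1
    @rows.select { |row| row[:post_id] == post_id }.map { |row| row[:id] }
  end
end

class FakePost
  def initialize(id, table)
    @id = id
    @table = table
    @comments = nil  # nil means "not loaded yet"
  end

  # First call queries the table; later calls reuse the memoized array.
  def comments
    @comments ||= @table.ids_for_post(@id)
  end

  # Roughly like has_many << on a saved owner: the row is written at once,
  # but only THIS object's cached collection is updated.
  def add_comment
    cached = comments
    new_id = @table.insert(@id)
    cached << new_id
    new_id
  end

  def comment_ids
    comments.dup
  end
end

table = CommentsTable.new

# Scenario 1: post2 does not touch its comments until the very end.
s1_post1 = FakePost.new(1, table)
s1_post1.add_comment                 # comment 1
s1_post2 = FakePost.new(1, table)    # second object for the same post row
s1_post1.add_comment                 # comment 2
p s1_post1.comment_ids               # prints [1, 2]
p s1_post2.comment_ids               # prints [1, 2] (first load sees both rows)

# Scenario 2: post2 loads (and caches) its comments before comment 4 exists.
s2_post1 = FakePost.new(2, table)
s2_post1.add_comment                 # comment 3
s2_post2 = FakePost.new(2, table)
s2_post2.comments                    # the "extra command": caches [3]
s2_post1.add_comment                 # comment 4
p s2_post1.comment_ids               # prints [3, 4]
p s2_post2.comment_ids               # prints [3] (stale cached copy)
```

Running it replays both scenarios from the question with smaller ids; the only difference between them is whether the second object's comments are loaded before the second insert.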

Stepping through the code in the question, starting with Scenario 1:

post1 = Post.new(:title => 'Post 1')
comment1 = Comment.new(:content => 'content 1')
post1.comments << comment1  # updates post1's internal comments cache
post1.save

# Create a second Post object by finding the first
post2 = Post.find_by_title('Post 1')

# Add a new Comment to the first Post object
comment2 = Comment.new(:content => 'content 2')
post1.comments << comment2   # post1 is already saved, so comment2 is saved immediately;
                             # this also updates post1's internal comments cache

# Note that both Comments are associated with both Post objects even
# though I never explicitly associated them with post2.
post1.comment_ids # => [12, 13]

# This is the first time post2.comments is loaded, so the database is queried, roughly:
# SELECT * FROM comments WHERE comments.post_id = #{post2.id}
post2.comment_ids # => [12, 13]

Scenario 2:

post1 = Post.new(:title => 'Post 1A')
comment1 = Comment.new(:content => 'content 1A')
post1.comments << comment1
post1.save

# Create a second Post object by finding the first
post2 = Post.find_by_title('Post 1A')

# Add a new Comment to the first Post object
comment2 = Comment.new(:content => 'content 2A')

# This is the first time post2.comments is loaded, so the database is queried, roughly:
# SELECT * FROM comments WHERE comments.post_id = #{post2.id}
post2.comments # => returns one comment (id = 14), which is cached internally

post1.comments << comment2
# post1 now has both Comments; post2 is never told about the new one.
post1.comment_ids # => [14, 15]

# post2.comments has already been cached, so the SQL query is not executed again.
post2.comment_ids # => [14]

N.B. post2.comment_ids is internally defined as post2.comments.map(&:id)
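Since comment_ids is derived from the collection, a cached collection also means cached ids, and discarding the cache is what brings post2 up to date. In Rails 2.x the has_many reader takes an optional force_reload argument, so `post2.comments(true)` (or `post2.comments.reload`) should re-run the query; it is worth checking this against your exact Rails version. The sketch below mimics that argument in plain Ruby; `CommentRow`, `ROWS` and `CachedPost` are hypothetical names, not ActiveRecord classes.

```ruby
# Plain-Ruby sketch, NOT ActiveRecord. CommentRow, ROWS and CachedPost
# are hypothetical names used only for illustration.
CommentRow = Struct.new(:id, :post_id)
ROWS = []  # stand-in for the comments table

class CachedPost
  def initialize(id)
    @id = id
    @comments = nil  # nil means "not loaded yet"
  end

  # Same shape as the Rails 2.x reader post.comments(force_reload):
  # passing true throws away the cached array and queries again.
  def comments(force_reload = false)
    @comments = nil if force_reload
    @comments ||= ROWS.select { |row| row.post_id == @id }
  end

  # Derived from the (possibly cached) collection, like comments.map(&:id).
  def comment_ids
    comments.map(&:id)
  end
end

ROWS << CommentRow.new(14, 7)
post2 = CachedPost.new(7)
post2.comments                   # first access: loads and caches comment 14
ROWS << CommentRow.new(15, 7)    # some other object saves a second comment
stale = post2.comment_ids        # [14], served from the cache
post2.comments(true)             # forced reload re-reads ROWS
fresh = post2.comment_ids        # [14, 15]
p stale, fresh
```

The post id 7 is arbitrary; the point is that only an explicit reload makes the second object notice a row written through another object.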

P.S. My answer to this question might help you understand why post2 gets updated despite your not touching it.

EmFi
Thanks for the answer. But doesn't this behavior seem wrong? Caching should improve performance, not cause result inconsistency.
rlandster
It's more of a concurrency issue. Rails doesn't expect an outside source to change things related to an instance of a model. "Outside source" here means any action originating somewhere other than that specific instance. If you strongly believe this is wrong, file a bug report.
EmFi