ansaurus

Question

Validating a legacy table with ActiveRecord

Answer 1

+2 A:

find_each uses find_in_batches, which fetches 1000 rows at a time by default. You could try tuning the batch_size option. The way you have it above seems pretty optimal: it fetches from the database in batches and iterates over each record, which you need to do anyway. I would monitor your RAM usage to see whether the batch size is a good fit, and you could also try Ruby 1.9.1 to speed things up if you're currently on 1.8.*.

http://api.rubyonrails.org/classes/ActiveRecord/Batches/ClassMethods.html#M001846
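As a rough sketch of the batch_size tuning mentioned above (not the asker's actual code): the helper name invalid_record_ids and the batch size of 500 are made up for illustration, and the keyword-style call assumes a reasonably recent Ruby. It only relies on the scope responding to find_each and each record responding to id and valid?.

```ruby
# Collect the ids of records that fail validation, loading them in batches.
# `scope` is anything that responds to find_each(batch_size: ...), such as an
# ActiveRecord model class. The default of 500 is an arbitrary starting point,
# not a recommendation; tune it while watching memory usage.
def invalid_record_ids(scope, batch_size: 500)
  invalid = []
  scope.find_each(batch_size: batch_size) do |record|
    invalid << record.id unless record.valid?
  end
  invalid
end

# Hypothetical usage inside a Rails app (LegacyRecord is an assumed model name):
#   bad_ids = invalid_record_ids(LegacyRecord, batch_size: 200)
```

Collecting only the ids keeps the result small; the records themselves can be garbage collected once each batch has been processed.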

zgchurch 2009-09-29 14:01:35

Answer 2

A:

I like zgchurch's response as a starting point.

What I would add is that threading is definitely not going to help here, especially because Ruby uses green threads (at least in 1.8.x), so there is no opportunity to use multiple processors anyway. Even if that weren't the case, this operation is very likely IO-heavy enough that IO contention would eat into any multi-core benefit.

Now, if you really want to speed this up, you should take a look at the actual validations and figure out a more efficient way to achieve them. Just loading all the rows and instantiating an ActiveRecord object for each one tends to dominate the running time in most validation situations. You may be spending 90-99.99% of your time just loading and unloading the data from memory.

In these types of situations I tend to go towards raw SQL. You can do things like validating foreign key integrity tens of thousands of times faster than with ActiveRecord validation callbacks. Of course, the viability of this approach depends on the actual ins and outs of your validations. Even if you need something a little richer than SQL to define validity, you could still probably get a 10-100x speed increase just by loading the minimal data through a thinner SQL interface and examining the data directly. If that's the case, Perl or Python might be a better choice for raw performance.
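To make the foreign key example concrete, here is a minimal sketch of that kind of check done in SQL. The helper name orphaned_rows_sql and the table and column names in the usage comment are hypothetical, and the query assumes the child table has an id column.

```ruby
# Build a query that lists child rows whose foreign key points at no parent row.
# The names are interpolated without escaping, so pass only trusted,
# hard-coded identifiers, never user input.
def orphaned_rows_sql(child_table, foreign_key, parent_table, parent_key = "id")
  <<~SQL
    SELECT c.id
    FROM #{child_table} c
    LEFT JOIN #{parent_table} p ON p.#{parent_key} = c.#{foreign_key}
    WHERE c.#{foreign_key} IS NOT NULL
      AND p.#{parent_key} IS NULL
  SQL
end

# Hypothetical usage inside a Rails app (orders/customers are assumed tables):
#   sql = orphaned_rows_sql("orders", "customer_id", "customers")
#   orphan_ids = ActiveRecord::Base.connection.select_values(sql)
```

The database does the whole check in a single pass and returns only the offending ids, so no ActiveRecord objects are instantiated for the rows that are fine.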

dasil003 2009-10-01 04:10:05

Good points. I have been reluctant to try to duplicate the validations in SQL, but you are probably right that it would give the best performance.

Peer Allan 2009-10-01 13:11:43
