views: 70
answers: 2

Hello, I'm trying to perform a daily operation on a larger-than-normal dataset (2m+ records). However, Rails seems to take a very long time working with a dataset of that size. Operations like

Dataset.all.each do |data|  # Dataset.all instantiates every record before the loop starts
  ...
end

take a very long time to complete (I assume this is because it can't fit all the items into memory at once, right?).

Does anyone have any strategies on how I could handle this situation? I know SQL would probably speed up the process, but I'm looking to use the Rails environment as I can do many more complicated things to the data than I can with just SQL statements.

+1  A: 

When processing a large set of rows, a database is very fast and efficient; it's what they were designed for. I would recommend doing all of this processing in SQL if you want maximum performance. If you prefer to use Rails, or it is impossible to do everything you want in SQL, you might do some pre-processing in SQL and the remainder in Rails. Short of that, 2m+ rows is a lot to loop over: even if each one takes only a fraction of a second, it adds up to a long time.
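A minimal sketch of that hybrid approach (the normalized_name and needs_review columns and the complex_transform! method are hypothetical placeholders, and the where call assumes Rails 3):

# Pre-process in SQL: one UPDATE handles the simple part across all
# 2m+ rows without instantiating a single ActiveRecord object.
Dataset.update_all("normalized_name = LOWER(name)")

# Do the remainder in Rails, but only over the rows that still need
# the complicated per-record logic, fetched in batches (see find_each below).
Dataset.where(:needs_review => true).find_each do |data|
  data.complex_transform!  # placeholder for the per-record Ruby work
end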

KM
+3  A: 

You want to use ActiveRecord's find_each for this.

Dataset.find_each do |data|
  ...
end
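
If you need each slice as an array (for bulk operations, say), the related find_in_batches yields groups of records instead of single ones; a quick sketch, with process! standing in for whatever per-record work you do:

Dataset.find_in_batches(:batch_size => 1000) do |batch|
  # batch is an Array of up to 1,000 Dataset records
  batch.each { |data| data.process! }  # process! is a placeholder
end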
Corey
Dataset.find_each(:batch_size => 5000) { |data| ... } (the default batch size is 1000) fetches the records in slices and processes them, so you don't load the whole table into memory at once: http://guides.rubyonrails.org/active_record_querying.html#retrieving-multiple-objects-in-batches
clyfe
Ah great, I had never heard of this method. Thanks!
japancheese