Hello,

I have to create around 100k records. The records are in a CSV file and are loaded with the create_fixtures function. It's slow on my development machine, but it completes. The problem starts in production, where we have a per-process memory limit, which leads to the rake process being killed. I think it's because create_fixtures loads all the data into memory at once. Does anyone know how to force it to import in smaller chunks (before I cut the one big CSV into a few smaller ones)?

A: 

How are you loading/parsing the CSV? I think I'd use Ruby's File class to open and read the file, and parse every line myself.

Ariejan
A: 

Don't do it!

create_fixtures is designed for loading test data, which should be only as big as needed to exercise a feature. It's not intended for loading thousands of records into a production (or any other kind of) database. If it's a one-off, that may be OK, but as a regular thing it would make me very nervous.

If your data is simple enough (by which I mean a simple String#split would work), then that should probably be your approach, something like:

File.foreach(csv_file_path) do |line|
  # chomp drops the trailing newline, which would otherwise end up in the last field
  fields = line.chomp.split(",")
  # create records from the array of fields
end
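Since the underlying problem is memory, the same line-by-line loop can also group lines into fixed-size batches, so only one batch is held at a time and records can be created (for example) one transaction per batch. This is a minimal sketch, assuming plain comma-separated data with no quoted fields; `each_csv_batch` and the batch size are hypothetical, and the record creation itself is left as a comment.

```ruby
require "tempfile"

# Hypothetical helper: yields arrays of split fields, batch_size lines at a time.
# File.foreach without a block returns an Enumerator that reads the file lazily,
# so each_slice only ever holds one batch of lines in memory.
def each_csv_batch(path, batch_size = 1000)
  File.foreach(path).each_slice(batch_size) do |lines|
    # chomp drops the newline; a limit of -1 keeps trailing empty fields
    yield lines.map { |line| line.chomp.split(",", -1) }
  end
end

# Usage: a small throwaway file standing in for the real 100k-line CSV.
file = Tempfile.new("records")
file.write("1,alice\n2,bob\n3,carol\n4,dave\n5,eve\n")
file.close

batch_sizes = []
each_csv_batch(file.path, 2) do |rows|
  # Create the records for this batch here, e.g. inside one transaction.
  batch_sizes << rows.size
end
p batch_sizes # => [2, 2, 1]
```

The batch size trades memory for the number of round trips; a few hundred to a few thousand rows per batch is a reasonable starting point to tune from.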

Otherwise (i.e. you may have string values with quotes or commas, missing field values, multiple record formats, that sort of thing), you should probably look at the CSV library, which is already part of the Ruby 1.8.6 install, or better yet at the FasterCSV gem, which replaces CSV as the standard library from Ruby 1.9 onwards.
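For the quoted-comma case, the CSV library can also stream the file one row at a time. This is a minimal sketch, assuming Ruby 1.9 or later (where CSV is the FasterCSV code; on 1.8 with the gem installed the class should be FasterCSV instead) and a file with a header row; the file contents and column names are made up for illustration.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample file: the first data row has a quoted field containing
# a comma, which a plain String#split(",") would break apart.
file = Tempfile.new(["cities", ".csv"])
file.write(%Q(name,country\n"Washington, D.C.",USA\nParis,France\n))
file.close

names = []
# CSV.foreach parses one row at a time rather than reading the whole file.
# headers: true makes each row a CSV::Row addressable by header name.
CSV.foreach(file.path, headers: true) do |row|
  # Create a record from row["name"], row["country"], ... here.
  names << row["name"]
end
p names # => ["Washington, D.C.", "Paris"]
```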

UPDATE: Handily, Ryan Bates just posted a screencast on the vexatious topic of seed data...

Mike Woodhouse
While I understand that it's bad practice to load fixtures in production (no validations, for example), for some data (like countries, cities, etc.) it's a necessity. I found a nice discussion here: http://railspikes.com/2008/2/1/loading-seed-data (it doesn't address the fixture memory problem, but it covers how people handle seeding production data).
j t
Maybe I'm reading a different part. I see this: "I don’t like fixtures because they don’t validate data". The comments on that post mention ar-extensions, which is useful if DB bulk-write speed is an issue. I'm not quibbling with the need for seed data (is that all you're trying to do? The question isn't clear); I just think you may be better off giving up on using fixture loading for something it wasn't designed for.
Mike Woodhouse