I'm using FasterCSV to import an uploaded file into a model, and it works great for small files. However, when I try to import a large dataset (21,000 lines), it takes ages and I get browser timeouts on the live server.

This is my current working code:

  logcount = 0
  # Wrap all inserts in a single transaction instead of committing per row.
  Attendee.transaction do
    FCSV.new(file, :headers => true).each do |row|
      dob = Date.strptime(row[1], '%m/%d/%Y')
      record = @event.attendees.new(:union_id => row[0], :dob => dob, :gender => row[2])
      # Count only the rows that pass validation and are saved.
      logcount += 1 if record.save
    end
  end

I'd love to use a background process, but the user needs to see how many lines were imported before they can move to the next step of the system.

So I was thinking that I should chunk the action: read only a smaller number of lines, set a counter, update the view with some kind of progress, then run the method again using the previous counter as the start point.

I can't seem to see how to get FasterCSV to read only a set number of lines, and also set an offset for the start point.

Does anyone know how to do this? Or is there a better way to handle this?

A: 

I'd rather create a prepared query, then load each line from the file and execute the prepared query with it. Bypassing the model entirely should be faster.

Simon
Could you give me an example of what you mean? And do you think it'd be fast enough not to need to send updates to the browser?
Les
Not with 21000 records to import.
EmFi
A: 

If you already have the database, why not import it through a Rake task? Are your users going to be importing such large databases?

If your users are going to be importing such large databases themselves, a Rake task won't do.

FCSV.new can take any options IO.open can. You can use that to seek to a particular byte. Unfortunately FCSV doesn't make it easy to stop or access the underlying IO object, to find out where you stopped. Resuming in the middle of a file also complicates the use of a header row.
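To illustrate the byte-offset idea, here is a minimal sketch that sidesteps the problem of FCSV hiding its IO object: it reads lines from the IO itself and parses each one with `CSV.parse_line` from Ruby's standard `csv` library (the successor to FasterCSV since Ruby 1.9). The method name `read_chunk` and the column names in the usage note are made up for illustration. It assumes no field contains an embedded newline, since it splits on physical lines.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: returns up to `limit` data rows starting at byte
# `offset`, plus the byte position to resume from and an end-of-file flag.
# Assumes every record fits on a single physical line.
def read_chunk(io, offset, limit)
  io.rewind
  header_line = io.gets
  return [[], 0, true] if header_line.nil?

  # Re-read the header on every call so each chunk can be keyed by column.
  headers = CSV.parse_line(header_line)

  # Offset 0 (or anything inside the header) means "start at the first row".
  io.seek(offset) if offset > io.pos

  rows = []
  while rows.size < limit && (line = io.gets)
    next if line.strip.empty? # skip blank lines
    rows << headers.zip(CSV.parse_line(line)).to_h
  end

  [rows, io.pos, io.eof?]
end
```

A controller action could keep the returned position in the session or a hidden form field and pass it back on the next request, for example `rows, pos, done = read_chunk(file, pos, 500)`, repeating until `done` is true.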

Really, I think the optimal solution is to outsource your CSV import to a DRb process that periodically reports its progress in a way a controller action can pick up on. Then call that controller action every so often with some AJAX running on the client.
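The shape of that arrangement can be sketched in plain Ruby. This is not BackgroundDRb or any DRb API: a `Thread` stands in for the separate worker process, and the names `ImportProgress` and `start_import` are hypothetical. The point is only that the worker updates a shared counter while something else reads snapshots of it.

```ruby
# Hypothetical thread-safe progress tracker shared by the worker and
# whatever answers the client's polling requests.
class ImportProgress
  def initialize(total)
    @total = total
    @processed = 0
    @imported = 0
    @lock = Mutex.new
  end

  # Called by the worker after each row; `saved` says whether it was stored.
  def record(saved)
    @lock.synchronize do
      @processed += 1
      @imported += 1 if saved
    end
  end

  # Called by the poller; returns a consistent copy of the counters.
  def snapshot
    @lock.synchronize do
      { :processed => @processed, :imported => @imported,
        :total => @total, :done => @processed >= @total }
    end
  end
end

# Runs the import in the background. The block receives each row and
# returns true if the row was saved (e.g. the result of record.save).
def start_import(rows, progress, &save_row)
  Thread.new do
    rows.each { |row| progress.record(save_row.call(row) ? true : false) }
  end
end
```

The polled controller action would then do little more than render `progress.snapshot` as JSON. In a real deployment the worker lives in another process, so the snapshot would have to travel over DRb or through a shared store such as a database row.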

I've had success with BackgroundDRb in the past. Its installation and use are a little too detailed for me to reproduce here. There are other plugins and gems available with a bit of googling.

DRb caveat: Most DRb solutions require an additional daemon process running on your server. Some web hosts forbid this on their more basic plans, so check your TOS.

EmFi
A: 

Have you tried using AR Extensions for bulk import? You get impressive performance improvements when you are inserting thousands of rows into the database. Visit their website for more details.
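The gain from bulk import comes mostly from sending many rows per INSERT statement instead of one statement per row. The sketch below shows that general technique in plain Ruby; it is not the AR Extensions API. The helper names are hypothetical, the quoting is deliberately naive (a real application should use the database adapter's own quoting), and it assumes the database accepts multi-row `VALUES` lists.

```ruby
# Naive SQL literal quoting, for illustration only.
def quote_literal(value)
  return 'NULL' if value.nil?
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Hypothetical helper: groups rows into batches and builds one multi-row
# INSERT statement per batch.
def bulk_insert_statements(table, columns, rows, batch_size = 1000)
  rows.each_slice(batch_size).map do |batch|
    tuples = batch.map do |row|
      '(' + row.map { |value| quote_literal(value) }.join(', ') + ')'
    end
    "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{tuples.join(', ')}"
  end
end
```

With 21,000 rows and the default batch size this produces 21 statements rather than 21,000. Note that it bypasses model validations and callbacks entirely, which is part of why it is fast.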

KandadaBoggu