views:

922

answers:

4

I'm trying to build a rake utility that will update my database every so often.

This is the code I have so far:

namespace :utils do

  # utils:update_ip
  # Downloads the file frim <url> to the temp folder then unzips it in <file_path>
  # Then updates the database.

  desc "Update ip-to-country database"
  task :update_ip => :environment do

    require 'open-uri'
    require 'zip/zipfilesystem'
    require 'csv'

    file_name = "ip-to-country.csv"
    file_path = "#{RAILS_ROOT}/db/" + file_name
    url = 'http://ip-to-country.webhosting.info/downloads/ip-to-country.csv.zip'


    #check last time we updated the database.
    mod_time = ''
    mod_time = File.new(file_path).mtime.httpdate    if File.exists? file_path

    begin
      puts 'Downloading update...'
      #send conditional GET to server
      zipped_file = open(url, {'If-Modified-Since' => mod_time})
    rescue OpenURI::HTTPError => the_error
      if the_error.io.status[0] == '304'
        puts 'Nothing to update.'
      else
        puts 'HTTPError: ' + the_error.message
      end
    else # file was downloaded without error.

      Rails.logger.info 'ip-to-coutry: Remote database was last updated: ' + zipped_file.meta['last-modified']
      delay = Time.now - zipped_file.last_modified
      Rails.logger.info "ip-to-country: Database was outdated for: #{delay} seconds (#{delay / 60 / 60 / 24 } days)"

      puts 'Unzipping...'
      File.delete(file_path) if File.exists? file_path
      Zip::ZipFile.open(zipped_file.path) do |zipfile|
        zipfile.extract(file_name, file_path)
      end

      Iptocs.delete_all

      puts "Importing new database..."


      # TODO: way, way too heavy find a better solution.


      CSV.open(file_path, 'r') do |row|
        ip = Iptocs.new(  :ip_from        => row.shift,
                        :ip_to          => row.shift,
                        :country_code2  => row.shift,
                        :country_code3  => row.shift,
                        :country_name   => row.shift)
        ip.save
      end #CSV
      puts "Complete."

    end #begin-resuce
  end #task
end #namespace

The problem I'm having is that this takes a few minutes to enter the 100 thousand plus entries. I'd like to find a more efficient way of updating my database. Ideally this will remain independent of the database type, but if not my production server will be running on MySQL.

Thank you for any insight.

A: 

You could generate a text file with all INSERTs you need and then execute:

mysql -u user -p db_name < mytextfile.txt

Not sure if this will be any faster but worth a try...

Zepplock
Rails itself uses the SQL insert statement. -- See your rails log. So this method would not be a speed improvement.
Larry K
Of course Rails does INSERTS, how else would it add records to the database? But in his original post author is using "save" method which has more overhead than just a simple insert. I'm sure it involves doing commit on each insert, doing model validations, etc
Zepplock
+2  A: 

Use the database level utilities for high speed Luke!

Unfortunately, they're db specific. But they are fast For mysql, see http://dev.mysql.com/doc/refman/5.1/en/load-data.html

Larry K
A: 

As Larry says, use your DB-specific import utilities if the file comes in the format you want. However, if you need to manipulate the data before inserting, you can generate a single INSERT query with data for many rows, which is faster than using a separate query for each row (as ActiveRecord will do). For example:

INSERT INTO iptocs (ip_from, ip_to, country_code) VALUES
  ('xxx', 'xxx', 'xxx'),
  ('yyy', 'yyy', 'yyy'),
  ...;
Alex Reisner
+2  A: 

Have you tried to use AR Extensions for bulk import? You get impressive performance improvements when you are inserting 1000's of rows to DB. Visit their website for more details.

Refer to these examples for more information

Usage Example 1

Usage Example 2

Usage Example 3

KandadaBoggu
This is exactly what I was looking for, thank you.
codr
The gem supports import from CSV. This eliminates the `ActiveRecord` instantiation and validation costs. Refer to this article for more details. http://www.rubyinside.com/advent2006/17-extendingar.html
KandadaBoggu
Helped me too - thanks!
ambertch