I am a newbie working on a simple Rails app that translates a document (a long string) from one language to another. The dictionary is a table of terms (a string regexp to find and substitute, and a block that outputs the substituting string). The table is 1 million records long.

Each request is a document that needs to be translated. In a first brute-force approach I need to run the whole dictionary against each request/document.

Since the whole dictionary will run every time (from the first record to the last), instead of loading the table of dictionary records for each document, I think it would be best to keep the whole dictionary as an array in memory.

I know this is not the most efficient approach, but the whole dictionary has to run at this point.

1.- If no efficiency can be gained by restructuring the document and dictionary (meaning it is not possible to create smaller subsets of the dictionary), what is the best design approach?

2.- Do you know of similar projects that I can learn from?

3.- Where should I look to learn how to load such a big table into memory (a cache?) at Rails startup?

Any answer to any of the posed questions will be greatly appreciated. Thank you very much!

A: 

In production mode, Rails will not reload classes between requests. You can keep something in memory easily by putting it into a class variable.

You could do something like:

class Dictionary < ActiveRecord::Base
  # Class-level cache; mattr_accessor gives us Dictionary.cached readers/writers
  @@cached = nil
  mattr_accessor :cached

  # Load every dictionary row once and keep it in memory for the life of the process
  def self.cache_dict!
    @@cached = Dictionary.all
  end
end

And then in production.rb:

Dictionary.cache_dict!
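
With the table cached, each request can run the whole dictionary against the incoming document without hitting the database again. A minimal usage sketch (the pattern and replacement column names and the translate helper are assumptions for illustration, not something defined above):

  # Hypothetical helper: apply every cached entry to a document in turn
  def translate(document)
    Dictionary.cached.inject(document) do |text, entry|
      text.gsub(Regexp.new(entry.pattern), entry.replacement)
    end
  end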

For your specific questions:

  1. Possibly write the inefficient part in C, Java, or another faster language.
  2. Nope, sorry. Maybe you could use a MapReduce approach to distribute the load across servers.
  3. See above.
Eli
Thank you, I am thinking about a mixed solution between a marshaled file and caching part of it as you point out.
fjs6
A: 

If you use something like cache_fu, you can leverage memcache without doing any of the work yourself. If you are trying to bring 1 million rows into memory, being able to leverage the distributed nature of memcache will probably be useful.
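
A rough sketch of the cache_fu pattern (this assumes the plugin is installed and config/memcached.yml is set up; check the plugin's README for the exact API):

  class Dictionary < ActiveRecord::Base
    # acts_as_cached routes row lookups through memcached
    acts_as_cached
  end

  # Fetch an entry through the cache instead of hitting the database every time
  entry = Dictionary.get_cache(42)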

Rob Di Marco
Thanks. I'll look into cache_fu
fjs6
A: 

This isn't so much a specific answer to one of your questions as a process recommendation. If you're having (or anticipating) performance issues, you should be using a profiler from the get-go.

Check out this tutorial: How to Profile Your Rails Application.

My experience on a number of platforms (ANSI C, C#, Ruby) is that performance problems are very hard to deal with in advance; rather, you're better off implementing something that looks like it might be performant, then load-testing it through a profiler.

Then, once you know where your time is being spent, you can expend some effort on optimisation.

If I had to take a guess, I'd say the regex work you'll be performing will be as much of a performance bottleneck as any ActiveRecord work. But without verifying that with a profiler, that guess is of little value.
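
As a concrete starting point, here is a minimal ruby-prof sketch (the translate call and document variable are assumed stand-ins for whatever does the actual substitution work):

  require 'ruby-prof'

  # Profile one translation pass to see where the time actually goes
  result = RubyProf.profile do
    translate(document)  # hypothetical call into the substitution code
  end

  # Print a flat, per-method breakdown to STDOUT
  RubyProf::FlatPrinter.new(result).print(STDOUT)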

Duncan Bayne
Great advice. Thank you!
fjs6
A: 

I don't think your web host will be happy with a solution like this. This script

# Build the whole dictionary as an in-memory hash of regexp => replacement
dict = {}
(0..1_000_000).each do |num|
  dict[/#{num}/] = "#{num}_subst"
end

consumes a gigabyte of RAM on my MBP just for storing the hash table. Another approach would be to store your substitutions marshaled in memcached, so that you could (at least) spread them across machines.

require 'rubygems'
require 'memcached'

@table = Memcached.new("localhost:11211")

# Store each marshaled [regexp, replacement] pair and collect the keys,
# so they can be touched later to keep them from being evicted
retained_keys = (0..1_000_000).map do |num|
  stored_blob = Marshal.dump([/#{num}/, "#{num}_subst"])
  @table.set("p#{num}", stored_blob)
  "p#{num}"
end

You will have to worry about keeping the keys "hot", since memcached will evict them if they are not used.
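
One way to do that (a sketch, using the retained_keys array collected above) is to periodically read the entries back, which marks them as recently used:

  # Touch every key so memcached's LRU is less likely to evict it;
  # the memcached gem raises Memcached::NotFound for missing keys
  retained_keys.each do |key|
    begin
      @table.get(key)
    rescue Memcached::NotFound
      # entry was evicted; it would have to be re-populated here
    end
  end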

The best approach for your case, however, would be very simple: write your substitutions to a file (one line per substitution) and make a stream filter that reads the file line by line and applies the replacements from it. You can also parallelize that by splitting the work, say, per letter of the substitution, and replacing markers.

But this should get you started:

  require "base64"

  File.open("./dict.marshal", "wb") do | file |
    (0..1000_000).each do | num |
      stored_blob = Base64.encode64(Marshal.dump([/#{num}/, "#{num}_subst"]))
      file.puts(stored_blob)
    end
  end

  puts "Table populated (should be a 35 meg file), now let's run substitutions"

  File.open("./dict.marshal", "r") do | f |
    until f.eof?
      pattern, replacement = Marshal.load(Base64.decode64(f.gets))
    end
  end

  puts "All replacements out"

To populate the file AND load each substitution, this takes me:

 real    0m21.262s
 user    0m19.100s
 sys     0m0.502s

To just load the regexp and the string from the file (all million of them, piece by piece):

 real    0m7.855s
 user    0m7.645s
 sys     0m0.105s

So this is 7 seconds of IO overhead, but you don't lose any memory (and there is huge room for improvement) - the RSIZE is about 3 megs. You should easily be able to make it faster if you do the IO in bulk, or make one file per 10-50 substitutions and load them as a whole. Put the files on an SSD or a RAID and you've got a winner, but you get to keep your RAM.
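
To actually translate a document with this stream approach, the read loop would apply each pair as it is decoded. A minimal sketch (the document.txt input and the in-place gsub! are assumptions, not part of the timing runs above):

  # Hypothetical: stream the dictionary and apply each substitution as it is
  # read, so only one [regexp, replacement] pair is in memory at a time
  document = File.read("./document.txt")

  File.open("./dict.marshal", "r") do |f|
    until f.eof?
      pattern, replacement = Marshal.load(Base64.decode64(f.gets))
      document.gsub!(pattern, replacement)
    end
  end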

Julik
Thank you very much. It is a great idea that I have already coded in my app with good results.
fjs6