I have a Ruby hash that reaches approximately 10 megabytes if written to a file using Marshal.dump. After gzip compression it is approximately 500 kilobytes.
Iterating through and altering this hash is very fast in ruby (fractions of a millisecond). Even copying it is extremely fast.
The problem is that I need to share the data in this hash between Ruby on Rails processes. In order to do this using the Rails cache (file_store or memcached) I need to Marshal.dump the file first, however this incurs a 1000 millisecond delay when serializing the file and a 400 millisecond delay when serializing it.
Ideally I would want to be able to save and load this hash from each process in under 100 milliseconds.
One idea is to spawn a new Ruby process to hold this hash that provides an API to the other processes to modify or process the data within it, but I want to avoid doing this unless I'm certain that there are no other ways to share this object quickly.
Is there a way I can more directly share this hash between processes without needing to serialize or deserialize it?
Here is the code I'm using to generate a hash similar to the one I'm working with:
@a = []
0.upto(500) do |r|
@a[r] = []
0.upto(10_000) do |c|
if rand(10) == 0
@a[r][c] = 1 # 10% chance of being 1
else
@a[r][c] = 0
end
end
end
@c = Marshal.dump(@a) # 1000 milliseconds
Marshal.load(@c) # 400 milliseconds
Update:
Since my original question did not receive many responses, I'm assuming there's no solution as easy as I would have hoped.
Presently I'm considering two options:
- Create a Sinatra application to store this hash with an API to modify/access it.
- Create a C application to do the same as #1, but a lot faster.
The scope of my problem has increased such that the hash may be larger than my original example. So #2 may be necessary. But I have no idea where to start in terms of writing a C application that exposes an appropriate API.
A good walkthrough through how best to implement #1 or #2 may receive best answer credit.
Update 2
I ended up implementing this as a separate application written in Ruby 1.9 that has a DRb interface to communicate with application instances. I use the Daemons gem to spawn DRb instances when the web server starts up. On start up the DRb application loads in the necessary data from the database, and then it communicates with the client to return results and to stay up to date. It's running quite well in production now. Thanks for the help!