I have a Ruby program that loads two very large YAML files, and I'd like to get some speed-up by taking advantage of multiple cores, forking off a process for each file. I've tried looking, but I'm having trouble figuring out how, or even if, I can share variables between different processes.

The following code is what I currently have:

require 'yaml'

@proteins = ""
@decoyProteins = "" 

fork do
  @proteins = YAML.load_file(database)
  exit
end

fork do
  @decoyProteins = YAML.load_file(database)
  exit
end

p @proteins["LVDK"]

p prints nil, though, because of the fork.

So is it possible to have the forked processes share the variables? And if so, how?

A: 

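One problem is that you need to use Process.wait to wait for your forked processes to complete. The other is that you can't do interprocess communication through variables: each forked child gets its own copy of the parent's memory, so assignments made in the child never reach the parent. To see this: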

@one = nil
@two = nil
@hash = {}
pidA = fork do
    sleep 1
    @one = 1
    @hash[:one] = 1
    p [:one, @one, :hash, @hash] #=> [ :one, 1, :hash, { :one => 1 } ]
end
pidB = fork do
    sleep 2
    @two = 2
    @hash[:two] = 2
    p [:two, @two, :hash, @hash] #=> [ :two, 2, :hash, { :two => 2 } ]
end
Process.wait(pidB)
Process.wait(pidA)
p [:one, @one, :two, @two, :hash, @hash] #=> [ :one, nil, :two, nil, :hash, {} ]

One way to do interprocess communication is with a pipe (IO.pipe). Open it before you fork, then have each side of the fork close the end of the pipe it doesn't use.

From ri IO::pipe:

    rd, wr = IO.pipe

    if fork
      wr.close
      puts "Parent got: <#{rd.read}>"
      rd.close
      Process.wait
    else
      rd.close
      puts "Sending message to parent"
      wr.write "Hi Dad"
      wr.close
    end

 _produces:_

    Sending message to parent
    Parent got: <Hi Dad>
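Applied to the question's code, each child can parse its file and send the resulting object back through its own pipe using Marshal. This is a minimal sketch, assuming the loaded data is Marshal-able (plain hashes, arrays and strings are); the helper name `fork_with_result` and the inline YAML strings standing in for the two large files are hypothetical.

```ruby
require 'yaml'

# Hypothetical helper: runs the block in a forked child and returns a lambda
# that, when called, reads the child's Marshal-ed result from a pipe.
def fork_with_result
  rd, wr = IO.pipe
  pid = fork do
    rd.close
    Marshal.dump(yield, wr) # serialize the block's return value into the pipe
    wr.close
    exit!(0)                # skip at_exit handlers inherited from the parent
  end
  wr.close                  # parent only reads
  lambda do
    data = rd.read          # blocks until the child closes its write end
    rd.close
    Process.wait(pid)       # reap the child
    Marshal.load(data)
  end
end

# Stand-ins for YAML.load_file(...) on the two real files.
proteins_yaml = "LVDK: protein A\nMKTA: protein B\n"
decoy_yaml    = "KDVL: decoy A\n"

# Start both children before collecting either result, so both parse at once.
proteins_job = fork_with_result { YAML.load(proteins_yaml) }
decoys_job   = fork_with_result { YAML.load(decoy_yaml) }

@proteins      = proteins_job.call
@decoyProteins = decoys_job.call

p @proteins["LVDK"] #=> "protein A"
```

Reading the pipe before calling Process.wait matters: a large result can fill the pipe buffer, and the child would block forever if the parent waited first. Errors are not handled here; if a child dies before writing, Marshal.load raises on the empty data.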

If you want to share variables, use threads:

@one = nil
@two = nil
@hash = {}
threadA = Thread.fork do
    sleep 1
    @one = 1
    @hash[:one] = 1
    p [:one, @one, :hash, @hash] #=> [ :one, 1, :hash, { :one => 1 } ] # (usually)
end
threadB = Thread.fork do
    sleep 2
    @two = 2
    @hash[:two] = 2
    p [:two, @two, :hash, @hash] #=> [ :two, 2, :hash, { :one => 1, :two => 2 } ] # (usually)
end
threadA.join
threadB.join
p [:one, @one, :two, @two, :hash, @hash] #=> [ :one, 1, :two, 2, :hash, { :one => 1, :two => 2 } ]

However, I'm not sure threading will gain you anything if you're IO bound.
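If you do go the thread route, Thread#value waits for a thread to finish and returns the result of its block, which avoids mutating shared variables altogether. A minimal sketch; the inline YAML strings are stand-ins for the two real files. Note that on MRI (C Ruby) only one thread runs Ruby code at a time, so this by itself won't spread CPU-bound parsing across cores.

```ruby
require 'yaml'

# Stand-ins for the two large YAML files.
proteins_yaml = "LVDK: protein A\n"
decoy_yaml    = "KDVL: decoy A\n"

# Each thread returns its parsed result; nothing is shared or mutated.
threads = [proteins_yaml, decoy_yaml].map do |text|
  Thread.new { YAML.load(text) }
end

# Thread#value joins the thread and returns the block's value.
@proteins, @decoyProteins = threads.map(&:value)

p @proteins["LVDK"] #=> "protein A"
```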

rampion
Where does the `:hash` symbol go, when you write `p [:one, @one, :hash, @hash] #=> [ :one, 1, { :one => 1 } ]`?
Jeriko
... invisible due to poor transcription? :) Fixed it, thx
rampion
A: 

It is possible to share variables between processes; DRuby is probably the lowest barrier-to-entry way to do it.

regularfry
doc: http://www.ensta.fr/~diam/ruby/online/ruby-doc-stdlib/libdoc/drb/rdoc/classes/DRb.html
rampion
A: 

You probably want to use a thread instead of a fork if you want to share data.

http://ruby-doc.org/docs/ProgrammingRuby/html/tut_threads.html

Oh, and if you really want to take advantage of threads, you'll want to use JRuby. In C Ruby 1.9 you may also want to take a look at fibers. I haven't looked at them myself, though, so I don't know if they are a solution for you.

Trey
Threads aren't what I want because they don't take advantage of multiple cores. I already tried threads, and it was actually slower.
Jesse J