If the dictionary contained only single words, without the {"over the"=>"under the"} pair, then I think something like this would be faster than scanning the string over and over again, as most of the solutions here do.
First, I convert the array of pairs into a plain Hash:
h=Hash.new
@dictionary.each {|ft| h[ft[:from]]=ft[:to]}
=> {"quick"=>"lazy", "over the"=>"under the", "jumps"=>"flies"}
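Then I scan the string word by word, replacing every word that has an entry in the hash (note `map`, not `each`: `each` returns the original array, so the replacements would be thrown away):
@text.split(/ /).map { |w| h[w] || w }.join(" ")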
=> "lazy brown fox flies over the lazy dog"
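The same approach can be packaged as a small method. This is a sketch, not the answer's exact code: `replace_words` is a hypothetical name, it assumes Ruby 2.1+ for `Array#to_h`, and it uses `Hash#fetch` with the word itself as the default in place of `h[w] || w`.

```ruby
# Hypothetical helper: single-word replacement with one hash lookup per word.
def replace_words(text, dictionary)
  # Build {"quick" => "lazy", ...} from the array of {from:, to:} hashes.
  lookup = dictionary.map { |pair| [pair[:from], pair[:to]] }.to_h
  # Split on single spaces, swap each word that has an entry, and rejoin.
  text.split(/ /).map { |word| lookup.fetch(word, word) }.join(" ")
end

dictionary = [{ to: "lazy", from: "quick" }, { to: "flies", from: "jumps" },
              { from: "over the", to: "under the" }]
puts replace_words("quick brown fox jumps over the lazy dog", dictionary)
# prints: lazy brown fox flies over the lazy dog
```

As in the answer, "over the" is never matched, because splitting on spaces turns it into the two separate words "over" and "the".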
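It also doesn't suffer from the multiple-substitution problem: each word is looked up exactly once, so a replacement is never itself replaced again. For example, after adding "brown" => "quick":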
h["brown"]="quick"
=> {"brown"=>"quick", "quick"=>"lazy", "over the"=>"under the", "jumps"=>"flies"}
@text.split(/ /).map { |w| h[w] || w }.join(" ")
=> "lazy quick fox flies over the lazy dog"
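To make the multiple-substitution problem concrete, here is a sketch contrasting one-pair-after-another replacement (non-mutating `String#gsub` with a plain string pattern, standing in for a `gsub!` loop) with the single-pass lookup. The dictionary is hypothetical, built like the answer's but with the extra "brown" => "quick" pair placed first; whether replacements chain depends on that order.

```ruby
# Hypothetical dictionary, same shape as @dictionary, "brown" pair first.
dictionary = [
  { from: "brown",    to: "quick" },
  { from: "quick",    to: "lazy" },
  { from: "over the", to: "under the" },
  { from: "jumps",    to: "flies" }
]
text = "quick brown fox jumps over the lazy dog"

# Sequential replacement: each pass sees the previous pass's output,
# so "brown" becomes "quick" and is then rewritten again to "lazy".
chained = dictionary.reduce(text) { |s, pair| s.gsub(pair[:from], pair[:to]) }

# Single pass over the words: every original word is looked up once.
lookup = dictionary.map { |pair| [pair[:from], pair[:to]] }.to_h
single = text.split(/ /).map { |w| lookup.fetch(w, w) }.join(" ")

puts chained # prints: lazy lazy fox flies under the lazy dog
puts single  # prints: lazy quick fox flies over the lazy dog
```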
I did some benchmarks, and I had to add a lot more replacement pairs than I expected before the solution above caught up with gsub!:
require 'benchmark'
@dictionary = [{:to=>"lazy", :from=>"quick"}, {:to=>"flies", :from=>"jumps"}, {:from => "over the", :to => "under the"}]
@text = "quick brown fox jumps over the lazy dog" * 10000
Benchmark.bm do |benchmark|
  # 1: split + hash lookup, 3 pairs
  benchmark.report do
    h = Hash.new
    @dictionary.each { |ft| h[ft[:from]] = ft[:to] }
    @text.split(/ /).map { |w| h[w] || w }.join(' ')
  end
  # 2: one gsub! pass per pair, 3 pairs (gsub! rewrites @text in place)
  benchmark.report do
    @dictionary.each { |pair| @text.gsub!(/#{pair[:from]}/, pair[:to]) }
  end
  # grow the dictionary to (3 + 4) * 15 = 105 pairs
  @dictionary += [{:to=>"black", :from=>"brown"}, {:to=>"ox", :from=>"fox"}, {:to=>"hazy", :from=>"lazy"}, {:to=>"frog", :from=>"dog"}]
  @dictionary = @dictionary * 15
  # 3: split + hash lookup, 105 pairs
  benchmark.report do
    h = Hash.new
    @dictionary.each { |ft| h[ft[:from]] = ft[:to] }
    @text.split(/ /).map { |w| h[w] || w }.join(' ')
  end
  # 4: one gsub! pass per pair, 105 pairs
  benchmark.report do
    @dictionary.each { |pair| @text.gsub!(/#{pair[:from]}/, pair[:to]) }
  end
end
The results:
                      user     system      total        real
split, 3 pairs    0.890000   0.060000   0.950000 (  0.962106)
gsub!, 3 pairs    0.200000   0.020000   0.220000 (  0.217235)
split, 105 pairs  0.980000   0.060000   1.040000 (  1.042783)
gsub!, 105 pairs  0.980000   0.030000   1.010000 (  1.011380)
The gsub! solution was about 4.4 times faster with only three replacement pairs. At 105 replacement pairs the split solution is finally roughly as fast: going from three pairs to 105, it got only about 8% slower (0.96 s to 1.04 s real time), while gsub! got about 4.7 times slower (0.22 s to 1.01 s).
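One caveat about the benchmark above: gsub! rewrites @text in place, so reports 3 and 4 run on text that report 2 has already changed. Below is a sketch of a variant that gives every report a fresh copy and labels the rows. It is not the code that produced the numbers above, and it makes three assumptions of its own: a trailing space in the base string (the original `* 10000` fuses repetitions into tokens like "dogquick", which gsub! can match but split cannot), `Regexp.escape` around each pattern, and the hypothetical helper names `split_replace` and `gsub_replace`.

```ruby
require 'benchmark'

# Assumption: trailing space so repetitions do not fuse into "dogquick".
BASE_TEXT = "quick brown fox jumps over the lazy dog " * 10_000

# Split + hash lookup, as in the answer.
def split_replace(text, dictionary)
  h = {}
  dictionary.each { |ft| h[ft[:from]] = ft[:to] }
  text.split(/ /).map { |w| h[w] || w }.join(' ')
end

# One gsub! pass per pair; Regexp.escape guards against regex metacharacters.
def gsub_replace(text, dictionary)
  dictionary.each { |pair| text.gsub!(/#{Regexp.escape(pair[:from])}/, pair[:to]) }
  text
end

small = [{ to: "lazy", from: "quick" }, { to: "flies", from: "jumps" },
         { from: "over the", to: "under the" }]
extra = [{ to: "black", from: "brown" }, { to: "ox", from: "fox" },
         { to: "hazy", from: "lazy" }, { to: "frog", from: "dog" }]
large = (small + extra) * 15 # 105 pairs

# Each report works on its own copy, so gsub! cannot affect later reports.
Benchmark.bm(16) do |x|
  x.report("split, 3 pairs")   { split_replace(BASE_TEXT.dup, small) }
  x.report("gsub!, 3 pairs")   { gsub_replace(BASE_TEXT.dup, small) }
  x.report("split, 105 pairs") { split_replace(BASE_TEXT.dup, large) }
  x.report("gsub!, 105 pairs") { gsub_replace(BASE_TEXT.dup, large) }
end
```

Timings will differ by machine and Ruby version, so no numbers are given here.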