I have an array of hashes; each hash has two keys, `:from` and `:to`:

@dictionary = [{:to=>"lazy", :from=>"quick"}, {:to=>"flies", :from=>"jumps"}, {:from => "over the", :to => "under the"}]

I also have a long string:

@text = "quick brown fox jumps over the lazy dog"

How can I replace every occurrence of each "from" phrase with the corresponding "to" phrase from the dictionary?

The output should be:

lazy brown fox flies under the lazy dog

What is the most efficient way to do this?

A: 
@dictionary.inject(@text) {|text, d|
  text.gsub d[:from], d[:to]
}
NV
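A self-contained run of the `inject` approach above on the question's data (local variables stand in for the instance variables). Passing a String pattern to `gsub` matches it literally, so no regex escaping is needed:

```ruby
dictionary = [{:to=>"lazy", :from=>"quick"}, {:to=>"flies", :from=>"jumps"}, {:from => "over the", :to => "under the"}]
text = "quick brown fox jumps over the lazy dog"

# Thread the text through one gsub per pair; each gsub returns a new String,
# which inject passes on as the accumulator for the next pair.
result = dictionary.inject(text) { |acc, d| acc.gsub(d[:from], d[:to]) }

puts result  # => lazy brown fox flies under the lazy dog
```

Because each pass works on the output of the previous one, a later pair can rewrite text produced by an earlier pair, which is the issue raised in the comments.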
`@dictionary = [{:to=>"crazy bob", :from=>"lazy"}, {:to=>"mad ben", :from=>"crazy bob"}]` gives `"quick brown fox jumps over the mad ben dog"`. As you can see, the same text is substituted twice. Does anybody know a single-pass solution (one that doesn't override previous replacements)? Good work, @NV, thanks.
astropanic
What output do you expect? "quick brown fox jumps over the mad ben dog" or not?
NV
I mean that in my previous comment, the first substitution (from "lazy" to "crazy bob") is later overwritten by the second one (from "crazy bob" to "mad ben"). It shouldn't replace text that has already been substituted.
astropanic
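One way to get the single-pass behaviour asked for here (a sketch, not taken from the answers in this thread) is to combine all "from" phrases into one regex with `Regexp.union` and pass a Hash as the replacement to `gsub`. `gsub` scans the original string once and never re-examines text it has already replaced, so the "crazy bob" produced by the first pair is not turned into "mad ben". Sorting the keys longest-first is an assumption about the desired behaviour: it makes a longer phrase win over a shorter one that starts at the same position.

```ruby
dictionary = [{:to=>"crazy bob", :from=>"lazy"}, {:to=>"mad ben", :from=>"crazy bob"}]
text = "quick brown fox jumps over the lazy dog"

# Map each "from" phrase to its "to" phrase.
table = dictionary.each_with_object({}) { |d, h| h[d[:from]] = d[:to] }

# Regexp.union escapes each String, so phrases are matched literally.
# Longest phrases first, because regex alternation takes the first branch that matches.
pattern = Regexp.union(table.keys.sort_by { |k| -k.length })

# With a Hash as the replacement, each match is replaced by table[match].
result = text.gsub(pattern, table)

puts result  # => quick brown fox jumps over the crazy bob dog
```

Like the `gsub` answers, this also matches inside longer words (e.g. "lazy" inside "lazybones").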
A: 
@enhaced_dictionary = @dictionary.inject({}) {|res, e| res[e[:from]] = e[:to]  }
@compiled = @text.split(/\s/).map do |e|
  @enhaced_dictionary[e] ?  @enhaced_dictionary[e] : e
end.join(' ')
clyfe
`IndexError: string not matched`
Jonas Elfström
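The `IndexError` comes from `inject`: the block returns `e[:to]` (a String), which becomes `res` on the next iteration, and `"lazy"["jumps"] = "flies"` raises because `"jumps"` does not occur in `"lazy"`. A corrected sketch that builds the Hash with `each_with_object`, which yields the same Hash to every iteration regardless of what the block returns:

```ruby
dictionary = [{:to=>"lazy", :from=>"quick"}, {:to=>"flies", :from=>"jumps"}, {:from => "over the", :to => "under the"}]
text = "quick brown fox jumps over the lazy dog"

# each_with_object passes the same Hash to every iteration,
# so the value returned by the block is ignored.
lookup = dictionary.each_with_object({}) { |e, res| res[e[:from]] = e[:to] }

# Replace each whitespace-separated word that appears as a key; keep the rest.
compiled = text.split(/\s/).map { |word| lookup.fetch(word, word) }.join(' ')

puts compiled  # => lazy brown fox flies over the lazy dog
```

Because the text is split into single words, the two-word key "over the" never matches, so "over the" is left unchanged.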
A: 
@dictionary.each do |pair|
  @text.gsub!(/#{Regexp.escape(pair[:from])}/, pair[:to])
end

Or if you'd prefer it on a single line:

@dictionary.each { |pair| @text.gsub!(/#{Regexp.escape(pair[:from])}/, pair[:to]) }

It's exactly the same code, just using `{ }` instead of `do ... end` for the block (the usual Ruby convention for one-line blocks).

mlambie
"all occurrences", so it should be `gsub` instead of `sub`.
NV
Good point, thanks NV. Updated now.
mlambie
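Interpolating a "from" phrase into a regex with `#{}` makes characters such as `.` or `(` act as regex syntax. If the dictionary may contain such characters, `Regexp.escape` (or simply passing the String itself to `gsub!`) keeps the match literal. A small illustration with a made-up string:

```ruby
text = "1.5 and 125"

# Unescaped: "." matches any character, so "125" is replaced as well.
unescaped = text.gsub(/#{"1.5"}/, "X")

# Escaped: only the literal "1.5" is replaced.
escaped = text.gsub(/#{Regexp.escape("1.5")}/, "X")

puts unescaped  # => X and X
puts escaped    # => X and 125
```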
A: 

If the dictionary contained only single words (no multi-word entry like {"over the"=>"under the"}), I think something like this would be faster than scanning the string over and over again, as most of the solutions here do.

First I convert the array into a plain Hash:

h=Hash.new
@dictionary.each {|ft| h[ft[:from]]=ft[:to]}
=> {"quick"=>"lazy", "over the"=>"under the", "jumps"=>"flies"}

Then I scan the string word by word:

@text.split(/ /).map{|w| h[w] || w}.join(" ")
=> "lazy brown fox flies over the lazy dog"

It also doesn't suffer from the multiple-substitution problem:

h["brown"]="quick"
=> {"brown"=>"quick", "quick"=>"lazy", "over the"=>"under the", "jumps"=>"flies"}
@text.split(/ /).map{|w| h[w] || w}.join(" ")
=> "lazy quick fox flies over the lazy dog"
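The two steps of this answer can be combined into one self-contained sketch. The helper name `replace_words` is hypothetical, and `Array#to_h` (used here instead of the `each` loop) needs Ruby 2.1 or later:

```ruby
# Hypothetical helper name; combines the two steps of this answer.
def replace_words(text, dictionary)
  # Build the from => to Hash (Array#to_h needs Ruby 2.1 or later).
  h = dictionary.map { |ft| [ft[:from], ft[:to]] }.to_h
  # Split on single spaces, swap each word that is a key, and rejoin.
  text.split(/ /).map { |w| h[w] || w }.join(" ")
end

dictionary = [{:to=>"lazy", :from=>"quick"}, {:to=>"flies", :from=>"jumps"}, {:from => "over the", :to => "under the"}]
sentence = "quick brown fox jumps over the lazy dog"

first = replace_words(sentence, dictionary)
# Adding "brown" => "quick" does not cause the new "quick" to be replaced again.
second = replace_words(sentence, dictionary + [{:from=>"brown", :to=>"quick"}])

puts first   # => lazy brown fox flies over the lazy dog
puts second  # => lazy quick fox flies over the lazy dog
```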

I ran some benchmarks, and I had to add far more replacement pairs than I expected before the split solution caught up with gsub!.

require 'benchmark'

@dictionary = [{:to=>"lazy", :from=>"quick"}, {:to=>"flies", :from=>"jumps"}, {:from => "over the", :to => "under the"}]
@text = "quick brown fox jumps over the lazy dog" * 10000
Benchmark.bm do |benchmark|
  benchmark.report do
    h=Hash.new
    @dictionary.each {|ft| h[ft[:from]]=ft[:to]}
    @text.split(/ /).map{|w| h[w] || w}.join(' ')
  end
  benchmark.report do
    @dictionary.each { |pair| @text.gsub!(/#{pair[:from]}/, pair[:to]) }
  end

  @dictionary+=[{:to=>"black", :from=>"brown"}, {:to=>"ox", :from=>"fox"}, {:to=>"hazy", :from=>"lazy"}, {:to=>"frog", :from=>"dog"}]
  @dictionary=@dictionary*15

  benchmark.report do
    h=Hash.new
    @dictionary.each {|ft| h[ft[:from]]=ft[:to]}
    @text.split(/ /).map{|w| h[w] || w}.join(' ')
  end
  benchmark.report do
    @dictionary.each { |pair| @text.gsub!(/#{pair[:from]}/, pair[:to]) }
  end
end

The results (rows 1 and 2: split and gsub! with three pairs; rows 3 and 4: the same two with 105 pairs):

      user     system      total        real
  0.890000   0.060000   0.950000 (  0.962106)
  0.200000   0.020000   0.220000 (  0.217235)
  0.980000   0.060000   1.040000 (  1.042783)
  0.980000   0.030000   1.010000 (  1.011380)

With only three replacement pairs, the gsub! solution was about 4.5 times faster. At 105 replacement pairs the split solution is finally about as fast: it got only about 10% slower going from three to 105 pairs, while gsub! got roughly five times slower.

Jonas Elfström
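In the benchmark above, `gsub!` rewrites `@text` in place, so the third and fourth reports operate on text that has already been substituted; also, `"..." * 10000` without a trailing space fuses the sentences into tokens like `"dogquick"` that the split version can never match. A sketch of a variant that gives every report the same frozen input and adds the single-pass `Regexp.union` approach for comparison. The labels and the extra approach are additions, and no timings are claimed here:

```ruby
require 'benchmark'

dictionary = [{:to=>"lazy", :from=>"quick"}, {:to=>"flies", :from=>"jumps"}, {:from => "over the", :to => "under the"}]
# Trailing space so the repeated sentences don't fuse into tokens like "dogquick".
# Frozen so no report can modify the shared input.
text = ("quick brown fox jumps over the lazy dog " * 10000).freeze
h = dictionary.map { |ft| [ft[:from], ft[:to]] }.to_h

results = {}
Benchmark.bm(7) do |bm|
  # Word-by-word lookup (single words only).
  bm.report("split") { results[:split] = text.split(/ /).map { |w| h[w] || w }.join(" ") }
  # One non-mutating gsub per pair.
  bm.report("gsub") { results[:gsub] = dictionary.inject(text) { |t, d| t.gsub(d[:from], d[:to]) } }
  # One gsub over a combined pattern, longest phrases first.
  bm.report("union") do
    pattern = Regexp.union(h.keys.sort_by { |k| -k.length })
    results[:union] = text.gsub(pattern, h)
  end
end
```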
I don't think `split` and `join` are a good idea. What if `@text = " lazy quick fox \n flies over \t the lazy dog"`?
NV
`@text.split(/ /).map{|w| h[w] || w}.join(" ")` gives `" lazy lazy fox \n flies over \t the lazy dog"`, so the tabs and newlines are kept.
Jonas Elfström
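If exact whitespace must be preserved for any mix of spaces, tabs and newlines, one option (a sketch that, like the answer above, handles single words only) is to let `gsub` visit each run of non-whitespace characters and leave everything between them untouched:

```ruby
h = {"quick"=>"lazy", "jumps"=>"flies", "over the"=>"under the"}
text = " lazy quick fox \n flies over \t the lazy dog"

# /\S+/ matches each maximal run of non-whitespace; the whitespace between
# runs is never matched, so it is copied through unchanged.
result = text.gsub(/\S+/) { |w| h.fetch(w, w) }

p result  # => " lazy lazy fox \n flies over \t the lazy dog"
```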