ruby: What is the most optimized expression that evaluates to the same result as

phrase.split(delimiter).collect {|p| p.lstrip.rstrip }
A: 

The only optimisation I see is replacing

p.lstrip.rstrip

with

p.strip!
Aurril
Do you mean `p.strip`?
Wayne Conrad
`p.strip!` is valid as well as `p.strip`. It is just that `p.strip!` is a tick faster, as it does not need to make a copy of `p`.
Aurril
`strip!` returns nil if the string is not altered, so it will not fit the logic error-free without a conditional check.
ramonrails
@ramonrails: You are right. In that case phrase.split(delimiter).each {|p| p.strip! } should be used.
Aurril
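To make the point about `strip!` returning nil concrete, here is a minimal sketch. The sample `phrase` and `delimiter` values are assumptions chosen so that one element needs no stripping; they do not come from the question.

```ruby
# Hypothetical sample values, chosen so the middle element has no
# surrounding whitespace.
phrase = " a ,b, c "
delimiter = ","

# strip! returns nil when the string needed no change, so collecting
# its return value leaves a nil entry in the result.
with_bang = phrase.split(delimiter).collect { |part| part.strip! }

# each returns the receiver array itself, whose elements strip! has
# already modified in place, so no nil entries appear.
in_place = phrase.split(delimiter).each { |part| part.strip! }

# strip always returns a string (a stripped copy).
copied = phrase.split(delimiter).collect { |part| part.strip }

puts with_bang.inspect
puts in_place.inspect
puts copied.inspect
```

Only the `each` form and the plain `strip` form give the same array as the original `lstrip.rstrip` expression for this input.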
+1  A: 

You could try a regular expression:

phrase.strip.split(/\s*#{delimiter}\s*/)
Mark Byers
don't forget to `strip` the `phrase` first to get rid of leading/trailing whitespace.
glenn jackman
@glenn: Good point!
Mark Byers
This solution does not strip the spaces around each element of the split result.
ramonrails
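A short sketch comparing the regex approach with split-then-strip may help here. The helper names `split_then_strip` and `regex_split` are hypothetical, and the use of `Regexp.escape` is an addition not mentioned in the thread; it is there on the assumption that the delimiter may contain regex metacharacters such as `.` or `|`.

```ruby
# Hypothetical helper names; neither appears in the thread.
def split_then_strip(phrase, delimiter)
  phrase.split(delimiter).collect(&:strip)
end

def regex_split(phrase, delimiter)
  # Regexp.escape keeps characters such as "." or "|" literal.
  phrase.strip.split(/\s*#{Regexp.escape(delimiter)}\s*/)
end

# Typical input: both approaches agree.
puts regex_split(" a . b .c ", ".").inspect
puts split_then_strip(" a . b .c ", ".").inspect

# Edge case: a trailing delimiter followed only by whitespace.
puts regex_split("a - ", "-").inspect
puts split_then_strip("a - ", "-").inspect
```

The two forms agree on ordinary input but can differ at the edges: with a trailing delimiter, split-then-strip keeps a final empty string while the regex form drops it. Whether that matters depends on your data.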
+7  A: 

Optimising for clarity, I would prefer the following:

phrase.split(delimiter).collect(&:strip)

But I presume you want to optimise for speed. I don't know why others are speculating. The only way to find out what is faster is to benchmark your code.

Make sure you adjust the benchmark parameters - this is just an example.

require "benchmark"

# Adjust parameters below for your typical use case.
n = 10_000
input = " This is - an example. - A relatively long string " +
  "- delimited by dashes. - Adjust if necessary " * 100
delimiter = "-"

Benchmark.bmbm do |bench|
  bench.report "collect { |s| s.lstrip.rstrip }" do
    # Your example.
    n.times { input.split(delimiter).collect { |s| s.lstrip.rstrip } }
  end

  bench.report "collect { |s| s.strip }" do
    # Use .strip instead of .lstrip.rstrip.
    n.times { input.split(delimiter).collect { |s| s.strip } }
  end

  bench.report "collect { |s| s.strip! }" do
    # Use .strip! to modify strings in-place.
    # Note: strip! returns nil for strings that needed no change.
    n.times { input.split(delimiter).collect { |s| s.strip! } }
  end

  bench.report "collect(&:strip!)" do
    # Slow block creation (&:strip! syntax).
    n.times { input.split(delimiter).collect(&:strip!) }
  end

  bench.report "split(/\\s*\#{delim}\\s*/) (static)" do
    # Use static regex (only possible if delimiter doesn't change).
    # "\\s" is needed: in a double-quoted string "\s" is a literal space.
    re = Regexp.new("\\s*#{delimiter}\\s*")
    n.times { input.split(re) }
  end

  bench.report "split(/\\s*\#{delim}\\s*/) (dynamic)" do
    # Use dynamic regex, slower to create every time?
    # "\\s" is needed: in a double-quoted string "\s" is a literal space.
    n.times { input.split(Regexp.new("\\s*#{delimiter}\\s*")) }
  end
end

Results on my laptop with the parameters listed above:

                                      user     system      total        real
collect { |s| s.lstrip.rstrip }   7.970000   0.050000   8.020000 (  8.246598)
collect { |s| s.strip }           6.350000   0.050000   6.400000 (  6.837892)
collect { |s| s.strip! }          5.110000   0.020000   5.130000 (  5.148050)
collect(&:strip!)                 5.700000   0.030000   5.730000 (  6.010845)
split(/\s*#{delim}\s*/) (static)  6.890000   0.030000   6.920000 (  7.071058)
split(/\s*#{delim}\s*/) (dynamic) 6.900000   0.020000   6.920000 (  6.983142)

From the above I might conclude:

  • Using .strip instead of .lstrip.rstrip is faster.
  • Preferring &:strip! over { |s| s.strip! } comes with a performance cost.
  • Simple regex patterns are nearly as fast as using split + strip.

Things that I can think of that may influence the result:

  • The length of the delimiter (and whether or not it is whitespace).
  • The length of the strings that you want to split.
  • The length of the splittable chunks in the string.
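
One way to explore the factors listed above is to generate inputs with a chosen delimiter, chunk length and chunk count, then time the same expression on each. This is only a sketch: `build_input` is a hypothetical helper, and the parameter combinations and repetition count are arbitrary placeholders rather than recommended values.

```ruby
require "benchmark"

# Hypothetical helper: builds an input of `chunks` pieces, each
# `chunk_length` characters long and padded with one space per side.
def build_input(chunks, chunk_length, delimiter)
  Array.new(chunks) { " " + "x" * chunk_length + " " }.join(delimiter)
end

n = 1_000 # Illustrative repetition count; adjust for your machine.

# Each entry is [delimiter, chunk_length, chunks]; the values are arbitrary.
cases = [["-", 10, 100], ["-", 100, 10], ["<=>", 10, 100]]

cases.each do |delimiter, chunk_length, chunks|
  input = build_input(chunks, chunk_length, delimiter)
  seconds = Benchmark.realtime do
    n.times { input.split(delimiter).collect { |s| s.strip } }
  end
  puts format("delimiter=%p chunk_length=%d chunks=%d: %.4fs",
              delimiter, chunk_length, chunks, seconds)
end
```

Swap the expression inside `n.times` for any of the variants in the benchmark to see how each responds to the same change in input shape.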

But don't take my word for it. Measure it!

molf
+1 for good example
Aurril
This was the best answer I received. Backed with facts. And of course, I learned some benchmarking too! Thanks.
ramonrails