ansaurus

Question

ruby: optimize => phrase.split(delimiter).collect {|p| p.lstrip.rstrip }

Answer 1

A:

I only see an optimisation in replacing

p.lstrip.rstrip

with

p.strip!

Aurril 2010-03-02 21:35:24

Do you mean `p.strip`?

Wayne Conrad 2010-03-02 21:53:02

p.strip! is valid as well as p.strip. p.strip! is just a tick faster, as it does not need to make a copy of p.

Aurril 2010-03-02 22:26:10

strip! returns nil if the string is not altered, so it will not fit the logic error-free without a conditional check.

ramonrails 2010-03-08 17:21:51

@ramonrails: You are right. In that case phrase.split(delimiter).each {|p| p.strip! } should be used.

Aurril 2010-03-09 07:48:12
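
To make the nil issue from the comments above concrete, here is a minimal sketch. The sample phrase and delimiter are made up for illustration and are not from the question.

```ruby
# Hypothetical sample data: the middle piece has no surrounding whitespace.
phrase = " a ,b, c "
delimiter = ","

# strip! returns nil when nothing was stripped, so collect picks up a nil.
with_bang = phrase.split(delimiter).collect { |s| s.strip! }

# each returns the array itself; its strings were stripped in place.
with_each = phrase.split(delimiter).each { |s| s.strip! }

# strip always returns a string (a new one), so collect is safe here.
with_strip = phrase.split(delimiter).collect { |s| s.strip }

puts with_bang.inspect   # ["a", nil, "c"]
puts with_each.inspect   # ["a", "b", "c"]
puts with_strip.inspect  # ["a", "b", "c"]
```

So strip! is only safe when its return value is ignored, as in the each variant.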

Answer 2

+1 A:

You could try a regular expression:

phrase.strip.split(/\s*#{delimiter}\s*/)

Mark Byers 2010-03-02 21:36:26

Don't forget to `strip` the `phrase` first to get rid of leading/trailing whitespace.

glenn jackman 2010-03-03 14:54:51

@glenn: Good point!

Mark Byers 2010-03-03 14:56:04

This solution does not strip the spaces around each element of split result.

ramonrails 2010-03-08 17:20:02
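
A quick way to check what the regular expression approach does is to run it next to split + strip on a sample. The sketch below assumes a made-up helper name (split_and_strip) and made-up input. It also adds Regexp.escape, which is not in the answer, so that a delimiter such as "|" is matched literally rather than treated as regex syntax.

```ruby
# Hypothetical helper, for illustration only.
def split_and_strip(phrase, delimiter)
  # Regexp.escape keeps delimiters like "|" or "." literal inside the pattern.
  phrase.strip.split(/\s*#{Regexp.escape(delimiter)}\s*/)
end

sample = "  a | b  |c "  # made-up input

via_regex = split_and_strip(sample, "|")
via_strip = sample.split("|").collect { |s| s.strip }

puts via_regex.inspect  # ["a", "b", "c"]
puts via_strip.inspect  # ["a", "b", "c"]
```

The two agree on this sample. They can still differ on edge cases such as a trailing delimiter followed by whitespace, so it is worth testing with your own data.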

Answer 3

+7 A:

Optimised for clarity I would prefer the following:

phrase.split(delimiter).collect(&:strip)
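
For anyone unfamiliar with the shorthand: &:strip turns the symbol into a block that calls strip on each element. A small sketch with made-up input shows it gives the same result as the explicit block.

```ruby
phrase = "one - two - three"  # hypothetical sample input
delimiter = "-"

explicit  = phrase.split(delimiter).collect { |s| s.strip }
shorthand = phrase.split(delimiter).collect(&:strip)

puts explicit.inspect       # ["one", "two", "three"]
puts explicit == shorthand  # true
```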

But I presume you want to optimise for speed. I don't know why others are speculating. The only way to find out what is faster is to benchmark your code.

Make sure you adjust the benchmark parameters - this is just an example.

require "benchmark"

# Adjust parameters below for your typical use case.
n = 10_000
input = " This is - an example. - A relatively long string " +
  "- delimited by dashes. - Adjust if necessary " * 100
delimiter = "-"

Benchmark.bmbm do |bench|
  bench.report "collect { |s| s.lstrip.rstrip }" do
    # Your example.
    n.times { input.split(delimiter).collect { |s| s.lstrip.rstrip } }
  end

  bench.report "collect { |s| s.strip }" do
    # Use .strip instead of .lstrip.rstrip.
    n.times { input.split(delimiter).collect { |s| s.strip } }
  end

  bench.report "collect { |s| s.strip! }" do
    # Use .strip! to modify strings in place (returns nil for unchanged strings).
    n.times { input.split(delimiter).collect { |s| s.strip! } }
  end

  bench.report "collect(&:strip!)" do
    # Slow block creation (&:strip! syntax).
    n.times { input.split(delimiter).collect(&:strip!) }
  end

  bench.report "split(/\\s*\#{delim}\\s*/) (static)" do
    # Use static regex -- only possible if delimiter doesn't change.
    # "\\s" is needed here: a bare "\s" in a double-quoted string is a space.
    re = Regexp.new("\\s*#{Regexp.escape(delimiter)}\\s*")
    n.times { input.split(re) }
  end

  bench.report "split(/\\s*\#{delim}\\s*/) (dynamic)" do
    # Use dynamic regex, slower to create every time?
    # "\\s" is needed here: a bare "\s" in a double-quoted string is a space.
    n.times { input.split(Regexp.new("\\s*#{Regexp.escape(delimiter)}\\s*")) }
  end
end

Results on my laptop with the parameters listed above:

                                      user     system      total        real
collect { |s| s.lstrip.rstrip }   7.970000   0.050000   8.020000 (  8.246598)
collect { |s| s.strip }           6.350000   0.050000   6.400000 (  6.837892)
collect { |s| s.strip! }          5.110000   0.020000   5.130000 (  5.148050)
collect(&:strip!)                 5.700000   0.030000   5.730000 (  6.010845)
split(/\s*#{delim}\s*/) (static)  6.890000   0.030000   6.920000 (  7.071058)
split(/\s*#{delim}\s*/) (dynamic) 6.900000   0.020000   6.920000 (  6.983142)

From the above I might conclude:

Using strip instead of .lstrip.rstrip is faster.
Preferring &:strip! over { |s| s.strip! } comes with a performance cost.
Simple regex patterns are nearly as fast as using split + strip.

Things that I can think of that may influence the result:

The length of the delimiter (and whether or not it is whitespace).
The length of the strings that you want to split.
The length of the splittable chunks in the string.
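
One way to explore those factors is to time the same expression over inputs of different shapes. This is only a sketch: build_input is a hypothetical helper, and the chunk lengths, chunk count and repetition count are arbitrary assumptions to adjust for your own use case.

```ruby
require "benchmark"

# Hypothetical helper: joins `chunks` pieces, each `chunk_len` characters long
# and padded with one space on either side, using the given delimiter.
def build_input(chunks, chunk_len, delimiter)
  Array.new(chunks) { " " + "x" * chunk_len + " " }.join(delimiter)
end

delimiter = "-"
n = 1_000  # assumed repetition count; raise it for steadier numbers

timings = {}
[5, 50, 500].each do |chunk_len|
  input = build_input(100, chunk_len, delimiter)
  # Benchmark.realtime returns the elapsed wall-clock time in seconds.
  timings[chunk_len] = Benchmark.realtime do
    n.times { input.split(delimiter).collect { |s| s.strip } }
  end
end

timings.each do |len, secs|
  puts format("chunk length %4d: %.4fs", len, secs)
end
```

The same loop can be repeated over the delimiter or the number of chunks to see which factor matters most for your data.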

But don't take my word for it. Measure it!

molf 2010-03-02 22:06:13

+1 for good example

Aurril 2010-03-02 22:27:56

This was the best answer I received. Backed with facts. And of course, I learned some benchmarking too! Thanks.

ramonrails 2010-03-08 17:22:42
